Benchmarking Gaslighting Attacks Against Speech Large Language Models

Jinyang Wu1,†, Bin Zhu1,*, Xiandong Zou1, Qiquan Zhang2, Xu Fang3, Pan Zhou1,*
1Singapore Management University, Singapore
2Tongyi Speech Lab, Alibaba, China
3Dalian University of Technology, China
Contact: imcc.sg@gmail.com
*Corresponding authors
ICASSP 2026

We introduce a systematic benchmark for probing whether Speech LLMs can maintain correct audio-grounded beliefs when users apply emotional pressure, sarcasm, implicit doubt, cognitive distraction, or professional authority.

Motivation

Motivation for studying gaslighting attacks on Speech LLMs
Motivation. Speech LLMs are increasingly used in spoken QA, emotion understanding, and multimodal reasoning, but their robustness to manipulative follow-up prompts remains underexplored.

Our Contributions

Our contributions
Contributions. The project introduces the first systematic benchmark of gaslighting attacks on Speech LLMs, a behavior-aware evaluation protocol, and multimodal stress testing with adversarial prompts and acoustic noise.

Framework Overview

Speech LLM gaslighting benchmark framework overview
Framework overview. Stage 1 establishes baseline predictions from clean speech plus task prompts, while Stage 2 applies gaslighting prompts and optional noise to measure revised outputs and behavior shifts.

Benchmark Setup

Benchmark setup
Benchmark setup. We evaluate 5 models on 5 datasets covering 10,740 test samples, and curate a 1,500-sample behavior-aware benchmark subset. All tasks are formatted as multiple-choice speech understanding problems.

Key Results

Key results
Main finding. Across 5 models and 10,740 test samples, gaslighting-style prompts cause an average accuracy drop of 24.3%, with Cognitive and Professional negation often the most disruptive.

Noise Amplifies Gaslighting Vulnerability

Relative performance change under five negation strategies with increasing noise
Controlled acoustic ablation. Larger drops indicate higher vulnerability when gaslighting prompts are combined with increasing noise.

Acoustic uncertainty compounds semantic pressure

Noise does not affect every manipulation type equally. Professional and Implicit prompts become especially damaging under degraded audio, while Sarcasm remains comparatively more stable in the VocalSound setting.

This suggests that Speech LLM vulnerability is shaped by the interaction between prompt framing and acoustic uncertainty, rather than by text-only or audio-only difficulty alone.

Paper

BibTeX

@inproceedings{wu2026benchmarking,
  title={Benchmarking gaslighting attacks against speech large language models},
  author={Wu, Jinyang and Zhu, Bin and Zou, Xiandong and Zhang, Qiquan and Fang, Xu and Zhou, Pan},
  booktitle={ICASSP 2026-2026 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
  pages={19867--19871},
  year={2026},
  organization={IEEE}
}