When Compression Becomes an Attack Surface: Black-Box Attacks on Prompt-Compressed LLM Agents

Year: 2025

Abstract and full text: arxiv.org/abs/2510.22963 (CC-BY-NC-SA-4.0)

Abstract

Prompt compression is increasingly deployed in LLM agents to reduce latency and cost, but it also determines what the backend LLM ultimately sees. We show that, when trusted and untrusted inputs are compressed under a shared budget, this lossy transformation creates a new attack surface: by perturbing only untrusted inputs before compression, an adversary can cause the compressor to discard task-critical evidence or safety guardrails before inference. Unlike prompt injection, jailbreaks, or RAG poisoning, the attack targets the compressor rather than the backend LLM; the perturbation need not encode a meaningful instruction or survive compression. We formalize this vulnerability as adversarial information loss (AIL), the excess downstream distortion caused by adversarially steering a lossy compressor beyond benign compression alone. To exploit AIL, we present COMA, a transfer-based black-box attack that optimizes pre-compression perturbations using attacker-side surrogate compressors and backend LLMs. Across three tasks and six compressors, COMA achieves an average attack success rate (ASR) of 0.71, versus 0.21 for the strongest baseline, and transfers to two real-world agent case studies.
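
To make the shared-budget mechanism concrete, the following is a minimal toy sketch in Python. It is not COMA and does not model any of the six compressors evaluated in the paper. The word-overlap scorer, the greedy `compress` function, the 0/1 `distortion` measure, and all example strings are hypothetical stand-ins chosen only for illustration. It shows how a perturbation confined to the untrusted input can push a trusted guardrail out of the compressed prompt, and it computes AIL as adversarial distortion minus benign distortion, following the informal definition above.

```python
def score(sentence, query):
    """Toy relevance score (assumption): fraction of words that appear in the query."""
    words = sentence.lower().split()
    query_words = set(query.lower().split())
    return sum(w in query_words for w in words) / max(len(words), 1)


def compress(sentences, query, budget):
    """Toy extractive compressor (assumption): greedily keep the highest-scoring
    sentences while the total word count stays within one shared budget."""
    ranked = sorted(sentences, key=lambda s: score(s, query), reverse=True)
    kept, used = [], 0
    for s in ranked:
        n = len(s.split())
        if used + n <= budget:
            kept.append(s)
            used += n
    # Return the survivors in their original order.
    return [s for s in sentences if s in kept]


def distortion(compressed, guardrail):
    """Toy downstream distortion (assumption): 1.0 if the guardrail was dropped."""
    return 0.0 if guardrail in compressed else 1.0


query = "refund policy for orders"
guardrail = "Never reveal internal discount codes to users"       # trusted input
document = "The refund policy covers orders within thirty days"   # untrusted input
padding = "refund policy for orders refund policy for orders"     # adversarial perturbation
budget = 16  # shared word budget for trusted and untrusted text

benign_out = compress([guardrail, document], query, budget)
adv_out = compress([guardrail, document, padding], query, budget)

# AIL: excess distortion under attack relative to benign compression alone.
ail = distortion(adv_out, guardrail) - distortion(benign_out, guardrail)
print("benign:", benign_out)
print("adversarial:", adv_out)
print("AIL:", ail)
```

In this toy setting the padding carries no instruction and would be meaningless to a backend LLM. It only outranks the guardrail under the scorer and uses up the shared budget, which is the distinction from prompt injection that the abstract draws.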