Communication in Large Language Model (LLM)-based multi-agent systems is moving beyond discrete tokens to preserve richer context. Recent work such as LatentMAS enables agents to exchange latent messages through full key-value (KV) caches. However, full KV relay incurs high memory and communication cost. We adapt KV-cache eviction methods to this setting and introduce Orthogonal BackFill (OBF) to mitigate information loss from hard eviction. OBF injects a low-rank orthogonal residual from discarded KV states into the retained KV states. We evaluate OBF against full KV relay on nine benchmarks spanning mathematical reasoning, expert and commonsense QA, and coding. With only 9.9%-20.2% of the prompt KV states retained, H-OBF delivers between 97% and 120% of full KV relay's per-benchmark accuracy across the nine benchmarks. This suggests that more information does not necessarily lead to better communication; preserving the most useful information matters more. Our codebase is included in the supplementary material. Our codebase is publicly available on https://github.com/markli404/When-Less-Latent-Leads-to-Better-Relay.
When Less Latent Leads to Better Relay: Information-Preserving Compression for Latent Multi-Agent LLM Collaboration
Communication in Large Language Model (LLM)-based multi-agent systems is moving beyond discrete tokens to preserve richer context. Recent work such as LatentMAS enables agents to exchange latent messages through full key-value (KV) caches.
- Preview

- Year
- 2026
- Hosting
- Abstract onlyARXIV-DEFAULT
Cite
Notes
Only stored in your browser.
Attribution
- Abstract & full text
- arxiv.org/abs/2604.13349ARXIV-DEFAULT
- TL;DR
- Semantic Scholar