The Ringelmann Effect in Multi-Agent LLM Systems: A Scaling Law for Effective Team Size

Year: 2026

arXiv: arxiv.org/abs/2606.02646

Abstract

Inference-time multi-agent LLM scaling lacks a shared unit: counting nominal agents conflates cost with independent evidence. We derive a two-parameter scaling law $R(N) = N_\text{eff}/N = 1/(1 + c(N-1)N^{-\beta})$, where the regime exponent $\beta$ classifies any configuration into one of three asymptotic regimes for $N_\text{eff}$: a hard ceiling at $1/c$ ($\beta = 0$), sublinear growth as $N^{\beta}/c$ ($0 < \beta < 1$), or linear growth ($\beta \ge 1$). A mean-field theorem predicts that peer count $k$ and rounds $\tau$ during agent debate enter the dynamics only through their product $k\tau$. The law applies at two levels: answer diversity and correctness redundancy. Across 44 (model $\times$ task $\times$ condition) cells spanning peer debate, self-correction, random-noise placebo, self-consistency, three open-weight families (Qwen, Llama, Ministral) at scales from 7B to 32B with a frontier API check (Gemini), thinking models, heterogeneous teams, and sparse communication, the functional form fits every condition at $R^2 > 0.99$; only $(c, \beta)$ shifts. On free-form math, dense peer influence collapses the answer-level regime from sublinear into hard-ceiling; correctness-level fits remain hard-ceiling throughout. Three findings have practical implications. (i) Thirty dense debating agents produce no more answer diversity than one on MMLU-Hard. (ii) A noise placebo tracks self-correction on free-form math and at $4\times$ scale, so within homogeneous teams the gain commonly attributed to "debate" comes from re-evaluation, not peer content. (iii) A single $N \le 5$ pilot predicts the $N = 30$ structural ceiling, and within the configurations tested only architectural diversity (heterogeneous teams) lowers $c$ and escapes the hard-ceiling regime; communication-mode interventions do not.
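
The scaling law and its three regimes can be checked numerically with a minimal sketch. The function names (`retention`, `effective_team_size`, `classify_regime`) and the value c = 0.5 are illustrative assumptions, not taken from the paper, and the exact comparison `beta == 0` is a simplification that a fitted exponent would need a tolerance for.

```python
def retention(n: int, c: float, beta: float) -> float:
    """R(N) = N_eff / N = 1 / (1 + c (N - 1) N^(-beta)), as stated in the abstract."""
    if n < 1:
        raise ValueError("team size n must be at least 1")
    return 1.0 / (1.0 + c * (n - 1) * n ** (-beta))


def effective_team_size(n: int, c: float, beta: float) -> float:
    """N_eff = N * R(N)."""
    return n * retention(n, c, beta)


def classify_regime(beta: float) -> str:
    """Map the regime exponent to the three regimes named in the abstract.

    Assumption: negative beta is rejected, since the abstract does not cover it.
    """
    if beta < 0:
        raise ValueError("beta < 0 is outside the regimes described")
    if beta == 0:
        return "hard-ceiling"   # N_eff approaches 1/c
    if beta < 1:
        return "sublinear"      # N_eff grows like N^beta / c
    return "linear"             # N_eff grows in proportion to N


# Hypothetical interference constant, chosen only for illustration.
C_EXAMPLE = 0.5

for beta in (0.0, 0.5, 1.0):
    sizes = [effective_team_size(n, C_EXAMPLE, beta) for n in (1, 5, 30, 1000)]
    formatted = ", ".join(f"{s:.2f}" for s in sizes)
    print(f"beta={beta:.1f} ({classify_regime(beta)}): N_eff at N=1,5,30,1000 -> {formatted}")
```

With $\beta = 0$ the printed values flatten toward $1/c$ as $N$ grows, which is the hard-ceiling behaviour the abstract describes; raising $\beta$ moves the same formula into the sublinear and then the linear regime.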