Adalina: Adaptive Linear Approximation for the Shapley Value and Beyond

The Shapley value, and its broader family of semi-values, has received much attention in various attribution problems. A fundamental and long-standing challenge is their efficient approximation, since exact computation generally requires an exponential number of utility queries in the number of players n. To meet the challenges of large-scale applications, we explore the limits of efficiently approximating semi-values under a Θ(n) space constraint. Building upon a vector concentration inequality, we establish a theoretical framework that enables sharper query complexities for existing unbiased randomized algorithms. Within this framework, we systematically develop a linear-space algorithm that requires O(\frac{n}{ε^{2}}\log\frac{1}δ) utility queries to ensure P(|\hat{\boldsymbolϕ}-\boldsymbolϕ|{2}\geqε)\leq δ for all commonly used semi-values. In particular, our framework naturally bridges OFA, unbiased kernelSHAP, SHAP-IQ and the regression-adjusted approach, and definitively characterizes when paired sampling is beneficial. Moreover, our algorithm allows explicit minimization of the mean squared error \mathbb{E}[|\hat{\boldsymbolϕ}-\boldsymbolϕ|{2}^{2}] for each specific utility function. Accordingly, we introduce the first adaptive, linear-time, linear-space randomized algorithm, Adalina, that theoretically achieves improved mean squared error. All of our theoretical findings are experimentally validated. Our code is available at https://github.com/watml/adalina.