0

How fast can you find a good hypothesis?

In the hypothesis selection problem, we are given sample and query access to finite set of candidate distributions (hypotheses), $\mathcal{H} = \{H_1, \ldots, H_n\}$, and samples from an unknown distribution $P$, both over a domain $\mathcal{X}$.

Preview
Year
2025
Hosting
Full text hostedCC-BY-4.0

Cite

Notes

Only stored in your browser.

Attribution

Abstract & full text
arxiv.org/abs/2509.03734CC-BY-4.0
TL;DR
Semantic Scholar
Attribution policy →

Abstract

In the hypothesis selection problem, we are given sample and query access to finite set of candidate distributions (hypotheses), H = {H_1, \ldots, H_n}, and samples from an unknown distribution P, both over a domain X. The goal is to output a distribution Q whose distance to P is comparable to that of the nearest hypothesis in H. Specifically, if the minimum distance is OPT, we aim to output Q such that, with probability at least 1-δ, its total variation distance to P is at most C \cdot OPT + \varepsilon. The optimal approximation for proper algorithms (where Q \in H) is C=3 using Θ(\log(n/δ)/\varepsilon^2) samples from P and for improper algorithms (where Q is not necessarily in H) is C=2 using \tildeΘ(\log(n/δ)/\varepsilon^2) samples from P. In the improper setting, the algorithm achieving C=2 [Bousquet, Braverman, Kol, Efremenko, Moran, FOCS 2021] runs in time which grows polynomially with |X| -- it does not run in finite time for real-valued distributions. A promising path towards improved runtime is to consider improper algorithms which output a mixture Q of the hypotheses as such a distribution can be represented in n words of memory. We show (1) a lower bound that no algorithm which outputs a mixture can achieve approximation better than C = 3-2/n unless the number of samples is polynomial in |X|, as well as (2) an algorithm which runs in time poly(n) and achieves the same approximation guarantee. In the proper setting, [Aliakbarpour, Bun, Smith, NeurIPS 2024] provided an algorithm with C=3 running in \tilde{O}(n/(δ^3\varepsilon^3)) time. We improve this time complexity to \tilde{O}(n/(δ\varepsilon^2)), significantly reducing the dependence on the confidence and error parameters.