We revisit extending the Kolmogorov-Smirnov distance between probability distributions to the multi-dimensional setting, and make new arguments about the proper way to approach this generalization. Our proposed formulation maximizes the difference over orthogonal dominating rectangular ranges (d-sided rectangles in R^d), and is an integral probability metric. We also prove that the distance between a distribution and a sample from the distribution converges to 0 as the sample size grows, and bound this rate. Moreover, we show that one can, up to this same approximation error, compute the distance efficiently in 4 or fewer dimensions; specifically, the runtime is near-linear in the size of the sample needed for that error. With this, we derive a delta-precision two-sample hypothesis test using this distance. Finally, we show these metrics and approximation properties do not hold for other popular variants.
Efficient and Stable Multi-Dimensional Kolmogorov-Smirnov Distance
We revisit extending the Kolmogorov-Smirnov distance between probability distributions to the multi-dimensional setting, and make new arguments about the proper way to approach this generalization.
- Year
- 2025
- Hosting
- Full text hostedCC-BY-4.0
Cite
Notes
Only stored in your browser.
Attribution
- Abstract & full text
- arxiv.org/abs/2504.11299CC-BY-4.0
- TL;DR
- Semantic Scholar