The generalized underlap coefficient with an application in clustering

Year: 2026

Abstract & full text: arxiv.org/abs/2602.19473

Abstract

Quantifying distributional separation across groups is fundamental in statistical learning and scientific discovery, yet most classical discrepancy measures are tailored to two-group comparisons. We generalize the underlap coefficient (UNL), a multi-group separation measure, to multivariate settings. We study its relationship with Bayes risk and mutual information, and further interpret the UNL as a measure of dependence between group labels and variables of interest. We propose an efficient importance sampling estimator of the UNL that can be combined with flexible density estimation methods. A key application is the assessment of partition-covariate dependence in clustering, where the UNL provides an interpretable measure of whether latent group structure can be explained by specific covariates. The methodology is illustrated on two real-world datasets.
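
To make the idea of an importance sampling estimator concrete, here is a minimal sketch. It is not the authors' estimator. The abstract does not state the definition of the UNL, so the code assumes the form UNL = ∫ max_k f_k(x) dx, which lies between 1 (identical groups) and K (fully separated groups). It also assumes known univariate Gaussian group densities in place of fitted density estimates, and an equal-weight mixture of the group densities as the proposal. The names `gaussian_pdf` and `unl_importance_sampling` are hypothetical.

```python
import numpy as np


def gaussian_pdf(x, mean, sd):
    """Density of N(mean, sd^2) evaluated at x (broadcasts over arrays)."""
    return np.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * np.sqrt(2.0 * np.pi))


def unl_importance_sampling(means, sds, n_draws=200_000, seed=0):
    """Monte Carlo estimate of the assumed UNL = integral of max_k f_k(x) dx.

    Assumption: group k has a known N(means[k], sds[k]^2) density. In practice
    these would be replaced by density estimates.
    Proposal: q(x) = (1/K) * sum_k f_k(x), so UNL = E_q[max_k f_k(X) / q(X)].
    """
    rng = np.random.default_rng(seed)
    means = np.asarray(means, dtype=float)
    sds = np.asarray(sds, dtype=float)
    k = len(means)

    # Draw from the mixture: pick a group uniformly, then sample from it.
    labels = rng.integers(0, k, size=n_draws)
    x = rng.normal(means[labels], sds[labels])

    # Evaluate every group density at every draw: shape (n_draws, K).
    dens = gaussian_pdf(x[:, None], means[None, :], sds[None, :])
    proposal = dens.mean(axis=1)

    # Each weight max_k f_k / q lies in [1, K], so the estimate does too.
    return float(np.mean(dens.max(axis=1) / proposal))


if __name__ == "__main__":
    print(unl_importance_sampling([0.0, 1.0, 2.0], [1.0, 1.0, 1.0]))
```

Under the assumed definition, the mixture proposal keeps every importance weight between 1 and K, which bounds the variance of the estimate. With equal group priors, the same integral divided by K is the accuracy of the Bayes classifier, which is one way to read the link to Bayes risk mentioned in the abstract.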