TaxoBell: Gaussian Box Embeddings for Self-Supervised Taxonomy Expansion

Year
2026
Hosting
Full text hosted (CC-BY-4.0)

Abstract & full text
arxiv.org/abs/2601.09633 (CC-BY-4.0)
Abstract

Taxonomies form the backbone of structured knowledge representation across diverse domains, enabling applications such as e-commerce and semantic search. Yet manual taxonomy expansion is labor-intensive and slow. Existing methods rely on point-based vector embeddings, which model symmetric similarity and thus struggle with the asymmetric relationships that are fundamental to taxonomies. Box embeddings offer a promising alternative because they can represent containment and disjointness, but they face three key issues: (i) unstable gradients at intersection boundaries, (ii) no notion of semantic uncertainty, and (iii) limited capacity to represent polysemy or ambiguity. We address these shortcomings with TaxoBell, a Gaussian box embedding framework that translates between box geometries and multivariate Gaussian distributions, where means encode semantic location and covariances encode uncertainty. Its energy-based training objective yields stable optimization, robust modeling of ambiguous concepts, and interpretable hierarchical reasoning. Extensive experiments on five benchmark datasets demonstrate that TaxoBell significantly outperforms eight state-of-the-art taxonomy expansion baselines by 19% in MRR and around 25% in Recall@k. We further demonstrate the advantages and pitfalls of TaxoBell with error analysis and ablation studies.
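The abstract does not spell out how boxes are translated into Gaussians or which energy function is used, so the following is only a minimal sketch under stated assumptions, not TaxoBell's actual formulation. It assumes a hypothetical mapping in which the box center becomes the mean and half of each side length becomes a per-dimension standard deviation (a diagonal covariance), and it uses KL divergence as one possible asymmetric score. The function names `box_to_gaussian` and `kl_diag` are illustrative only.

```python
import math

def box_to_gaussian(lower, upper):
    """Map an axis-aligned box to a diagonal Gaussian.

    Assumption for illustration: mean = box center,
    standard deviation = half the side length in each dimension.
    """
    mean = [(lo + hi) / 2.0 for lo, hi in zip(lower, upper)]
    std = [(hi - lo) / 2.0 for lo, hi in zip(lower, upper)]
    return mean, std

def kl_diag(mean_p, std_p, mean_q, std_q):
    """KL(P || Q) between two Gaussians with diagonal covariance."""
    total = 0.0
    for mp, sp, mq, sq in zip(mean_p, std_p, mean_q, std_q):
        total += (
            math.log(sq / sp)
            + (sp ** 2 + (mp - mq) ** 2) / (2.0 * sq ** 2)
            - 0.5
        )
    return total

# A broad "parent" concept and a narrow "child" concept lying inside it.
parent = box_to_gaussian([0.0, 0.0], [4.0, 4.0])
child = box_to_gaussian([1.0, 1.0], [2.0, 2.0])

# KL is asymmetric: the child is cheap to explain under the parent,
# while the parent is expensive to explain under the child.
kl_child_parent = kl_diag(*child, *parent)
kl_parent_child = kl_diag(*parent, *child)
print(kl_child_parent < kl_parent_child)
```

Unlike the dot product or cosine similarity of point embeddings, this score changes when the two arguments are swapped, which is the kind of directionality a parent-child relation needs. It is also smooth in the means and standard deviations, whereas a hard box intersection has no gradient once two boxes are disjoint.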