UR-JEPA: Uniform Rectifiability as a Regularizer for Joint-Embedding Predictive Architectures

Year: 2026

Full text: arxiv.org/abs/2606.01443 (CC-BY-4.0)

Abstract

A central difficulty in training Joint-Embedding Predictive Architectures (JEPAs) is preventing representation collapse. LeJEPA addresses this by enforcing an isotropic Gaussian target on the embeddings via Sketched Isotropic Gaussian Regularization (SIGReg). This target is in tension with the manifold hypothesis, which expects embeddings to concentrate on a low-dimensional subset of the ambient space. We propose UR-JEPA, which instead targets a uniformly $n$-rectifiable measure with local tangent dimension $n$ at small scales, realized through a Gaussian-kernel-smoothed Carleson-type square function $L^{CGLT}$, with a complementary Jones $\beta$-number formulation. On Inet10, UR-JEPA($L^{CGLT}$) attains $0.9141 \pm 0.0014$, a $+0.83$ pp gain over LeJEPA($L^{SIGReg}$), with $\sim 30\%$ lower standard deviation across seeds. On matched-recipe Galaxy10 SDSS, a single-seed ImageNet-100 run, and a 3-seed EuroSAT remote-sensing run, the two methods lie in the same peak-accuracy band at convergence, with UR-JEPA retaining its lower seed variance. On EuroSAT, the in-domain pair reaches 96.0 to 96.1%, competitive with transfer from large remote-sensing foundation models while using a $25\times$ smaller backbone. The distinction between the methods is geometric: direct visualization of the projector output distribution shows that, on all four datasets, UR-JEPA($L^{CGLT}$) produces a global PCA spectrum with a drop of 4 to 5 orders of magnitude at index $\sim 20$ to $25$ out of $D = 32$, while LeJEPA's spectrum is near-flat (top-to-bottom ratio at most 3.6). At the same time, per-dimension marginals are near-Gaussian for both methods (mean Shapiro-Wilk $W \in [0.992, 0.996]$), a consequence of the Diaconis-Freedman effect. At matched accuracy, the two regularizers therefore yield structurally distinct projected representations.
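
The PCA-spectrum diagnostic described in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the synthetic embeddings, the sample size `N`, the subspace dimension `n`, the noise scale, and the helper names `pca_spectrum`, `top_to_bottom_ratio`, and `dominant_components` are all assumptions made for illustration. Only `D = 32` is taken from the abstract. The sketch contrasts an isotropic Gaussian cloud (a stand-in for a SIGReg-style target) with points concentrated near an `n`-dimensional subspace (a crude stand-in for a low-dimensional target).

```python
import numpy as np


def pca_spectrum(z):
    """Eigenvalues of the sample covariance of z (shape N x D), descending."""
    z = z - z.mean(axis=0, keepdims=True)
    # Squared singular values of the centered data, scaled by N - 1,
    # equal the eigenvalues of the sample covariance matrix.
    s = np.linalg.svd(z, compute_uv=False)
    return s**2 / (z.shape[0] - 1)


def top_to_bottom_ratio(eigvals):
    """Ratio of the largest to the smallest covariance eigenvalue."""
    return float(eigvals[0] / eigvals[-1])


def dominant_components(eigvals):
    """Number of components before the largest consecutive log-drop."""
    drops = np.log10(eigvals[:-1] / eigvals[1:])
    return int(np.argmax(drops)) + 1


rng = np.random.default_rng(0)
N, D, n = 4096, 32, 20  # D from the abstract; N and n are illustrative

# Hypothetical stand-in for an isotropic Gaussian projector output.
z_iso = rng.standard_normal((N, D))

# Hypothetical stand-in for embeddings near an n-dimensional subspace:
# random orthonormal directions plus small ambient noise (assumed scale).
basis, _ = np.linalg.qr(rng.standard_normal((D, D)))
z_low = rng.standard_normal((N, n)) @ basis[:, :n].T
z_low = z_low + 3e-3 * rng.standard_normal((N, D))

ev_iso = pca_spectrum(z_iso)
ev_low = pca_spectrum(z_low)

print("isotropic top/bottom ratio:", top_to_bottom_ratio(ev_iso))
print("low-dim top/bottom ratio:  ", top_to_bottom_ratio(ev_low))
print("low-dim dominant components:", dominant_components(ev_low))
```

On real projector outputs, `z_iso` and `z_low` would be replaced by the embedding matrices of the two trained models. The per-dimension Gaussianity check mentioned in the abstract could be added by applying a Shapiro-Wilk test such as `scipy.stats.shapiro` to each column, which is omitted here to keep the sketch NumPy-only.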