Attribution
Theory, Analysis, and Best Practices for Sigmoid Self-Attention
arXiv 2024
The Role of Entropy and Reconstruction in Multi-View Self-Supervised Learning
arXiv 2023
Stabilizing Transformer Training by Preventing Attention Entropy Collapse
Co-authors (from 3 papers)
Dan Busbridge
Tatiana Likhomanenko
Adam Goliński
Amitis Shidani
Arno Blaas
Borja Rodríguez-Gálvez
Eeshan Dhekane
Etai Littwin
Federico Danieli
Floris Weers