Róbert Csordás
Papers: 10
Authored papers
Do Language Models Use Their Depth Efficiently?
arXiv 2025
Mixture of Sparse Attention: Content-Based Learnable Sparse Attention via Expert-Choice Routing
arXiv 2025
Chain-of-Experts: Unlocking the Communication Power of Mixture-of-Experts Models
arXiv 2025
MrT5: Dynamic Token Merging for Efficient Byte-level Language Models
arXiv 2024
Recurrent Neural Networks Learn to Store and Generate Sequences using Non-Linear Representations
arXiv 2024
MoEUT: Mixture-of-Experts Universal Transformers
arXiv 2024
Randomized Positional Encodings Boost Length Generalization of Transformers
arXiv 2023
Approximating Two-Layer Feedforward Networks for Efficient Transformers
arXiv 2023
SwitchHead: Accelerating Transformers with Mixture-of-Experts Attention
arXiv 2023
A Modern Self-Referential Weight Matrix That Learns to Modify Itself
arXiv 2022