Attribution
Analysing The Impact of Sequence Composition on Language Model Pre-Training
arXiv 2024
Focused Transformer: Contrastive Training for Context Scaling
NeurIPS 2023 11
Hierarchical Transformers Are More Efficient Language Models
from 3 papers
Henryk Michalewski
researcher
Konrad Staniszewski
Piotr Miłoś
Yuhuai Wu
Christian Szegedy
Łukasz Kaiser
Michał Tyrolski
Mikołaj Pacek
Pasquale Minervini
Piotr Nawrot