Attribution
Taming Sparsely Activated Transformer with Stochastic Experts
EL-Attention: Memory Efficient Lossless Attention for Generation
arXiv 2021
Authors (from 2 papers):
Hany Hassan
Jian Jiao
Jianfeng Gao
Jiusheng Chen
Nan Duan
Nikhil Bhendawade
Simiao Zuo
Tuo Zhao
Weizhen Qi
Xiaodong Liu