Shuming Ma
Authored papers (20)
BitNet b1.58 2B4T Technical Report
arXiv 2025
Next Block Prediction: Video Generation via Semi-Auto-Regressive Modeling
arXiv 2025
Towards Thinking-Optimal Scaling of Test-Time Compute for LLM Reasoning
arXiv 2025
You Only Cache Once: Decoder-Decoder Architectures for Language Models
arXiv 2024
Kosmos-2: Grounding Multimodal Large Language Models to the World
arXiv 2023
Discourse Centric Evaluation of Machine Translation with a Densely Annotated Parallel Corpus
arXiv 2023
Are More Layers Beneficial to Graph Transformers?
arXiv 2023
On the Off-Target Problem of Zero-Shot Multilingual Neural Machine Translation
arXiv 2023
Auto-ICL: In-Context Learning without Human Supervision
arXiv 2023
A Length-Extrapolatable Transformer
arXiv 2022
Zero-shot Cross-lingual Transfer of Prompt-based Tuning with a Unified Multilingual Prompt
arXiv 2022
CROP: Zero-shot Cross-lingual Named Entity Recognition with Multilingual Labeled Sequence Translation
arXiv 2022
GanLM: Encoder-Decoder Pre-training with an Auxiliary Discriminator
arXiv 2022
StableMoE: Stable Routing Strategy for Mixture of Experts
ACL 2022 5
HLT-MT: High-resource Language-specific Training for Multilingual Neural Machine Translation
arXiv 2022
GTrans: Grouping and Fusing Transformer Layers for Neural Machine Translation
arXiv 2022
UM4: Unified Multilingual Multiple Teacher-Student Model for Zero-Resource Neural Machine Translation
arXiv 2022
DeltaLM: Encoder-Decoder Pre-training for Language Generation and Translation by Augmenting Pretrained Multilingual Encoders
arXiv 2021
Zero-shot Cross-lingual Transfer of Neural Machine Translation with Multilingual Pretrained Encoders
EMNLP 2021 11
Towards Making the Most of Multilingual Pretraining for Zero-Shot Neural Machine Translation
arXiv 2021