Zhilin Yang
- Papers: 20
Authored papers
Attention Residuals
arXiv 2026
Kimi K2.5: Visual Agentic Intelligence
arXiv 2026
Muon is Scalable for LLM Training
arXiv 2025
Kimi-Audio Technical Report
arXiv 2025
MoBA: Mixture of Block Attention for Long-Context LLMs
arXiv 2025
Kimi-VL Technical Report
arXiv 2025
Kimina-Prover Preview: Towards Large Formal Reasoning Models with Reinforcement Learning
arXiv 2025
Kimi k1.5: Scaling Reinforcement Learning with LLMs
arXiv 2025
Kimi Linear: An Expressive, Efficient Attention Architecture
arXiv 2025
OpenCUA: Open Foundations for Computer-Use Agents
arXiv 2025
OJBench: A Competition Level Code Benchmark For Large Language Models
arXiv 2025
CodeGeeX: A Pre-Trained Model for Code Generation with Multilingual Benchmarking on HumanEval-X
arXiv 2023
FastMoE: A Fast Mixture-of-Expert Training System
arXiv 2021
GLM: General Language Model Pretraining with Autoregressive Blank Infilling
ACL 2022
P-Tuning v2: Prompt Tuning Can Be Comparable to Fine-tuning Universally Across Scales and Tasks
arXiv 2021
GPT Understands, Too
arXiv 2021
NLP From Scratch Without Large-Scale Pretraining: A Simple and Efficient Framework
arXiv 2021
Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context
XLNet: Generalized Autoregressive Pretraining for Language Understanding
HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering
Frequent co-authors
10 from 20 papers