Zhilin Yang

Papers: 20

Attribution

Affiliations & profile: Semantic Scholar

Authored papers

Attention Residuals

arXiv 2026

2026

Kimi K2.5: Visual Agentic Intelligence

arXiv 2026

2026

Muon is Scalable for LLM Training

arXiv 2025

2025

Kimi-Audio Technical Report

arXiv 2025

2025

MoBA: Mixture of Block Attention for Long-Context LLMs

arXiv 2025

2025

Kimi-VL Technical Report

arXiv 2025

2025

Kimina-Prover Preview: Towards Large Formal Reasoning Models with Reinforcement Learning

arXiv 2025

2025

Kimi k1.5: Scaling Reinforcement Learning with LLMs

arXiv 2025

2025

Kimi Linear: An Expressive, Efficient Attention Architecture

arXiv 2025

2025

OpenCUA: Open Foundations for Computer-Use Agents

arXiv 2025

2025

OJBench: A Competition Level Code Benchmark For Large Language Models

arXiv 2025

2025

CodeGeeX: A Pre-Trained Model for Code Generation with Multilingual Benchmarking on HumanEval-X

arXiv 2023

2023

FastMoE: A Fast Mixture-of-Expert Training System

arXiv 2021

2021

GLM: General Language Model Pretraining with Autoregressive Blank Infilling

ACL 2022

2021

P-Tuning v2: Prompt Tuning Can Be Comparable to Fine-tuning Universally Across Scales and Tasks

arXiv 2021

2021

GPT Understands, Too

arXiv 2021

2021

NLP From Scratch Without Large-Scale Pretraining: A Simple and Efficient Framework

arXiv 2021

2021

Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context

ACL 2019

2019

XLNet: Generalized Autoregressive Pretraining for Language Understanding

NeurIPS 2019

2019

HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering

EMNLP 2018

2018

Affiliations

No known affiliations.

Frequent co-authors

from 20 papers

Guokun Lai

Weiran He

Xinyu Zhou

Yanru Chen

Yulun Du

Yuxin Wu

Yuzhi Wang

Enzhe Lu

Jianlin Su

Junjie Yan