Weilin Zhao
Authored papers (11)
The Flexibility Trap: Why Arbitrary Order Limits Reasoning Potential in Diffusion Language Models
arXiv 2026
MiniCPM4: Ultra-Efficient LLMs on End Devices
arXiv 2025
BlockFFN: Towards End-Side Acceleration-Friendly Mixture-of-Experts with Chunk-Level Activation Sparsity
arXiv 2025
APB: Accelerating Distributed Long-Context Inference by Passing Compressed Context Blocks across GPUs
arXiv 2025
FR-Spec: Accelerating Large-Vocabulary Language Models via Frequency-Ranked Speculative Sampling
arXiv 2025
Ouroboros: Generating Longer Drafts Phrase by Phrase for Faster Speculative Decoding
arXiv 2024
Beyond the Turn-Based Game: Enabling Real-Time Conversations with Duplex Models
arXiv 2024
BurstAttention: An Efficient Distributed Attention Framework for Extremely Long Sequences
arXiv 2024
Tool Learning with Foundation Models
arXiv 2023
OpenDelta: A Plug-and-play Library for Parameter-efficient Adaptation of Pre-trained Models
arXiv 2023
OpenPrompt: An Open-source Framework for Prompt-learning
ACL 2022
Frequent co-authors
10 from 11 papers