Shwai He
- Papers: 15
Authored papers (15)
ThinkJEPA: Empowering Latent World Models with Large Vision-Language Reasoning Model
arXiv 2026
Demystifying When Pruning Works via Representation Hierarchies
arXiv 2026
Understanding and Harnessing Sparsity in Unified Multimodal Models
arXiv 2025
Making Large Language Models Efficient Dense Retrievers
arXiv 2025
CoIn: Counting the Invisible Reasoning Tokens in Commercial Opaque LLM APIs
arXiv 2025
Router-Tuning: A Simple and Effective Approach for Enabling Dynamic-Depth in Transformers
arXiv 2024
What Matters in Transformers? Not All Attention is Needed
arXiv 2024
Superfiltering: Weak-to-Strong Data Filtering for Fast Instruction-Tuning
arXiv 2024
Loki: Low-rank Keys for Efficient Sparse Attention
arXiv 2024
Reformatted Alignment
arXiv 2024
Selective Reflection-Tuning: Student-Selected Data Recycling for LLM Instruction-Tuning
arXiv 2024
Merging Experts into One: Improving Computational Efficiency of Mixture of Experts
arXiv 2023
Reflection-Tuning: Data Recycling Improves LLM Instruction-Tuning
arXiv 2023
Vega-MT: The JD Explore Academy Translation System for WMT22
arXiv 2022
SparseAdapter: An Easy Approach for Improving the Parameter-Efficiency of Adapters
arXiv 2022
Frequent co-authors
10 from 15 papers