Taiqiang Wu
Papers: 14
Authored papers
Attention Sink in Transformers: A Survey on Utilization, Interpretation, and Mitigation
arXiv 2026
ProFit: Leveraging High-Value Signals in SFT via Probability-Guided Token Selection
arXiv 2026
Shadow-FT: Tuning Instruct via Base
arXiv 2025
PhyX: Does Your Model Have the "Wits" for Physical Reasoning?
arXiv 2025
Revisiting Model Interpolation for Efficient Reasoning
arXiv 2025
Autoregressive Models in Vision: A Survey
arXiv 2024
LLM-Neo: Parameter Efficient Knowledge Distillation for Large Language Models
arXiv 2024
Mixture-of-Subspaces in Low-Rank Adaptation
arXiv 2024
Adapting LLaMA Decoder to Vision Transformer
arXiv 2024
A Survey on the Honesty of Large Language Models
arXiv 2024
Unchosen Experts Can Contribute Too: Unleashing MoE Models' Power by Self-Contrast
arXiv 2024
Rethinking Kullback-Leibler Divergence in Knowledge Distillation for Large Language Models
arXiv 2024
Weight-Inherited Distillation for Task-Agnostic BERT Compression
arXiv 2023
TencentPretrain: A Scalable and Flexible Toolkit for Pre-training Models of Different Modalities
arXiv 2022
Frequent co-authors: 10 (from 14 papers)