Lu Yin
- Papers: 15
Authored papers
Progressive Residual Warmup for Language Model Pretraining
arXiv 2026
The Curse of Depth in Large Language Models
arXiv 2025
Chain-of-Experts: Unlocking the Communication Power of Mixture-of-Experts Models
arXiv 2025
Diffusion Language Models Know the Answer Before Decoding
arXiv 2025
GPAS: Accelerating Convergence of LLM Pretraining via Gradient-Preserving Activation Scaling
arXiv 2025
Double-Checker: Enhancing Reasoning of Slow-Thinking LLMs via Self-Critical Fine-Tuning
arXiv 2025
LIFT the Veil for the Truth: Principal Weights Emerge after Rank Reduction for Reasoning-Focused Supervised Fine-Tuning
arXiv 2025
AgentAlign: Navigating Safety Alignment in the Shift from Informative to Agentic Large Language Models
arXiv 2025
GPTailor: Large Language Model Pruning Through Layer Cutting and Stitching
arXiv 2025
TODO: Enhancing LLM Alignment with Ternary Preferences
arXiv 2024
Q-GaLore: Quantized GaLore with INT4 Projection and Layer-Adaptive Low-Rank Gradients
arXiv 2024
From GaLore to WeLore: How Low-Rank Weights Non-uniformly Emerge from Low-Rank Gradients
arXiv 2024
OwLore: Outlier-weighed Layerwise Sampled Low-Rank Projection for Memory-Efficient LLM Fine-tuning
arXiv 2024
Mix-LN: Unleashing the Power of Deeper Layers by Combining Pre-LN and Post-LN
arXiv 2024
Sparse Training via Boosting Pruning Plasticity with Neuroregeneration
Affiliations
Frequent co-authors
10 from 15 papers