Shiwei Liu
- Papers: 25
Authored papers
Motion-Aware Caching for Efficient Autoregressive Video Generation
arXiv 2026
GradientStabilizer: Fix the Norm, Not the Gradient
arXiv 2025
When Does Sparsity Mitigate the Curse of Depth in LLMs
arXiv 2026
The Curse of Depth in Large Language Models
arXiv 2025
GPAS: Accelerating Convergence of LLM Pretraining via Gradient-Preserving Activation Scaling
arXiv 2025
Double-Checker: Enhancing Reasoning of Slow-Thinking LLMs via Self-Critical Fine-Tuning
arXiv 2025
LIFT the Veil for the Truth: Principal Weights Emerge after Rank Reduction for Reasoning-Focused Supervised Fine-Tuning
arXiv 2025
SPAM: Spike-Aware Adam with Momentum Reset for Stable LLM Training
arXiv 2025
Mask-Enhanced Autoregressive Prediction: Pay Less Attention to Learn More
arXiv 2025
SoS1: O1 and R1-Like Reasoning LLMs are Sum-of-Square Solvers
arXiv 2025
GPTailor: Large Language Model Pruning Through Layer Cutting and Stitching
arXiv 2025
Chain-of-Experts: Unlocking the Communication Power of Mixture-of-Experts Models
arXiv 2025
Diffusion Language Models Know the Answer Before Decoding
arXiv 2025
OwLore: Outlier-weighed Layerwise Sampled Low-Rank Projection for Memory-Efficient LLM Fine-tuning
arXiv 2024
Composable Interventions for Language Models
arXiv 2024
Mix-LN: Unleashing the Power of Deeper Layers by Combining Pre-LN and Post-LN
arXiv 2024
Q-GaLore: Quantized GaLore with INT4 Projection and Layer-Adaptive Low-Rank Gradients
arXiv 2024
Found in the Middle: How Language Models Use Long Contexts Better via Plug-and-Play Positional Encoding
arXiv 2024
From GaLore to WeLore: How Low-Rank Weights Non-uniformly Emerge from Low-Rank Gradients
arXiv 2024
The Emergence of Essential Sparsity in Large Pre-trained Models: The Weights that Matter
AdaMerging: Adaptive Model Merging for Multi-Task Learning
arXiv 2023
Sparse MoE as the New Dropout: Scaling Dense and Self-Slimmable Transformers
arXiv 2023
Dynamic Sparse No Training: Training-Free Fine-tuning for Sparse LLMs
arXiv 2023
The Unreasonable Effectiveness of Random Pruning: Return of the Most Naive Baseline for Sparse Training
Sparse Training via Boosting Pruning Plasticity with Neuroregeneration