Shiwei Liu

Papers
25

Authored papers

25

Motion-Aware Caching for Efficient Autoregressive Video Generation

arXiv 2026

2026

GradientStabilizer: Fix the Norm, Not the Gradient

arXiv 2025

2026

When Does Sparsity Mitigate the Curse of Depth in LLMs

arXiv 2026

2026

The Curse of Depth in Large Language Models

arXiv 2025

2025

GPAS: Accelerating Convergence of LLM Pretraining via Gradient-Preserving Activation Scaling

arXiv 2025

2025

Double-Checker: Enhancing Reasoning of Slow-Thinking LLMs via Self-Critical Fine-Tuning

arXiv 2025

2025

LIFT the Veil for the Truth: Principal Weights Emerge after Rank Reduction for Reasoning-Focused Supervised Fine-Tuning

arXiv 2025

2025

SPAM: Spike-Aware Adam with Momentum Reset for Stable LLM Training

arXiv 2025

2025

Mask-Enhanced Autoregressive Prediction: Pay Less Attention to Learn More

arXiv 2025

2025

SoS1: O1 and R1-Like Reasoning LLMs are Sum-of-Square Solvers

arXiv 2025

2025

GPTailor: Large Language Model Pruning Through Layer Cutting and Stitching

arXiv 2025

2025

Chain-of-Experts: Unlocking the Communication Power of Mixture-of-Experts Models

arXiv 2025

2025

Diffusion Language Models Know the Answer Before Decoding

arXiv 2025

2025

OwLore: Outlier-weighed Layerwise Sampled Low-Rank Projection for Memory-Efficient LLM Fine-tuning

arXiv 2024

2024

Composable Interventions for Language Models

arXiv 2024

2024

Mix-LN: Unleashing the Power of Deeper Layers by Combining Pre-LN and Post-LN

arXiv 2024

2024

Q-GaLore: Quantized GaLore with INT4 Projection and Layer-Adaptive Low-Rank Gradients

arXiv 2024

2024

Found in the Middle: How Language Models Use Long Contexts Better via Plug-and-Play Positional Encoding

arXiv 2024

2024

From GaLore to WeLore: How Low-Rank Weights Non-uniformly Emerge from Low-Rank Gradients

arXiv 2024

2024

The Emergence of Essential Sparsity in Large Pre-trained Models: The Weights that Matter

2023

AdaMerging: Adaptive Model Merging for Multi-Task Learning

arXiv 2023

2023

Sparse MoE as the New Dropout: Scaling Dense and Self-Slimmable Transformers

arXiv 2023

2023

Dynamic Sparse No Training: Training-Free Fine-tuning for Sparse LLMs

arXiv 2023

2023

The Unreasonable Effectiveness of Random Pruning: Return of the Most Naive Baseline for Sparse Training

2022

Sparse Training via Boosting Pruning Plasticity with Neuroregeneration

2021

Affiliations

No known affiliations.

Frequent co-authors

10 (from 25 papers)