Li Shen
- Papers: 54
Authored papers (54)
GradientStabilizer: Fix the Norm, Not the Gradient
arXiv 2025
Language-based Trial and Error Falls Behind in the Era of Experience
arXiv 2026
Parameter Efficient Merging for Multimodal Large Language Models with Complementary Parameter Adaptation
arXiv 2025
Retrieval-Augmented Perception: High-Resolution Image Perception Meets Visual RAG
arXiv 2025
Low-Precision Training of Large Language Models: Methods, Challenges, and Opportunities
arXiv 2025
R1-ShareVL: Incentivizing Reasoning Capability of Multimodal Large Language Models via Share-GRPO
arXiv 2025
Unifying Multimodal Large Language Model Capabilities and Modalities via Model Merging
arXiv 2025
R1-Compress: Long Chain-of-Thought Compression via Chunk Compression and Search
arXiv 2025
FreshRetailNet-50K: A Stockout-Annotated Censored Demand Dataset for Latent Demand Recovery and Forecasting in Fresh Retail
arXiv 2025
Ada-R1: Hybrid-CoT via Bi-Level Adaptive Reasoning Optimization
arXiv 2025
UltraHorizon: Benchmarking Agent Capabilities in Ultra Long-Horizon Scenarios
arXiv 2025
Diffusion Language Models Know the Answer Before Decoding
arXiv 2025
Mask-Enhanced Autoregressive Prediction: Pay Less Attention to Learn More
arXiv 2025
Merging Models on the Fly Without Retraining: A Sequential Approach to Scalable Continual Model Merging
arXiv 2025
Modeling Multi-Task Model Merging as Adaptive Projective Gradient Descent
arXiv 2025
GPTailor: Large Language Model Pruning Through Layer Cutting and Stitching
arXiv 2025
Vulnerability-Aware Alignment: Mitigating Uneven Forgetting in Harmful Fine-Tuning
arXiv 2025
MentalChat16K: A Benchmark Dataset for Conversational Mental Health Assistance
arXiv 2025
Model Merging in LLMs, MLLMs, and Beyond: Methods, Theories, Applications and Opportunities
arXiv 2024
Mulberry: Empowering MLLM with o1-like Reasoning and Reflection via Collective Monte Carlo Tree Search
arXiv 2024
Divide, Conquer and Combine: A Training-Free Framework for High-Resolution Image Perception in Multimodal Large Language Models
arXiv 2024
Communication Learning in Multi-Agent Systems from Graph Modeling Perspective
arXiv 2024
OOP: Object-Oriented Programming Evaluation Benchmark for Large Language Models
arXiv 2024
HarmoDT: Harmony Multi-Task Decision Transformer for Offline Reinforcement Learning
arXiv 2024
ADEM-VL: Adaptive and Embedded Fusion for Efficient Vision-Language Tuning
arXiv 2024
A Unified and General Framework for Continual Learning
arXiv 2024
Mitigating the Backdoor Effect for Multi-Task Model Merging via Safety-Aware Subspace
arXiv 2024
FreeStyle: Free Lunch for Text-guided Style Transfer using Diffusion Models
arXiv 2024
Merging Multi-Task Models via Weight-Ensembling Mixture of Experts
arXiv 2024
Representation Surgery for Multi-Task Model Merging
arXiv 2024
Revisiting Knowledge Distillation for Autoregressive Language Models
arXiv 2024
SEFD: Semantic-Enhanced Framework for Detecting LLM-Generated Text
arXiv 2024
DaGAN++: Depth-Aware Generative Adversarial Network for Talking Head Video Generation
arXiv 2023
Concrete Subspace Learning based Interference Elimination for Multi-task Model Fusion
arXiv 2023
Revisiting Plasticity in Visual Reinforcement Learning: Data, Modules and Training Stages
arXiv 2023
Can Linguistic Knowledge Improve Multimodal Alignment in Vision-Language Pretraining?
arXiv 2023
Learning to Learn from APIs: Black-Box Data-Free Meta-Learning
arXiv 2023
Unmasking Bias in Diffusion Model Training
arXiv 2023
AdaMerging: Adaptive Model Merging for Multi-Task Learning
arXiv 2023
Embodied Multi-Modal Agent trained by an LLM from a Parallel TextWorld
CVPR 2024 1
FedSpeed: Larger Local Interval, Less Communication Round, and Higher Generalization Accuracy
arXiv 2023
Merging Experts into One: Improving Computational Efficiency of Mixture of Experts
arXiv 2023
Streaming Radiance Fields for 3D Video Synthesis
arXiv 2022
Improving Sharpness-Aware Minimization with Fisher Mask for Better Generalization on Language Models
arXiv 2022
Robust Weight Perturbation for Adversarial Training
On the Complementarity between Pre-Training and Random-Initialization for Resource-Rich Machine Translation
COLING 2022 10
Rethinking the Role of Pre-Trained Networks in Source-Free Domain Adaptation
ICCV 2023 1
Bridging Cross-Lingual Gaps During Leveraging the Multilingual Sequence-to-Sequence Pretraining for Text Generation and Understanding
arXiv 2022
Curriculum-based Asymmetric Multi-task Reinforcement Learning
arXiv 2022
The Unreasonable Effectiveness of Random Pruning: Return of the Most Naive Baseline for Sparse Training
Fine-tuning Global Model via Data-Free Knowledge Distillation for Non-IID Federated Learning
CVPR 2022 1
Sparse Training via Boosting Pruning Plasticity with Neuroregeneration
Gather-Excite: Exploiting Feature Context in Convolutional Neural Networks
Squeeze-and-Excitation Networks
Affiliations
Frequent co-authors
10 from 54 papers