Li Shen

Papers
54

Authored papers

54

GradientStabilizer: Fix the Norm, Not the Gradient

arXiv 2025

2026

Language-based Trial and Error Falls Behind in the Era of Experience

arXiv 2026

2026

Parameter Efficient Merging for Multimodal Large Language Models with Complementary Parameter Adaptation

arXiv 2025

2025

Retrieval-Augmented Perception: High-Resolution Image Perception Meets Visual RAG

arXiv 2025

2025

Low-Precision Training of Large Language Models: Methods, Challenges, and Opportunities

arXiv 2025

2025

R1-ShareVL: Incentivizing Reasoning Capability of Multimodal Large Language Models via Share-GRPO

arXiv 2025

2025

Unifying Multimodal Large Language Model Capabilities and Modalities via Model Merging

arXiv 2025

2025

R1-Compress: Long Chain-of-Thought Compression via Chunk Compression and Search

arXiv 2025

2025

FreshRetailNet-50K: A Stockout-Annotated Censored Demand Dataset for Latent Demand Recovery and Forecasting in Fresh Retail

arXiv 2025

2025

Ada-R1: Hybrid-CoT via Bi-Level Adaptive Reasoning Optimization

arXiv 2025

2025

UltraHorizon: Benchmarking Agent Capabilities in Ultra Long-Horizon Scenarios

arXiv 2025

2025

Diffusion Language Models Know the Answer Before Decoding

arXiv 2025

2025

Mask-Enhanced Autoregressive Prediction: Pay Less Attention to Learn More

arXiv 2025

2025

Merging Models on the Fly Without Retraining: A Sequential Approach to Scalable Continual Model Merging

arXiv 2025

2025

Modeling Multi-Task Model Merging as Adaptive Projective Gradient Descent

arXiv 2025

2025

GPTailor: Large Language Model Pruning Through Layer Cutting and Stitching

arXiv 2025

2025

Vulnerability-Aware Alignment: Mitigating Uneven Forgetting in Harmful Fine-Tuning

arXiv 2025

2025

MentalChat16K: A Benchmark Dataset for Conversational Mental Health Assistance

arXiv 2025

2025

Model Merging in LLMs, MLLMs, and Beyond: Methods, Theories, Applications and Opportunities

arXiv 2024

2024

Mulberry: Empowering MLLM with o1-like Reasoning and Reflection via Collective Monte Carlo Tree Search

arXiv 2024

2024

Divide, Conquer and Combine: A Training-Free Framework for High-Resolution Image Perception in Multimodal Large Language Models

arXiv 2024

2024

Communication Learning in Multi-Agent Systems from Graph Modeling Perspective

arXiv 2024

2024

OOP: Object-Oriented Programming Evaluation Benchmark for Large Language Models

arXiv 2024

2024

HarmoDT: Harmony Multi-Task Decision Transformer for Offline Reinforcement Learning

arXiv 2024

2024

ADEM-VL: Adaptive and Embedded Fusion for Efficient Vision-Language Tuning

arXiv 2024

2024

A Unified and General Framework for Continual Learning

arXiv 2024

2024

Mitigating the Backdoor Effect for Multi-Task Model Merging via Safety-Aware Subspace

arXiv 2024

2024

FreeStyle: Free Lunch for Text-guided Style Transfer using Diffusion Models

arXiv 2024

2024

Merging Multi-Task Models via Weight-Ensembling Mixture of Experts

arXiv 2024

2024

Representation Surgery for Multi-Task Model Merging

arXiv 2024

2024

Revisiting Knowledge Distillation for Autoregressive Language Models

arXiv 2024

2024

SEFD: Semantic-Enhanced Framework for Detecting LLM-Generated Text

arXiv 2024

2024

DaGAN++: Depth-Aware Generative Adversarial Network for Talking Head Video Generation

arXiv 2023

2023

Concrete Subspace Learning based Interference Elimination for Multi-task Model Fusion

arXiv 2023

2023

Revisiting Plasticity in Visual Reinforcement Learning: Data, Modules and Training Stages

arXiv 2023

2023

Can Linguistic Knowledge Improve Multimodal Alignment in Vision-Language Pretraining?

arXiv 2023

2023

Learning to Learn from APIs: Black-Box Data-Free Meta-Learning

arXiv 2023

2023

Unmasking Bias in Diffusion Model Training

arXiv 2023

2023

AdaMerging: Adaptive Model Merging for Multi-Task Learning

arXiv 2023

2023

Embodied Multi-Modal Agent trained by an LLM from a Parallel TextWorld

CVPR 2024 1

2023

FedSpeed: Larger Local Interval, Less Communication Round, and Higher Generalization Accuracy

arXiv 2023

2023

Merging Experts into One: Improving Computational Efficiency of Mixture of Experts

arXiv 2023

2023

Streaming Radiance Fields for 3D Video Synthesis

arXiv 2022

2022

Improving Sharpness-Aware Minimization with Fisher Mask for Better Generalization on Language Models

arXiv 2022

2022

Robust Weight Perturbation for Adversarial Training

2022

On the Complementarity between Pre-Training and Random-Initialization for Resource-Rich Machine Translation

COLING 2022 10

2022

Rethinking the Role of Pre-Trained Networks in Source-Free Domain Adaptation

ICCV 2023 1

2022

Bridging Cross-Lingual Gaps During Leveraging the Multilingual Sequence-to-Sequence Pretraining for Text Generation and Understanding

arXiv 2022

2022

Curriculum-based Asymmetric Multi-task Reinforcement Learning

arXiv 2022

2022

The Unreasonable Effectiveness of Random Pruning: Return of the Most Naive Baseline for Sparse Training

2022

Fine-tuning Global Model via Data-Free Knowledge Distillation for Non-IID Federated Learning

CVPR 2022 1

2022

Sparse Training via Boosting Pruning Plasticity with Neuroregeneration

2021

Gather-Excite: Exploiting Feature Context in Convolutional Neural Networks

2018

Squeeze-and-Excitation Networks

2017

Affiliations

No known affiliations.

Frequent co-authors

10 (from 54 papers)