Ming Li
- Papers: 64
Authored papers
Rethinking LLM-as-a-Judge: Representation-as-a-Judge with Small Language Models via Semantic Capacity Asymmetry
arXiv 2026
Channel-wise Vector Quantization
arXiv 2026
ClawEnvKit: Automatic Environment Generation for Claw-Like Agents
arXiv 2026
InfiAgent: An Infinite-Horizon Framework for General-Purpose Autonomous Agents
arXiv 2026
Step 3.5 Flash: Open Frontier-Level Intelligence with 11B Active Parameters
arXiv 2026
LoL: Longer than Longer, Scaling Video Generation to Hour
arXiv 2026
Does Socialization Emerge in AI Agent Society? A Case Study of Moltbook
arXiv 2026
What does RL improve for Visual Reasoning? A Frankenstein-Style Analysis
arXiv 2026
PyVision-RL: Forging Open Agentic Vision Models via RL
arXiv 2026
When AI Navigates the Fog of War
arXiv 2026
Sekai: A Video Dataset towards World Exploration
arXiv 2025
Think or Not Think: A Study of Explicit Thinking in Rule-Based Visual Reinforcement Fine-Tuning
arXiv 2025
SuperEdit: Rectifying and Facilitating Supervision for Instruction-Based Image Editing
ICCV 2025
How Instruction and Reasoning Data shape Post-Training: Data Quality through the Lens of Layer-wise Gradients
arXiv 2025
Safe-Sora: Safe Text-to-Video Generation via Graphical Watermarking
arXiv 2025
Uni$\textbf{F}^2$ace: Fine-grained Face Understanding and Generation with Unified Multimodal Models
arXiv 2025
StyleMe3D: Stylization with Disentangled Priors by Multiple Encoders on 3D Gaussians
arXiv 2025
Schoenfeld's Anatomy of Mathematical Reasoning by Language Models
arXiv 2025
V-REX: Benchmarking Exploratory Visual Reasoning via Chain-of-Questions
arXiv 2025
Self-Forcing++: Towards Minute-Scale High-Quality Video Generation
arXiv 2025
Can LLMs Estimate Student Struggles? Human-AI Difficulty Alignment with Proficiency Simulation for Item Difficulty Prediction
arXiv 2025
Multi-Crit: Benchmarking Multimodal Judges on Pluralistic Criteria-Following
arXiv 2025
Step-Audio 2 Technical Report
arXiv 2025
SiMHand: Mining Similar Hands for Large-Scale 3D Hand Pose Pre-training
arXiv 2025
Missing Premise exacerbates Overthinking: Are Reasoning Models losing Critical Thinking Skill?
arXiv 2025
ColorBench: Can VLMs See and Understand the Colorful World? A Comprehensive Benchmark for Color Perception, Reasoning, and Robustness
arXiv 2025
PVChat: Personalized Video Chat with One-Shot Learning
ICCV 2025
VisR-Bench: An Empirical Study on Visual Retrieval-Augmented Generation for Multilingual Long Document Understanding
arXiv 2025
MDK12-Bench: A Multi-Discipline Benchmark for Evaluating Reasoning in Multimodal Large Language Models
ProJudge: A Multi-Modal Multi-Discipline Benchmark and Instruction-Tuning Dataset for MLLM-based Process Judges
ICCV 2025
Where do Large Vision-Language Models Look at when Answering Questions?
arXiv 2025
PyVision: Agentic Vision with Dynamic Tooling
arXiv 2025
TIR-Bench: A Comprehensive Benchmark for Agentic Thinking-with-Images Reasoning
arXiv 2025
MusiXQA: Advancing Visual Music Understanding in Multimodal Large Language Models
arXiv 2025
ViCrit: A Verifiable Reinforcement Learning Proxy Task for Visual Perception in VLMs
arXiv 2025
Understanding the Thinking Process of Reasoning Models: A Perspective from Schoenfeld's Episode Theory
arXiv 2025
Step-Audio: Unified Understanding and Generation in Intelligent Speech Interaction
arXiv 2025
Step-Video-T2V Technical Report: The Practice, Challenges, and Future of Video Foundation Model
arXiv 2025
LeX-Art: Rethinking Text Generation via Scalable High-Quality Data Synthesis
arXiv 2025
Exploring Federated Pruning for Large Language Models
arXiv 2025
Multi-Reward as Condition for Instruction-based Image Editing
arXiv 2024
Frame Interpolation with Consecutive Brownian Bridge Diffusion
arXiv 2024
Can LLMs Speak For Diverse People? Tuning LLMs via Debate to Generate Controllable Controversial Statements
arXiv 2024
Mosaic-IT: Free Compositional Data Augmentation Improves Instruction Tuning
arXiv 2024
ConsistencyDet: A Few-step Denoising Framework for Object Detection Using the Consistency Model
arXiv 2024
BenTo: Benchmark Task Reduction with In-Context Transferability
arXiv 2024
ControlNet++: Improving Conditional Controls with Efficient Consistency Feedback
arXiv 2024
A Survey on Knowledge Distillation of Large Language Models
arXiv 2024
Selective Reflection-Tuning: Student-Selected Data Recycling for LLM Instruction-Tuning
arXiv 2024
Lossless data compression by large models
arXiv 2024
Superfiltering: Weak-to-Strong Data Filtering for Fast Instruction-Tuning
arXiv 2024
What Happened in LLMs Layers when Trained for Fast vs. Slow Thinking: A Gradient Perspective
arXiv 2024
A Comprehensive Guide to Explainable AI: From Classical Models to LLMs
arXiv 2024
KunquDB: An Attempt for Speaker Verification in the Chinese Opera Scenario
arXiv 2024
From Quantity to Quality: Boosting LLM Performance with Self-Guided Data Selection for Instruction Tuning
arXiv 2023
Instant3D: Instant Text-to-3D Generation
arXiv 2023
Language-Specific Representation of Emotion-Concept Knowledge Causally Supports Emotion Inference
arXiv 2023
BiSinger: Bilingual Singing Voice Synthesis
arXiv 2023
DP-BREM: Differentially-Private and Byzantine-Robust Federated Learning with Client Momentum
arXiv 2023
Reflection-Tuning: Data Recycling Improves LLM Instruction-Tuning
arXiv 2023
AlignDet: Aligning Pre-training and Fine-tuning in Object Detection
ICCV 2023
Multi-Granularity Distillation Scheme Towards Lightweight Semi-Supervised Semantic Segmentation
arXiv 2022
When to Trust Your Simulator: Dynamics-Aware Hybrid Offline-and-Online Reinforcement Learning
arXiv 2022
End-to-End Open-Domain Question Answering with BERTserini
Affiliations
Frequent co-authors
10 from 64 papers