Ming Li

Papers
64

Attribution

Affiliations & profile
Semantic Scholar

Authored papers

64

Rethinking LLM-as-a-Judge: Representation-as-a-Judge with Small Language Models via Semantic Capacity Asymmetry

arXiv 2026

2026

Channel-wise Vector Quantization

arXiv 2026

2026

ClawEnvKit: Automatic Environment Generation for Claw-Like Agents

arXiv 2026

2026

InfiAgent: An Infinite-Horizon Framework for General-Purpose Autonomous Agents

arXiv 2026

2026

Step 3.5 Flash: Open Frontier-Level Intelligence with 11B Active Parameters

arXiv 2026

2026

LoL: Longer than Longer, Scaling Video Generation to Hour

arXiv 2026

2026

Does Socialization Emerge in AI Agent Society? A Case Study of Moltbook

arXiv 2026

2026

What does RL improve for Visual Reasoning? A Frankenstein-Style Analysis

arXiv 2026

2026

PyVision-RL: Forging Open Agentic Vision Models via RL

arXiv 2026

2026

When AI Navigates the Fog of War

arXiv 2026

2026

Sekai: A Video Dataset towards World Exploration

arXiv 2025

2025

Think or Not Think: A Study of Explicit Thinking in Rule-Based Visual Reinforcement Fine-Tuning

arXiv 2025

2025

SuperEdit: Rectifying and Facilitating Supervision for Instruction-Based Image Editing

ICCV 2025

2025

How Instruction and Reasoning Data shape Post-Training: Data Quality through the Lens of Layer-wise Gradients

arXiv 2025

2025

Safe-Sora: Safe Text-to-Video Generation via Graphical Watermarking

arXiv 2025

2025

Uni$\textbf{F}^2$ace: Fine-grained Face Understanding and Generation with Unified Multimodal Models

arXiv 2025

2025

StyleMe3D: Stylization with Disentangled Priors by Multiple Encoders on 3D Gaussians

arXiv 2025

2025

Schoenfeld's Anatomy of Mathematical Reasoning by Language Models

arXiv 2025

2025

V-REX: Benchmarking Exploratory Visual Reasoning via Chain-of-Questions

arXiv 2025

2025

Self-Forcing++: Towards Minute-Scale High-Quality Video Generation

arXiv 2025

2025

Can LLMs Estimate Student Struggles? Human-AI Difficulty Alignment with Proficiency Simulation for Item Difficulty Prediction

arXiv 2025

2025

Multi-Crit: Benchmarking Multimodal Judges on Pluralistic Criteria-Following

arXiv 2025

2025

Step-Audio 2 Technical Report

arXiv 2025

2025

SiMHand: Mining Similar Hands for Large-Scale 3D Hand Pose Pre-training

arXiv 2025

2025

Missing Premise exacerbates Overthinking: Are Reasoning Models losing Critical Thinking Skill?

arXiv 2025

2025

ColorBench: Can VLMs See and Understand the Colorful World? A Comprehensive Benchmark for Color Perception, Reasoning, and Robustness

arXiv 2025

2025

PVChat: Personalized Video Chat with One-Shot Learning

ICCV 2025

2025

VisR-Bench: An Empirical Study on Visual Retrieval-Augmented Generation for Multilingual Long Document Understanding

arXiv 2025

2025

MDK12-Bench: A Multi-Discipline Benchmark for Evaluating Reasoning in Multimodal Large Language Models

2025

ProJudge: A Multi-Modal Multi-Discipline Benchmark and Instruction-Tuning Dataset for MLLM-based Process Judges

ICCV 2025

2025

Where do Large Vision-Language Models Look at when Answering Questions?

arXiv 2025

2025

PyVision: Agentic Vision with Dynamic Tooling

arXiv 2025

2025

TIR-Bench: A Comprehensive Benchmark for Agentic Thinking-with-Images Reasoning

arXiv 2025

2025

MusiXQA: Advancing Visual Music Understanding in Multimodal Large Language Models

arXiv 2025

2025

ViCrit: A Verifiable Reinforcement Learning Proxy Task for Visual Perception in VLMs

arXiv 2025

2025

Understanding the Thinking Process of Reasoning Models: A Perspective from Schoenfeld's Episode Theory

arXiv 2025

2025

Step-Audio: Unified Understanding and Generation in Intelligent Speech Interaction

arXiv 2025

2025

Step-Video-T2V Technical Report: The Practice, Challenges, and Future of Video Foundation Model

arXiv 2025

2025

LeX-Art: Rethinking Text Generation via Scalable High-Quality Data Synthesis

arXiv 2025

2025

Exploring Federated Pruning for Large Language Models

arXiv 2025

2025

Multi-Reward as Condition for Instruction-based Image Editing

arXiv 2024

2024

Frame Interpolation with Consecutive Brownian Bridge Diffusion

arXiv 2024

2024

Can LLMs Speak For Diverse People? Tuning LLMs via Debate to Generate Controllable Controversial Statements

arXiv 2024

2024

Mosaic-IT: Free Compositional Data Augmentation Improves Instruction Tuning

arXiv 2024

2024

ConsistencyDet: A Few-step Denoising Framework for Object Detection Using the Consistency Model

arXiv 2024

2024

BenTo: Benchmark Task Reduction with In-Context Transferability

arXiv 2024

2024

ControlNet++: Improving Conditional Controls with Efficient Consistency Feedback

arXiv 2024

2024

A Survey on Knowledge Distillation of Large Language Models

arXiv 2024

2024

Selective Reflection-Tuning: Student-Selected Data Recycling for LLM Instruction-Tuning

arXiv 2024

2024

Lossless data compression by large models

arXiv 2024

2024

Superfiltering: Weak-to-Strong Data Filtering for Fast Instruction-Tuning

arXiv 2024

2024

What Happened in LLMs Layers when Trained for Fast vs. Slow Thinking: A Gradient Perspective

arXiv 2024

2024

A Comprehensive Guide to Explainable AI: From Classical Models to LLMs

arXiv 2024

2024

KunquDB: An Attempt for Speaker Verification in the Chinese Opera Scenario

arXiv 2024

2024

From Quantity to Quality: Boosting LLM Performance with Self-Guided Data Selection for Instruction Tuning

arXiv 2023

2023

Instant3D: Instant Text-to-3D Generation

arXiv 2023

2023

Language-Specific Representation of Emotion-Concept Knowledge Causally Supports Emotion Inference

arXiv 2023

2023

BiSinger: Bilingual Singing Voice Synthesis

arXiv 2023

2023

DP-BREM: Differentially-Private and Byzantine-Robust Federated Learning with Client Momentum

arXiv 2023

2023

Reflection-Tuning: Data Recycling Improves LLM Instruction-Tuning

arXiv 2023

2023

AlignDet: Aligning Pre-training and Fine-tuning in Object Detection

ICCV 2023

2023

Multi-Granularity Distillation Scheme Towards Lightweight Semi-Supervised Semantic Segmentation

arXiv 2022

2022

When to Trust Your Simulator: Dynamics-Aware Hybrid Offline-and-Online Reinforcement Learning

arXiv 2022

2022

End-to-End Open-Domain Question Answering with BERTserini

2019

Affiliations

No known affiliations.

Frequent co-authors

10

from 64 papers