Hui Li

ArtHOI: Taming Foundation Models for Monocular 4D Reconstruction of Hand-Articulated-Object Interactions

arXiv 2026

FRoM-W1: Towards General Humanoid Whole-Body Control with Language Instructions

arXiv 2026

SWE-AGILE: A Software Agent Framework for Efficiently Managing Dynamic Reasoning Context

arXiv 2026

FlowPIE: Test-Time Scientific Idea Evolution with Flow-Guided Literature Exploration

arXiv 2026

ManCAR: Manifold-Constrained Latent Reasoning with Adaptive Test-Time Computation for Sequential Recommendation

arXiv 2026

A.S.E: A Repository-Level Benchmark for Evaluating Security in AI-Generated Code

arXiv 2025

DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning

preprint

DADM: Dual Alignment of Domain and Modality for Face Anti-spoofing

ICCV 2025

MST-Distill: Mixture of Specialized Teachers for Cross-Modal Knowledge Distillation

arXiv 2025

PhysWorld: From Real Videos to World Models of Deformable Objects via Physics-Aware Demonstration Synthesis

arXiv 2025

UI-AGILE: Advancing GUI Agents with Effective Reinforcement Learning and Precise Inference-Time Grounding

arXiv 2025

FramePainter: Endowing Interactive Image Editing with Video Diffusion Priors

ICCV 2025

Kronecker Mask and Interpretive Prompts are Language-Action Video Learners

arXiv 2025

DeepSeek-V3 Technical Report

arXiv 2024

Hallo3: Highly Dynamic and Realistic Portrait Image Animation with Video Diffusion Transformer

CVPR 2025

DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model

arXiv 2024

Agents in Software Engineering: Survey, Landscape, and Vision

arXiv 2024

In-Context Imitation Learning via Next-Token Prediction

arXiv 2024

Nemotron-4 340B Technical Report

arXiv 2024

decoupleQ: Towards 2-bit Post-Training Uniform Quantization via decoupling Parameters into Integer and Floating Points

arXiv 2024

Generalized Face Anti-spoofing via Finer Domain Partition and Disentangling Liveness-irrelevant Factors

arXiv 2024

Seed-TTS: A Family of High-Quality Versatile Speech Generation Models

arXiv 2024

Hallo2: Long-Duration and High-Resolution Audio-Driven Portrait Image Animation

arXiv 2024

DreamPhysics: Learning Physics-Based 3D Dynamics with Video Diffusion Priors

arXiv 2024

Many Heads Are Better Than One: Improved Scientific Idea Generation by A LLM-Based Multi-Agent System

arXiv 2024