Chen Li
- Papers
- 34
Authored papers
GLM-5: from Vibe Coding to Agentic Engineering
arXiv 2026
FireRed-Image-Edit-1.0 Techinical Report
arXiv 2026
AndroTMem: From Interaction Trajectories to Anchored Memory in Long-Horizon GUI Agents
arXiv 2026
NOVA: Sparse Control, Dense Synthesis for Pair-Free Video Editing
arXiv 2026
What Is Wrong with Synthetic Data for Scene Text Recognition? A Strong Synthetic Engine with Diverse Simulations and Self-Evolution
arXiv 2026
Audio-Omni: Extending Multi-modal Understanding to Versatile Audio Generation and Editing
arXiv 2026
GLM-4.5: Agentic, Reasoning, and Coding (ARC) Foundation Models
arXiv 2025
Stand-In: A Lightweight and Plug-and-Play Identity Control for Video Generation
arXiv 2025
ARC-Chapter: Structuring Hour-Long Videos into Navigable Chapters and Hierarchical Summaries
arXiv 2025
V-Thinker: Interactive Thinking with Images
arXiv 2025
UI-TARS-2 Technical Report: Advancing GUI Agent with Multi-Turn Reinforcement Learning
arXiv 2025
X-Omni: Reinforcement Learning Makes Discrete Autoregressive Image Generative Models Great Again
arXiv 2025
We-Math 2.0: A Versatile MathBook System for Incentivizing Visual Mathematical Reasoning
arXiv 2025
Patch-as-Decodable-Token: Towards Unified Multi-Modal Vision Tasks in MLLMs
arXiv 2025
ARC-Hunyuan-Video-7B: Structured Video Comprehension of Real-World Shorts
arXiv 2025
UnicEdit-10M: A Dataset and Benchmark Breaking the Scale-Quality Barrier via Unified Verification for Reasoning-Enriched Edits
arXiv 2025
Unleashing High-Quality Image Generation in Diffusion Sampling Using Second-Order Levenberg-Marquardt-Langevin
ICCV 2025
Mamba YOLO: A Simple Baseline for Object Detection with State Space Model
arXiv 2024
DISC: Plug-and-Play Decoding Intervention with Similarity of Characters for Chinese Spelling Check
arXiv 2024
SEED-Data-Edit Technical Report: A Hybrid Dataset for Instructional Image Editing
arXiv 2024
Stacking Brick by Brick: Aligned Feature Isolation for Incremental Face Forgery Detection
CVPR 2025
ST-LLM: Large Language Models Are Effective Temporal Learners
PPLLaVA: Varied Video Sequence Understanding With Prompt Guidance
arXiv 2024
We-Math: Does Your Large Multimodal Model Achieve Human-like Mathematical Reasoning?
arXiv 2024
Making LLaMA SEE and Draw with SEED Tokenizer
arXiv 2023
Unleashing the Potential of Spiking Neural Networks by Dynamic Confidence
arXiv 2023
How to Determine the Most Powerful Pre-trained Language Model without Brute Force Fine-tuning? An Empirical Survey
arXiv 2023
NaSGEC: a Multi-Domain Chinese Grammatical Error Correction Dataset from Native Speaker Texts
arXiv 2023
Efficient Diffusion Training via Min-SNR Weighting Strategy
ICCV 2023
DETR Doesn't Need Multi-Scale or Locality Design
arXiv 2023
Vision-Language Instruction Tuning: A Review and Analysis
arXiv 2023
All in Tokens: Unifying Output Space of Visual Tasks via Soft Token
ICCV 2023
NeRF-DS: Neural Radiance Fields for Dynamic Specular Objects
CVPR 2023
Weakly-supervised 3D Pose Transfer with Keypoints
ICCV 2023
Affiliations
Frequent co-authors
10 from 34 papers