Zhen Li

Papers
37


Authored papers

37

D-OPSD: On-Policy Self-Distillation for Continuously Tuning Step-Distilled Diffusion Models

arXiv 2026

2026

WildWorld: A Large-Scale Dataset for Dynamic World Modeling with Actions and Explicit State toward Generative ARPG

arXiv 2026

2026

MiniCPM4: Ultra-Efficient LLMs on End Devices

arXiv 2025

2025

Sekai: A Video Dataset towards World Exploration

arXiv 2025

2025

Lumina-Image 2.0: A Unified and Efficient Image Generative Framework

ICCV 2025

2025

VisualCloze: A Universal Image Generation Framework via Visual In-Context Learning

ICCV 2025

2025

K-LoRA: Unlocking Training-Free Fusion of Any Subject and Style LoRAs

CVPR 2025

2025

ChartGalaxy: A Dataset for Infographic Chart Understanding and Generation

arXiv 2025

2025

OmniCaptioner: One Captioner to Rule Them All

arXiv 2025

2025

Yume-1.5: A Text-Controlled Interactive World Generation Model

arXiv 2025

2025

Decoupled DMD: CFG Augmentation as the Spear, Distribution Matching as the Shield

arXiv 2025

2025

Z-Image: An Efficient Image Generation Foundation Model with Single-Stream Diffusion Transformer

arXiv 2025

2025

LeX-Art: Rethinking Text Generation via Scalable High-Quality Data Synthesis

arXiv 2025

2025

DSPNet: Dual-vision Scene Perception for Robust 3D Question Answering

CVPR 2025

2025

IMAGINE-E: Image Generation Intelligence Evaluation of State-of-the-art Text-to-Image Models

arXiv 2025

2025

MDK12-Bench: A Multi-Discipline Benchmark for Evaluating Reasoning in Multimodal Large Language Models

2025

CLEA: Closed-Loop Embodied Agent for Enhancing Task Execution in Dynamic Environments

arXiv 2025

2025

MM-BrowseComp: A Comprehensive Benchmark for Multimodal Browsing Agents

arXiv 2025

2025

InfiR: Crafting Effective Small Language Models and Multimodal Small Language Models in Reasoning

arXiv 2025

2025

Multi-Sourced Compositional Generalization in Visual Question Answering

arXiv 2025

2025

Distribution Matching Distillation Meets Reinforcement Learning

arXiv 2025

2025

DriveGEN: Generalized and Robust 3D Detection in Driving via Controllable Text-to-Image Diffusion Generation

CVPR 2025

2025

Emu3: Next-Token Prediction is All You Need

arXiv 2024

2024

Leveraging Large Language Models for NLG Evaluation: Advances and Challenges

arXiv 2024

2024

GUI-World: A Video Benchmark and Dataset for Multimodal GUI-oriented Understanding

arXiv 2024

2024

Time Evidence Fusion Network: Multi-source View in Long-Term Time Series Forecasting

arXiv 2024

2024

LATR: 3D Lane Detection from Monocular Images with Transformer

ICCV 2023

2023

Visual Programming for Zero-shot Open-Vocabulary 3D Visual Grounding

CVPR 2024

2023

SupFusion: Supervised LiDAR-Camera Fusion for 3D Object Detection

ICCV 2023

2023

PhotoMaker: Customizing Realistic Human Photos via Stacked ID Embedding

CVPR 2024

2023

SRFormerV2: Taking a Closer Look at Permuted Self-Attention for Image Super-Resolution

ICCV 2023

2023

AMT: All-Pairs Multi-Field Transforms for Efficient Frame Interpolation

CVPR 2023

2023

SkeletonMAE: Graph-based Masked Autoencoder for Skeleton Sequence Pre-training

ICCV 2023

2023

Towards An End-to-End Framework for Flow-Guided Video Inpainting

CVPR 2022

2022

Composable Text Controls in Latent Space with ODEs

arXiv 2022

2022

Don't Take It Literally: An Edit-Invariant Sequence Loss for Text Generation

2021

VulDeePecker: A Deep Learning-Based System for Vulnerability Detection

arXiv 2018

2018

Affiliations

No known affiliations.

Frequent co-authors

10

from 37 papers