Ming-Ming Cheng
- Papers: 43
Authored papers (43)
Infinite-World: Scaling Interactive World Models to 1000-Frame Horizons via Pose-Free Hierarchical Memory
arXiv 2026
Mixture of Style Experts for Diverse Image Stylization
arXiv 2026
Towards Universal Video MLLMs with Attribute-Structured and Quality-Verified Instructions
arXiv 2026
Mutual Forcing: Dual-Mode Self-Evolution for Fast Autoregressive Audio-Video Character Generation
arXiv 2026
Continuous-Time Distribution Matching for Few-Step Diffusion Distillation
arXiv 2026
Rethinking Token-Level Policy Optimization for Multimodal Chain-of-Thought
arXiv 2026
DFormerv2: Geometry Self-Attention for RGBD Semantic Segmentation
CVPR 2025 1
VisualCloze: A Universal Image Generation Framework via Visual In-Context Learning
ICCV 2025
InterLCM: Low-Quality Images as Intermediate States of Latent Consistency Models for Effective Blind Face Restoration
arXiv 2025
RPCANet++: Deep Interpretable Robust PCA for Sparse Object Segmentation
arXiv 2025
Revisiting End-to-End Learning with Slide-level Supervision in Computational Pathology
arXiv 2025
TempSamp-R1: Effective Temporal Sampling with Reinforcement Fine-Tuning for Video LLMs
arXiv 2025
The Consistency Critic: Correcting Inconsistencies in Generated Images via Reference-Guided Attentive Alignment
arXiv 2025
A Glimpse to Compress: Dynamic Visual Token Pruning for Large Vision-Language Models
arXiv 2025
StoryDiffusion: Consistent Self-Attention for Long-Range Image and Video Generation
arXiv 2024
ATPrompt: Textual Prompt Learning with Embedded Attributes
ICCV 2025
Towards RAW Object Detection in Diverse Conditions
CVPR 2025 1
DenseVLM: A Retrieval and Decoupled Alignment Framework for Open-Vocabulary Dense Prediction
ICCV 2025
Multimodality Helps Few-Shot 3D Point Cloud Semantic Segmentation
arXiv 2024
SARDet-100K: Towards Open-Source Benchmark and ToolKit for Large-Scale SAR Object Detection
arXiv 2024
Token Merging for Training-Free Semantic Binding in Text-to-Image Synthesis
arXiv 2024
Cascade-CLIP: Cascaded Vision-Language Embeddings Alignment for Zero-Shot Semantic Segmentation
arXiv 2024
YOLO-MS: Rethinking Multi-Scale Representation Learning for Real-time Object Detection
arXiv 2023
MDTv2: Masked Diffusion Transformer is a Strong Image Synthesizer
ICCV 2023 1
Faster Diffusion: Rethinking the Role of the Encoder for Diffusion Model Inference
arXiv 2023
SRFormerV2: Taking a Closer Look at Permuted Self-Attention for Image Super-Resolution
ICCV 2023 1
Multi-Space Neural Radiance Fields
CVPR 2023 1
Referring Camouflaged Object Detection
arXiv 2023
StyleDiffusion: Prompt-Embedding Inversion for Text-Based Editing
arXiv 2023
MaTe3D: Mask-guided Text-based 3D-aware Portrait Editing
arXiv 2023
PhotoMaker: Customizing Realistic Human Photos via Stacked ID Embedding
CVPR 2024 1
Large Selective Kernel Network for Remote Sensing Object Detection
ICCV 2023 1
Make Explicit Calibration Implicit: Calibrate Denoiser Instead of the Noise Model
ICCV 2023 1
AMT: All-Pairs Multi-Field Transforms for Efficient Frame Interpolation
CVPR 2023 1
CrossKD: Cross-Head Knowledge Distillation for Object Detection
CVPR 2024 1
CorrMatch: Label Propagation via Correlation Matching for Semi-Supervised Semantic Segmentation
CVPR 2024 1
Looking Through the Glass: Neural Surface Reconstruction Against High Specular Reflections
CVPR 2023 1
Masked Autoencoders are Efficient Class Incremental Learners
ICCV 2023 1
Co-Salient Object Detection with Co-Representation Purification
arXiv 2023
Towards An End-to-End Framework for Flow-Guided Video Inpainting
CVPR 2022 1
Visual Attention Network
arXiv 2022
Deep Hough Transform for Semantic Line Detection
ECCV 2020 8
Image Inpainting with Learnable Bidirectional Attention Maps
Frequent co-authors
10 from 43 papers