Chunhua Shen
- Papers: 51
Authored papers
Geo-Align: Video Generation Alignment via Metric Geometry Reward
arXiv 2026
Exploring Spatial Intelligence from a Generative Perspective
arXiv 2026
TC-AE: Unlocking Token Capacity for Deep Compression Autoencoders
arXiv 2026
MARBLE: Multi-Aspect Reward Balance for Diffusion RL
arXiv 2026
Beyond Hard Masks: Progressive Token Evolution for Diffusion Language Models
arXiv 2026
OmniJigsaw: Enhancing Omni-Modal Reasoning via Modality-Orchestrated Reordering
arXiv 2026
Omni-R1: Reinforcement Learning for Omnimodal Reasoning via Two-System Collaboration
arXiv 2025
DICEPTION: A Generalist Diffusion Model for Visual Perceptual Tasks
arXiv 2025
SegAgent: Exploring Pixel Understanding Capabilities in MLLMs by Imitating Human Annotator Trajectories
arXiv 2025
Active-O3: Empowering Multimodal Large Language Models with Active Perception via GRPO
arXiv 2025
Revisiting Depth Representations for Feed-Forward 3D Gaussian Splatting
arXiv 2025
$π^3$: Scalable Permutation-Equivariant Visual Geometry Learning
arXiv 2025
Time Is a Feature: Exploiting Temporal Dynamics in Diffusion Language Models
arXiv 2025
Uniform Discrete Diffusion with Metric Path for Video Generation
arXiv 2025
OmniWorld: A Multi-Domain and Multi-Modal Dataset for 4D World Modeling
arXiv 2025
POMATO: Marrying Pointmap Matching with Temporal Motion for Dynamic 3D Reconstruction
ICCV 2025
Aether: Geometric-Aware Unified World Modeling
ICCV 2025
WinT3R: Window-Based Streaming Reconstruction with Camera Token Pool
arXiv 2025
Tinker: Diffusion's Gift to 3D--Multi-View Consistent Editing From Sparse Inputs without Per-Scene Optimization
arXiv 2025
MobileVLM V2: Faster and Stronger Baseline for Vision Language Model
arXiv 2024
What Matters When Repurposing Diffusion Models for General Dense Perception Tasks?
arXiv 2024
Metric3Dv2: A Versatile Monocular Geometric Foundation Model for Zero-shot Metric Depth and Surface Normal Estimation
FreeCustom: Tuning-Free Customized Image Generation for Multi-Concept Composition
CVPR 2024
GeoBench: Benchmarking and Analyzing Monocular Geometry Estimation Models
arXiv 2024
DiverGen: Improving Instance Segmentation by Learning Wider Data Distribution with More Diverse Generative Data
CVPR 2024
Unleashing the Potential of the Diffusion Model in Few-shot Semantic Segmentation
arXiv 2024
FreeCompose: Generic Zero-Shot Image Composition with Diffusion Prior
arXiv 2024
Depth Any Video with Scalable Synthetic Data
arXiv 2024
VisionLLaMA: A Unified LLaMA Backbone for Vision Tasks
arXiv 2024
PonderV2: Pave the Way for 3D Foundation Model with A Universal Pre-training Paradigm
arXiv 2023
CTVIS: Consistent Training for Online Video Instance Segmentation
ICCV 2023
SegGPT: Segmenting Everything In Context
arXiv 2023
Matcher: Segment Anything with One Shot Using All-Purpose Feature Matching
arXiv 2023
StableLLaVA: Enhanced Visual Instruction Tuning with Synthesized Image-Dialogue Data
arXiv 2023
SegViTv2: Exploring Efficient and Continual Semantic Segmentation with Plain Vision Transformers
arXiv 2023
AutoStory: Generating Diverse Storytelling Images with Minimal Human Effort
arXiv 2023
SegPrompt: Boosting Open-world Segmentation via Category-level Prompt Learning
ICCV 2023
Zolly: Zoom Focal Length Correctly for Perspective-Distorted Human Mesh Reconstruction
ICCV 2023
LoRAPrune: Structured Pruning Meets Low-Rank Parameter-Efficient Fine-Tuning
arXiv 2023
Generative Prompt Model for Weakly Supervised Object Localization
ICCV 2023
DiffuMask: Synthesizing Images with Pixel-level Annotations for Semantic Segmentation Using Diffusion Models
ICCV 2023
Zero-Shot Video Editing Using Off-The-Shelf Image Diffusion Models
arXiv 2023
Object-aware Inversion and Reassembly for Image Editing
arXiv 2023
StyleAvatar3D: Leveraging Image-Text Diffusion Models for High-Fidelity 3D Avatar Generation
arXiv 2023
Poseur: Direct Human Pose Regression with Transformers
arXiv 2022
FoPro: Few-Shot Guided Robust Webly-Supervised Prototypical Learning
arXiv 2022
Twins: Revisiting the Design of Spatial Attention in Vision Transformers
NeurIPS 2021
Conditional Positional Encodings for Vision Transformers
arXiv 2021
ABCNet: Real-time Scene Text Spotting with Adaptive Bezier-Curve Network
End-to-End Video Instance Segmentation with Transformers
CVPR 2021
FCOS: Fully Convolutional One-Stage Object Detection