Chunhua Shen

Papers
51

Authored papers

51

Geo-Align: Video Generation Alignment via Metric Geometry Reward

arXiv 2026

2026

Exploring Spatial Intelligence from a Generative Perspective

arXiv 2026

2026

TC-AE: Unlocking Token Capacity for Deep Compression Autoencoders

arXiv 2026

2026

MARBLE: Multi-Aspect Reward Balance for Diffusion RL

arXiv 2026

2026

Beyond Hard Masks: Progressive Token Evolution for Diffusion Language Models

arXiv 2026

2026

OmniJigsaw: Enhancing Omni-Modal Reasoning via Modality-Orchestrated Reordering

arXiv 2026

2026

Omni-R1: Reinforcement Learning for Omnimodal Reasoning via Two-System Collaboration

arXiv 2025

2025

DICEPTION: A Generalist Diffusion Model for Visual Perceptual Tasks

arXiv 2025

2025

SegAgent: Exploring Pixel Understanding Capabilities in MLLMs by Imitating Human Annotator Trajectories

arXiv 2025

2025

Active-O3: Empowering Multimodal Large Language Models with Active Perception via GRPO

arXiv 2025

2025

Revisiting Depth Representations for Feed-Forward 3D Gaussian Splatting

arXiv 2025

2025

$π^3$: Scalable Permutation-Equivariant Visual Geometry Learning

arXiv 2025

2025

Time Is a Feature: Exploiting Temporal Dynamics in Diffusion Language Models

arXiv 2025

2025

Uniform Discrete Diffusion with Metric Path for Video Generation

arXiv 2025

2025

OmniWorld: A Multi-Domain and Multi-Modal Dataset for 4D World Modeling

arXiv 2025

2025

POMATO: Marrying Pointmap Matching with Temporal Motion for Dynamic 3D Reconstruction

ICCV 2025

2025

Aether: Geometric-Aware Unified World Modeling

ICCV 2025

2025

WinT3R: Window-Based Streaming Reconstruction with Camera Token Pool

arXiv 2025

2025

Tinker: Diffusion's Gift to 3D--Multi-View Consistent Editing From Sparse Inputs without Per-Scene Optimization

arXiv 2025

2025

MobileVLM V2: Faster and Stronger Baseline for Vision Language Model

arXiv 2024

2024

What Matters When Repurposing Diffusion Models for General Dense Perception Tasks?

arXiv 2024

2024

Metric3Dv2: A Versatile Monocular Geometric Foundation Model for Zero-shot Metric Depth and Surface Normal Estimation

2024

FreeCustom: Tuning-Free Customized Image Generation for Multi-Concept Composition

CVPR 2024

2024

GeoBench: Benchmarking and Analyzing Monocular Geometry Estimation Models

arXiv 2024

2024

DiverGen: Improving Instance Segmentation by Learning Wider Data Distribution with More Diverse Generative Data

CVPR 2024

2024

Unleashing the Potential of the Diffusion Model in Few-shot Semantic Segmentation

arXiv 2024

2024

FreeCompose: Generic Zero-Shot Image Composition with Diffusion Prior

arXiv 2024

2024

Depth Any Video with Scalable Synthetic Data

arXiv 2024

2024

VisionLLaMA: A Unified LLaMA Backbone for Vision Tasks

arXiv 2024

2024

PonderV2: Pave the Way for 3D Foundation Model with A Universal Pre-training Paradigm

arXiv 2023

2023

CTVIS: Consistent Training for Online Video Instance Segmentation

ICCV 2023

2023

SegGPT: Segmenting Everything In Context

arXiv 2023

2023

Matcher: Segment Anything with One Shot Using All-Purpose Feature Matching

arXiv 2023

2023

StableLLaVA: Enhanced Visual Instruction Tuning with Synthesized Image-Dialogue Data

arXiv 2023

2023

SegViTv2: Exploring Efficient and Continual Semantic Segmentation with Plain Vision Transformers

arXiv 2023

2023

AutoStory: Generating Diverse Storytelling Images with Minimal Human Effort

arXiv 2023

2023

SegPrompt: Boosting Open-world Segmentation via Category-level Prompt Learning

ICCV 2023

2023

Zolly: Zoom Focal Length Correctly for Perspective-Distorted Human Mesh Reconstruction

ICCV 2023

2023

LoRAPrune: Structured Pruning Meets Low-Rank Parameter-Efficient Fine-Tuning

arXiv 2023

2023

Generative Prompt Model for Weakly Supervised Object Localization

ICCV 2023

2023

DiffuMask: Synthesizing Images with Pixel-level Annotations for Semantic Segmentation Using Diffusion Models

ICCV 2023

2023

Zero-Shot Video Editing Using Off-The-Shelf Image Diffusion Models

arXiv 2023

2023

Object-aware Inversion and Reassembly for Image Editing

arXiv 2023

2023

StyleAvatar3D: Leveraging Image-Text Diffusion Models for High-Fidelity 3D Avatar Generation

arXiv 2023

2023

Poseur: Direct Human Pose Regression with Transformers

arXiv 2022

2022

FoPro: Few-Shot Guided Robust Webly-Supervised Prototypical Learning

arXiv 2022

2022

Twins: Revisiting the Design of Spatial Attention in Vision Transformers

NeurIPS 2021

2021

Conditional Positional Encodings for Vision Transformers

arXiv 2021

2021

ABCNet: Real-time Scene Text Spotting with Adaptive Bezier-Curve Network

2020

End-to-End Video Instance Segmentation with Transformers

CVPR 2021

2020

FCOS: Fully Convolutional One-Stage Object Detection

2019

Affiliations

No known affiliations.

Frequent co-authors

10

from 51 papers