Ming-Hsuan Yang
Authored papers (65)
LiteFrame: Efficient Vision Encoders Unlock Frame Scaling in Video LLMs
arXiv 2026
MotiMotion: Motion-Controlled Video Generation with Visual Reasoning
arXiv 2026
LoGeR: Long-Context Geometric Reconstruction with Hybrid Memory
arXiv 2026
Prism: Efficient Test-Time Scaling via Hierarchical Search and Self-Verification for Discrete Diffusion Language Models
arXiv 2026
Context Forcing: Consistent Autoregressive Video Generation with Long Context
arXiv 2026
LVOmniBench: Pioneering Long Audio-Video Understanding Evaluation for Omnimodal LLMs
arXiv 2026
LLM Post-Training: A Deep Dive into Reasoning Large Language Models
arXiv 2025
Sa2VA: Marrying SAM2 with LLaVA for Dense Grounded Understanding of Images and Videos
arXiv 2025
Generative AI for Autonomous Driving: Frontiers and Opportunities
arXiv 2025
Depth Any Panoramas: A Foundation Model for Panoramic Depth Estimation
arXiv 2025
DC-SAM: In-Context Segment Anything in Images and Videos via Dual Consistency
arXiv 2025
IllumiCraft: Unified Geometry and Illumination Diffusion for Controllable Video Generation
arXiv 2025
KRIS-Bench: Benchmarking Next-Level Intelligent Image Editing Models
arXiv 2025
AirSim360: A Panoramic Simulation Platform within Drone View
arXiv 2025
4KAgent: Agentic Any Image to 4K Super-Resolution
arXiv 2025
On the Generalization of SFT: A Reinforcement Learning Perspective with Reward Rectification
arXiv 2025
From Masks to Worlds: A Hitchhiker's Guide to World Models
arXiv 2025
VideoMathQA: Benchmarking Mathematical Reasoning via Multimodal Understanding in Videos
arXiv 2025
Video-CoM: Interactive Video Reasoning via Chain of Manipulations
arXiv 2025
Image Diffusion Preview with Consistency Solver
arXiv 2025
Controllable 3D Outdoor Scene Generation via Scene Graphs
ICCV 2025
Tuning-Free Image Editing with Fidelity and Editability via Unified Latent Diffusion Model
arXiv 2025
MonST3R: A Simple Approach for Estimating Geometry in the Presence of Motion
arXiv 2024
Mamba or RWKV: Exploring High-Quality and High-Efficiency Segment Anything Model
arXiv 2024
Parameter-Efficient Fine-Tuning for Pre-Trained Vision Models: A Survey and Benchmark
arXiv 2024
VideoPrism: A Foundational Visual Encoder for Video Understanding
arXiv 2024
Tex4D: Zero-shot 4D Scene Texturing with Video Diffusion Models
arXiv 2024
Chat-Edit-3D: Interactive 3D Scene Editing via Text Prompts
arXiv 2024
StyleDubber: Towards Multi-Scale Style Learning for Movie Dubbing
arXiv 2024
RMP-SAM: Towards Real-Time Multi-Purpose Segment Anything
arXiv 2024
Video Prediction Transformers without Recurrence or Convolution
arXiv 2024
SemFlow: Binding Semantic Segmentation and Image Synthesis via Rectified Flow
arXiv 2024
Ranking-aware adapter for text-driven image ordering with CLIP
arXiv 2024
VideoGLUE: Video General Understanding Evaluation of Foundation Models
arXiv 2023
SwiftFormer: Efficient Additive Attention for Transformer-based Real-time Mobile Vision Applications
ICCV 2023
GLaMM: Pixel Grounding Large Multimodal Model
CVPR 2024
VidToMe: Video Token Merging for Zero-Shot Video Editing
CVPR 2024
Burstormer: Burst Image Restoration and Enhancement Transformer
CVPR 2023
CiteTracker: Correlating Image and Text for Visual Tracking
ICCV 2023
Foundational Models Defining a New Era in Vision: A Survey and Outlook
arXiv 2023
Telling Left from Right: Identifying Geometry-Aware Semantic Correspondence
CVPR 2024
PTT: Point-Trajectory Transformer for Efficient Temporal 3D Object Detection
CVPR 2024
Delving into Motion-Aware Matching for Monocular 3D Object Tracking
ICCV 2023
Generative Multiplane Neural Radiance for 3D-Aware Image Generation
ICCV 2023
A Tale of Two Features: Stable Diffusion Complements DINO for Zero-Shot Semantic Correspondence
NeurIPS 2023
Self-regulating Prompts: Foundational Model Adaptation without Forgetting
ICCV 2023
Text-Driven Image Editing via Learnable Regions
CVPR 2024
Pyramid Diffusion for Fine 3D Large Scene Generation
arXiv 2023
Exploiting Diffusion Prior for Generalizable Dense Prediction
arXiv 2023
Dual Associated Encoder for Face Restoration
arXiv 2023
CLR: Channel-wise Lightweight Reprogramming for Continual Learning
ICCV 2023
Weakly Supervised 3D Object Detection via Multi-Level Visual Guidance
arXiv 2023
MAGVIT: Masked Generative Video Transformer
CVPR 2023
High-Quality Entity Segmentation
arXiv 2022
Diffusion Models: A Comprehensive Survey of Methods and Applications
arXiv 2022
An Extendable, Efficient and Effective Transformer-based Object Detector
arXiv 2022
GAN Inversion: A Survey
arXiv 2021
Restormer: Efficient Transformer for High-Resolution Image Restoration
CVPR 2022
Hierarchical Modular Network for Video Captioning
CVPR 2022
MC-Blur: A Comprehensive Benchmark for Image Deblurring
arXiv 2021
Spatiotemporal Contrastive Video Representation Learning
CVPR 2021
Learning Enriched Features for Real Image Restoration and Enhancement
ECCV 2020
Joint-task Self-supervised Learning for Temporal Correspondence
A Closed-form Solution to Photorealistic Image Stylization
Unsupervised Representation Learning by Sorting Sequences