Ming-Hsuan Yang

Papers
65

Attribution

Affiliations & profile
Semantic Scholar

Authored papers

65

LiteFrame: Efficient Vision Encoders Unlock Frame Scaling in Video LLMs

arXiv 2026

2026

MotiMotion: Motion-Controlled Video Generation with Visual Reasoning

arXiv 2026

2026

LoGeR: Long-Context Geometric Reconstruction with Hybrid Memory

arXiv 2026

2026

Prism: Efficient Test-Time Scaling via Hierarchical Search and Self-Verification for Discrete Diffusion Language Models

arXiv 2026

2026

Context Forcing: Consistent Autoregressive Video Generation with Long Context

arXiv 2026

2026

LVOmniBench: Pioneering Long Audio-Video Understanding Evaluation for Omnimodal LLMs

arXiv 2026

2026

LLM Post-Training: A Deep Dive into Reasoning Large Language Models

arXiv 2025

2025

Sa2VA: Marrying SAM2 with LLaVA for Dense Grounded Understanding of Images and Videos

arXiv 2025

2025

Generative AI for Autonomous Driving: Frontiers and Opportunities

arXiv 2025

2025

Depth Any Panoramas: A Foundation Model for Panoramic Depth Estimation

arXiv 2025

2025

DC-SAM: In-Context Segment Anything in Images and Videos via Dual Consistency

arXiv 2025

2025

IllumiCraft: Unified Geometry and Illumination Diffusion for Controllable Video Generation

arXiv 2025

2025

KRIS-Bench: Benchmarking Next-Level Intelligent Image Editing Models

arXiv 2025

2025

AirSim360: A Panoramic Simulation Platform within Drone View

arXiv 2025

2025

4KAgent: Agentic Any Image to 4K Super-Resolution

arXiv 2025

2025

On the Generalization of SFT: A Reinforcement Learning Perspective with Reward Rectification

arXiv 2025

2025

From Masks to Worlds: A Hitchhiker's Guide to World Models

arXiv 2025

2025

VideoMathQA: Benchmarking Mathematical Reasoning via Multimodal Understanding in Videos

arXiv 2025

2025

Video-CoM: Interactive Video Reasoning via Chain of Manipulations

arXiv 2025

2025

Image Diffusion Preview with Consistency Solver

arXiv 2025

2025

Controllable 3D Outdoor Scene Generation via Scene Graphs

ICCV 2025

2025

Tuning-Free Image Editing with Fidelity and Editability via Unified Latent Diffusion Model

arXiv 2025

2025

MonST3R: A Simple Approach for Estimating Geometry in the Presence of Motion

arXiv 2024

2024

Mamba or RWKV: Exploring High-Quality and High-Efficiency Segment Anything Model

arXiv 2024

2024

Parameter-Efficient Fine-Tuning for Pre-Trained Vision Models: A Survey and Benchmark

arXiv 2024

2024

VideoPrism: A Foundational Visual Encoder for Video Understanding

arXiv 2024

2024

Tex4D: Zero-shot 4D Scene Texturing with Video Diffusion Models

arXiv 2024

2024

Chat-Edit-3D: Interactive 3D Scene Editing via Text Prompts

arXiv 2024

2024

StyleDubber: Towards Multi-Scale Style Learning for Movie Dubbing

arXiv 2024

2024

RMP-SAM: Towards Real-Time Multi-Purpose Segment Anything

arXiv 2024

2024

Video Prediction Transformers without Recurrence or Convolution

arXiv 2024

2024

SemFlow: Binding Semantic Segmentation and Image Synthesis via Rectified Flow

arXiv 2024

2024

Ranking-aware adapter for text-driven image ordering with CLIP

arXiv 2024

2024

VideoGLUE: Video General Understanding Evaluation of Foundation Models

arXiv 2023

2023

SwiftFormer: Efficient Additive Attention for Transformer-based Real-time Mobile Vision Applications

ICCV 2023 1

2023

GLaMM: Pixel Grounding Large Multimodal Model

CVPR 2024 1

2023

VidToMe: Video Token Merging for Zero-Shot Video Editing

CVPR 2024 1

2023

Burstormer: Burst Image Restoration and Enhancement Transformer

CVPR 2023 1

2023

CiteTracker: Correlating Image and Text for Visual Tracking

ICCV 2023 1

2023

Foundational Models Defining a New Era in Vision: A Survey and Outlook

arXiv 2023

2023

Telling Left from Right: Identifying Geometry-Aware Semantic Correspondence

CVPR 2024 1

2023

PTT: Point-Trajectory Transformer for Efficient Temporal 3D Object Detection

CVPR 2024 1

2023

Delving into Motion-Aware Matching for Monocular 3D Object Tracking

ICCV 2023 1

2023

Generative Multiplane Neural Radiance for 3D-Aware Image Generation

ICCV 2023 1

2023

A Tale of Two Features: Stable Diffusion Complements DINO for Zero-Shot Semantic Correspondence

NeurIPS 2023 11

2023

Self-regulating Prompts: Foundational Model Adaptation without Forgetting

ICCV 2023 1

2023

Text-Driven Image Editing via Learnable Regions

CVPR 2024 1

2023

Pyramid Diffusion for Fine 3D Large Scene Generation

arXiv 2023

2023

Exploiting Diffusion Prior for Generalizable Dense Prediction

arXiv 2023

2023

Dual Associated Encoder for Face Restoration

arXiv 2023

2023

CLR: Channel-wise Lightweight Reprogramming for Continual Learning

ICCV 2023 1

2023

Weakly Supervised 3D Object Detection via Multi-Level Visual Guidance

arXiv 2023

2023

MAGVIT: Masked Generative Video Transformer

CVPR 2023 1

2022

High-Quality Entity Segmentation

arXiv 2022

2022

Diffusion Models: A Comprehensive Survey of Methods and Applications

arXiv 2022

2022

An Extendable, Efficient and Effective Transformer-based Object Detector

arXiv 2022

2022

GAN Inversion: A Survey

arXiv 2021

2021

Restormer: Efficient Transformer for High-Resolution Image Restoration

CVPR 2022 1

2021

Hierarchical Modular Network for Video Captioning

CVPR 2022 1

2021

MC-Blur: A Comprehensive Benchmark for Image Deblurring

arXiv 2021

2021

Spatiotemporal Contrastive Video Representation Learning

CVPR 2021 1

2020

Learning Enriched Features for Real Image Restoration and Enhancement

ECCV 2020 8

2020

Joint-task Self-supervised Learning for Temporal Correspondence

2019

A Closed-form Solution to Photorealistic Image Stylization

2018

Unsupervised Representation Learning by Sorting Sequences

2017

Affiliations

No known affiliations.

Frequent co-authors

10

from 65 papers