Xiaobin Hu
Papers: 25
Authored papers
Mega-ASR: Towards In-the-wild^2 Speech Recognition via Scaling up Real-world Acoustic Simulation
arXiv 2026
PixVerve: Advancing Native UHR Image Generation to 100MP with a Large-Scale High-Quality Dataset
arXiv 2026
The Latent Space: Foundation, Evolution, Mechanism, Ability, and Outlook
arXiv 2026
4DThinker: Thinking with 4D Imagery for Dynamic Spatial Understanding
arXiv 2026
Anisotropic Modality Align
arXiv 2026
Modality Gap-Driven Subspace Alignment Training Paradigm For Multimodal Large Language Models
arXiv 2026
The Trinity of Consistency as a Defining Principle for General World Models
arXiv 2026
Watching, Reasoning, and Searching: A Video Deep Research Benchmark on Open Web for Agentic Video Reasoning
arXiv 2026
Disentangle Identity, Cooperate Emotion: Correlation-Aware Emotional Talking Portrait Generation
arXiv 2025
UniCombine: Unified Multi-Conditional Combination with Diffusion Transformer
ICCV 2025
Guiding a Diffusion Transformer with the Internal Dynamics of Itself
arXiv 2025
Soul: Breathe Life into Digital Human for High-fidelity Long-term Multimodal Animation
arXiv 2025
Transform Trained Transformer: Accelerating Naive 4K Video Generation Over 10×
arXiv 2025
DiP: Taming Diffusion Models in Pixel Space
arXiv 2025
VisMem: Latent Vision Memory Unlocks Potential of Vision-Language Models
arXiv 2025
Visual Multi-Agent System: Mitigating Hallucination Snowballing via Visual Flow
arXiv 2025
StrandDesigner: Towards Practical Strand Generation with Sketch Guidance
arXiv 2025
SVFR: A Unified Framework for Generalized Video Face Restoration
CVPR 2025
Visual Document Understanding and Question Answering: A Multi-Agent Collaboration Framework with Test-Time Scaling
arXiv 2025
MobileMamba: Lightweight Multi-Receptive Visual Mamba Network
CVPR 2025
CustAny: Customizing Anything from A Single Example
CVPR 2025
VTON-HandFit: Virtual Try-on for Arbitrary Hand Pose Guided by Hand Priors Embedding
CVPR 2025
ManipVQA: Injecting Robotic Affordance and Physically Grounded Information into Multi-Modal Large Language Models
arXiv 2024
FitDiT: Advancing the Authentic Garment Details for High-fidelity Virtual Try-on
arXiv 2024
Highly Accurate Dichotomous Image Segmentation
arXiv 2022
Frequent co-authors: 10 (from 25 papers)