Xiaobin Hu

Authored papers

25

Mega-ASR: Towards In-the-wild^2 Speech Recognition via Scaling up Real-world Acoustic Simulation

arXiv 2026

2026

PixVerve: Advancing Native UHR Image Generation to 100MP with a Large-Scale High-Quality Dataset

arXiv 2026

2026

The Latent Space: Foundation, Evolution, Mechanism, Ability, and Outlook

arXiv 2026

2026

4DThinker: Thinking with 4D Imagery for Dynamic Spatial Understanding

arXiv 2026

2026

Anisotropic Modality Align

arXiv 2026

2026

Modality Gap-Driven Subspace Alignment Training Paradigm For Multimodal Large Language Models

arXiv 2026

2026

The Trinity of Consistency as a Defining Principle for General World Models

arXiv 2026

2026

Watching, Reasoning, and Searching: A Video Deep Research Benchmark on Open Web for Agentic Video Reasoning

arXiv 2026

2026

Disentangle Identity, Cooperate Emotion: Correlation-Aware Emotional Talking Portrait Generation

arXiv 2025

2025

UniCombine: Unified Multi-Conditional Combination with Diffusion Transformer

ICCV 2025

2025

Guiding a Diffusion Transformer with the Internal Dynamics of Itself

arXiv 2025

2025

Soul: Breathe Life into Digital Human for High-fidelity Long-term Multimodal Animation

arXiv 2025

2025

Transform Trained Transformer: Accelerating Naive 4K Video Generation Over 10×

arXiv 2025

2025

DiP: Taming Diffusion Models in Pixel Space

arXiv 2025

2025

VisMem: Latent Vision Memory Unlocks Potential of Vision-Language Models

arXiv 2025

2025

Visual Multi-Agent System: Mitigating Hallucination Snowballing via Visual Flow

arXiv 2025

2025

StrandDesigner: Towards Practical Strand Generation with Sketch Guidance

arXiv 2025

2025

SVFR: A Unified Framework for Generalized Video Face Restoration

CVPR 2025

2025

Visual Document Understanding and Question Answering: A Multi-Agent Collaboration Framework with Test-Time Scaling

arXiv 2025

2025

MobileMamba: Lightweight Multi-Receptive Visual Mamba Network

CVPR 2025

2024

CustAny: Customizing Anything from A Single Example

CVPR 2025

2024

VTON-HandFit: Virtual Try-on for Arbitrary Hand Pose Guided by Hand Priors Embedding

CVPR 2025

2024

ManipVQA: Injecting Robotic Affordance and Physically Grounded Information into Multi-Modal Large Language Models

arXiv 2024

2024

FitDiT: Advancing the Authentic Garment Details for High-fidelity Virtual Try-on

arXiv 2024

2024

Highly Accurate Dichotomous Image Segmentation

arXiv 2022

2022

Affiliations

No known affiliations.

Frequent co-authors

10

from 25 papers