Chao Xu

Papers
21

Attribution

Affiliations & profile
Semantic Scholar

Authored papers

21

MinerU2.5: A Decoupled Vision-Language Model for Efficient High-Resolution Document Parsing

arXiv 2025

2025

An Anatomy of Vision-Language-Action Models: From Modules to Milestones and Challenges

arXiv 2025

2025

Beyond Description: Cognitively Benchmarking Fine-Grained Action for Embodied Agents

arXiv 2025

2025

Think Before You Move: Latent Motion Reasoning for Text-to-Motion Generation

arXiv 2025

2025

U-REPA: Aligning Diffusion U-Nets to ViTs

arXiv 2025

2025

PSC: Extending Context Window of Large Language Models via Phase Shift Calibration

arXiv 2025

2025

OmniDocBench: Benchmarking Diverse PDF Document Parsing with Comprehensive Annotations

CVPR 2025 1

2024

SPHINX-X: Scaling Data and Parameters for a Family of Multi-modal Large Language Models

arXiv 2024

2024

U-DiTs: Downsample Tokens in U-Shaped Diffusion Transformers

arXiv 2024

2024

MeshAvatar: Learning High-quality Triangular Human Avatars from Multi-view Videos

arXiv 2024

2024

OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text

arXiv 2024

2024

DiC: Rethinking Conv3x3 Designs in Diffusion Models

CVPR 2025 1

2024

DocGenome: An Open Large-scale Scientific Document Benchmark for Training and Testing Multi-modal Large Language Models

arXiv 2024

2024

TryOn-Adapter: Efficient Fine-Grained Clothing Identity Adaptation for High-Fidelity Virtual Try-On

arXiv 2024

2024

GPT4Image: Can Large Pre-trained Models Help Vision Models on Perception Tasks?

arXiv 2023

2023

One-2-3-45: Any Single Image to 3D Mesh in 45 Seconds without Per-Shape Optimization

2023

WanJuan: A Comprehensive Multimodal Dataset for Advancing English and Chinese Large Models

arXiv 2023

2023

Zero123++: a Single Image to Consistent Multi-view Diffusion Base Model

arXiv 2023

2023

AlphaPose: Whole-Body Regional Multi-Person Pose Estimation and Tracking in Real-Time

arXiv 2022

2022

GAPartNet: Cross-Category Domain-Generalizable Object Perception and Manipulation via Generalizable and Actionable Parts

arXiv 2022

2022

Augmented Shortcuts for Vision Transformers

NeurIPS 2021 12

2021

Affiliations

No known affiliations.

Frequent co-authors

10 from 21 papers