Han Hu

Papers
48

Attribution

Affiliations & profile
Semantic Scholar

Authored papers

48

HY-Embodied-0.5: Embodied Foundation Models for Real-World Agents

arXiv 2026

2026

Understanding vs. Generation: Navigating Optimization Dilemma in Multimodal Models

arXiv 2026

2026

Spatial-TTT: Streaming Visual-based Spatial Intelligence with Test-Time Training

arXiv 2026

2026

Do Phone-Use Agents Respect Your Privacy?

arXiv 2026

2026

YOLOv13: Real-Time Object Detection with Hypergraph-Enhanced Adaptive Visual Perception

arXiv 2025

2025

Low-Precision Training of Large Language Models: Methods, Challenges, and Opportunities

arXiv 2025

2025

GeoVista: Web-Augmented Agentic Visual Reasoning for Geolocalization

arXiv 2025

2025

R-4B: Incentivizing General-Purpose Auto-Thinking Capability in MLLMs via Bi-Mode Annealing and Reinforce Learning

arXiv 2025

2025

X-Omni: Reinforcement Learning Makes Discrete Autoregressive Image Generative Models Great Again

arXiv 2025

2025

ARC-Hunyuan-Video-7B: Structured Video Comprehension of Real-World Shorts

arXiv 2025

2025

Equivariant Image Modeling

arXiv 2025

2025

Distribution Matching Variational AutoEncoder

arXiv 2025

2025

HunyuanOCR Technical Report

arXiv 2025

2025

Optimal Stepsize for Diffusion Sampling

arXiv 2025

2025

Merging Models on the Fly Without Retraining: A Sequential Approach to Scalable Continual Model Merging

arXiv 2025

2025

Data-efficient Large Vision Models through Sequential Autoregression

arXiv 2024

2024

ADEM-VL: Adaptive and Embedded Fusion for Efficient Vision-Language Tuning

arXiv 2024

2024

Xwin-LM: Strong and Scalable Alignment Practice for LLMs

arXiv 2024

2024

TinyCLIP: CLIP Distillation via Affinity Mimicking and Weight Inheritance

ICCV 2023 1

2023

EfficientViT: Memory Efficient Vision Transformer with Cascaded Group Attention

CVPR 2023 1

2023

A Survey on Video Diffusion Models

arXiv 2023

2023

GlyphControl: Glyph Conditional Control for Visual Text Generation

2023

Side Adapter Network for Open-Vocabulary Semantic Segmentation

CVPR 2023 1

2023

Efficient Diffusion Training via Min-SNR Weighting Strategy

ICCV 2023 1

2023

Segment and Caption Anything

CVPR 2024 1

2023

DETR Doesn't Need Multi-Scale or Locality Design

arXiv 2023

2023

MotionEditor: Editing Video Motion via Content-Aware Diffusion

CVPR 2024 1

2023

All in Tokens: Unifying Output Space of Visual Tasks via Soft Token

ICCV 2023 1

2023

Mask-Attention-Free Transformer for 3D Instance Segmentation

ICCV 2023 1

2023

Multiple View Geometry Transformers for 3D Human Pose Estimation

CVPR 2024 1

2023

Implicit Temporal Modeling with Learnable Alignment for Video Recognition

ICCV 2023 1

2023

V-DETR: DETR with Vertex Relative Position Encoding for 3D Object Detection

arXiv 2023

2023

Rank-DETR for High Quality Object Detection

2023

Concrete Subspace Learning based Interference Elimination for Multi-task Model Fusion

arXiv 2023

2023

Tutel: Adaptive Mixture-of-Experts at Scale

arXiv 2022

2022

DETRs with Hybrid Matching

CVPR 2023 1

2022

ResFormer: Scaling ViTs with Multi-Resolution Training

CVPR 2023 1

2022

Attentive Mask CLIP

ICCV 2023 1

2022

Expediting Large-Scale Vision Transformer for Dense Prediction without Fine-tuning

arXiv 2022

2022

Exploring Discrete Diffusion Models for Image Captioning

arXiv 2022

2022

SimMIM: A Simple Framework for Masked Image Modeling

CVPR 2022 1

2021

End-to-End Semi-Supervised Object Detection with Soft Teacher

ICCV 2021 10

2021

Video Swin Transformer

CVPR 2022 1

2021

Self-Supervised Learning with Swin Transformers

arXiv 2021

2021

Aligning Pretraining for Detection via Object-Level Contrastive Learning

NeurIPS 2021 12

2021

Global Context Networks

arXiv 2020

2020

Propagate Yourself: Exploring Pixel-Level Consistency for Unsupervised Visual Representation Learning

CVPR 2021 1

2020

RelationNet++: Bridging Visual Representations for Object Detection via Transformer Decoder

NeurIPS 2020 12

2020

Affiliations

No known affiliations.

Frequent co-authors

10

from 48 papers