Han Hu
- Papers: 48
Authored papers (48)
HY-Embodied-0.5: Embodied Foundation Models for Real-World Agents
arXiv 2026
Understanding vs. Generation: Navigating Optimization Dilemma in Multimodal Models
arXiv 2026
Spatial-TTT: Streaming Visual-based Spatial Intelligence with Test-Time Training
arXiv 2026
Do Phone-Use Agents Respect Your Privacy?
arXiv 2026
YOLOv13: Real-Time Object Detection with Hypergraph-Enhanced Adaptive Visual Perception
arXiv 2025
Low-Precision Training of Large Language Models: Methods, Challenges, and Opportunities
arXiv 2025
GeoVista: Web-Augmented Agentic Visual Reasoning for Geolocalization
arXiv 2025
R-4B: Incentivizing General-Purpose Auto-Thinking Capability in MLLMs via Bi-Mode Annealing and Reinforce Learning
arXiv 2025
X-Omni: Reinforcement Learning Makes Discrete Autoregressive Image Generative Models Great Again
arXiv 2025
ARC-Hunyuan-Video-7B: Structured Video Comprehension of Real-World Shorts
arXiv 2025
Equivariant Image Modeling
arXiv 2025
Distribution Matching Variational AutoEncoder
arXiv 2025
HunyuanOCR Technical Report
arXiv 2025
Optimal Stepsize for Diffusion Sampling
arXiv 2025
Merging Models on the Fly Without Retraining: A Sequential Approach to Scalable Continual Model Merging
arXiv 2025
Data-efficient Large Vision Models through Sequential Autoregression
arXiv 2024
ADEM-VL: Adaptive and Embedded Fusion for Efficient Vision-Language Tuning
arXiv 2024
Xwin-LM: Strong and Scalable Alignment Practice for LLMs
arXiv 2024
TinyCLIP: CLIP Distillation via Affinity Mimicking and Weight Inheritance
ICCV 2023 1
EfficientViT: Memory Efficient Vision Transformer with Cascaded Group Attention
CVPR 2023 1
A Survey on Video Diffusion Models
arXiv 2023
GlyphControl: Glyph Conditional Control for Visual Text Generation
Side Adapter Network for Open-Vocabulary Semantic Segmentation
CVPR 2023 1
Efficient Diffusion Training via Min-SNR Weighting Strategy
ICCV 2023 1
Segment and Caption Anything
CVPR 2024 1
DETR Doesn't Need Multi-Scale or Locality Design
arXiv 2023
MotionEditor: Editing Video Motion via Content-Aware Diffusion
CVPR 2024 1
All in Tokens: Unifying Output Space of Visual Tasks via Soft Token
ICCV 2023 1
Mask-Attention-Free Transformer for 3D Instance Segmentation
ICCV 2023 1
Multiple View Geometry Transformers for 3D Human Pose Estimation
CVPR 2024 1
Implicit Temporal Modeling with Learnable Alignment for Video Recognition
ICCV 2023 1
V-DETR: DETR with Vertex Relative Position Encoding for 3D Object Detection
arXiv 2023
Rank-DETR for High Quality Object Detection
Concrete Subspace Learning based Interference Elimination for Multi-task Model Fusion
arXiv 2023
Tutel: Adaptive Mixture-of-Experts at Scale
arXiv 2022
DETRs with Hybrid Matching
CVPR 2023 1
ResFormer: Scaling ViTs with Multi-Resolution Training
CVPR 2023 1
Attentive Mask CLIP
ICCV 2023 1
Expediting Large-Scale Vision Transformer for Dense Prediction without Fine-tuning
arXiv 2022
Exploring Discrete Diffusion Models for Image Captioning
arXiv 2022
SimMIM: A Simple Framework for Masked Image Modeling
CVPR 2022 1
End-to-End Semi-Supervised Object Detection with Soft Teacher
ICCV 2021 10
Video Swin Transformer
CVPR 2022 1
Self-Supervised Learning with Swin Transformers
arXiv 2021
Aligning Pretraining for Detection via Object-Level Contrastive Learning
NeurIPS 2021 12
Global Context Networks
arXiv 2020
Propagate Yourself: Exploring Pixel-Level Consistency for Unsupervised Visual Representation Learning
CVPR 2021 1
RelationNet++: Bridging Visual Representations for Object Detection via Transformer Decoder
NeurIPS 2020 12
Affiliations
Frequent co-authors
10 from 48 papers