Hao Jiang
- Papers: 27
Authored papers (27)
Towards Customized Multimodal Role-Play
arXiv 2026
Alleviating Sparse Rewards by Modeling Step-Wise and Long-Term Sampling Effects in Flow-Based GRPO
arXiv 2026
LFPO: Likelihood-Free Policy Optimization for Masked Diffusion Models
arXiv 2026
HealthGPT: A Medical Large Vision-Language Model for Unifying Comprehension and Generation via Heterogeneous Knowledge Adaptation
arXiv 2025
Fast-Slow Thinking for Large Vision-Language Model Reasoning
arXiv 2025
Does Hearing Help Seeing? Investigating Audio-Video Joint Denoising for Video Generation
arXiv 2025
Learning from Next-Frame Prediction: Autoregressive Video Modeling Encodes Effective Representations
arXiv 2025
DiffSemanticFusion: Semantic Raster BEV Fusion for Autonomous Driving via Online HD Map Diffusion
arXiv 2025
Streaming Video Question-Answering with In-context Video KV-Cache Retrieval
arXiv 2025
Towards Universal Soccer Video Understanding
CVPR 2025 1
PatchDPO: Patch-level DPO for Finetuning-free Personalized Image Generation
CVPR 2025 1
LLaVA-MoD: Making LLaVA Tiny via MoE Knowledge Distillation
arXiv 2024
Resolving Multi-Condition Confusion for Finetuning-Free Personalized Image Generation
arXiv 2024
RectifID: Personalizing Rectified Flow with Anchored Classifier Guidance
arXiv 2024
CursorCore: Assist Programming through Aligning Anything
arXiv 2024
Pyramidal Flow Matching for Efficient Video Generative Modeling
arXiv 2024
MS-Diffusion: Multi-subject Zero-shot Image Personalization with Layout Guidance
arXiv 2024
Video-LaVIT: Unified Video-Language Pre-training with Decoupled Visual-Motional Tokenization
arXiv 2024
HyperLLaVA: Dynamic Visual and Language Expert Tuning for Multimodal Large Language Models
arXiv 2024
A Comprehensive Survey of Direct Preference Optimization: Datasets, Theories, Variants, and Applications
arXiv 2024
Align$^2$LLaVA: Cascaded Human and Large Language Model Preference Alignment for Multi-modal Instruction Curation
arXiv 2024
DoNet: Deep De-overlapping Network for Cytology Instance Segmentation
CVPR 2023 1
Multi-Modal Experience Inspired AI Creation
arXiv 2022
Towards Efficient NLP: A Standard Evaluation and A Strong Baseline
NAACL 2022 7
Ego4D: Around the World in 3,000 Hours of Egocentric Video
CVPR 2022 1
Contrastive Learning of User Behavior Sequence for Context-Aware Document Ranking
arXiv 2021
Pre-training for Ad-hoc Retrieval: Hyperlink is Also You Need
arXiv 2021
Frequent co-authors
10 from 27 papers