Gao Huang
Papers: 51
Authored papers (51)
The Flexibility Trap: Why Arbitrary Order Limits Reasoning Potential in Diffusion Language Models
arXiv 2026
InsightTok: Improving Text and Face Fidelity in Discrete Tokenization for Autoregressive Image Generation
arXiv 2026
Refinement via Regeneration: Enlarging Modification Space Boosts Image Refinement in Unified Multimodal Models
arXiv 2026
Linear-Time Global Visual Modeling without Explicit Attention
arXiv 2026
Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model?
arXiv 2025
4D LangSplat: 4D Language Gaussian Splatting via Multimodal Large Language Models
CVPR 2025
EchoWorld: Learning Motion-Aware World Models for Echocardiography Probe Guidance
CVPR 2025
CheXWorld: Exploring Image World Modeling for Radiograph Representation Learning
CVPR 2025
DyDiT++: Dynamic Diffusion Transformers for Efficient Visual Generation
arXiv 2025
Few-Step Distillation for Text-to-Image Generation: A Practical Guide
arXiv 2025
MOVE: A Simple Motion-Based Data Collection Paradigm for Spatial Generalization in Robotic Manipulation
arXiv 2025
IMG: Calibrating Diffusion Models via Implicit Multimodal Guidance
arXiv 2025
SenseNova-MARS: Empowering Multimodal Agentic Reasoning and Search via Reinforcement Learning
arXiv 2025
Differential Transformer
arXiv 2024
Frequency-aware Feature Fusion for Dense Image Prediction
arXiv 2024
Dynamic Tuning Towards Parameter and Inference Efficiency for ViT Adaptation
arXiv 2024
DeeR-VLA: Dynamic Inference of Multimodal Large Language Models for Efficient Robot Execution
arXiv 2024
DyFADet: Dynamic Feature Aggregation for Temporal Action Detection
arXiv 2024
COVE: Unleashing the Diffusion Feature Correspondence for Consistent Video Editing
arXiv 2024
DiveR-CT: Diversity-enhanced Red Teaming Large Language Model Assistants with Relaxing Constraints
arXiv 2024
ConvLLaVA: Hierarchical Backbones as Visual Encoder for Large Multimodal Models
arXiv 2024
Bridging the Divide: Reconsidering Softmax and Linear Attention
arXiv 2024
Efficient Diffusion Transformer with Step-wise Dynamic Attention Mediators
arXiv 2024
Everything to the Synthetic: Diffusion-driven Test-time Adaptation via Synthetic-Domain Alignment
CVPR 2025
Model Surgery: Modulating LLM's Behavior Via Simple Parameter Editing
arXiv 2024
ENAT: Rethinking Spatial-temporal Interactions in Token-based Image Synthesis
arXiv 2024
ExpeL: LLM Agents Are Experiential Learners
arXiv 2023
Ghost in the Minecraft: Generally Capable Agents for Open-World Environments via Large Language Models with Text-based Knowledge and Memory
arXiv 2023
DAT++: Spatially Dynamic Vision Transformer with Deformable Attention
arXiv 2023
Agent Attention: On the Integration of Softmax and Linear Attention
arXiv 2023
Prompt-Free Diffusion: Taking "Text" out of Text-to-Image Diffusion Models
CVPR 2024
Smooth Diffusion: Crafting Smooth Latent Spaces in Diffusion Models
CVPR 2024
Train Once, Get a Family: State-Adaptive Balances for Offline-to-Online Reinforcement Learning
ADDP: Learning General Representations for Image Recognition and Generation with Alternating Denoising Diffusion Process
arXiv 2023
FLatten Transformer: Vision Transformer using Focused Linear Attention
ICCV 2023
Adaptive Rotated Convolution for Rotated Object Detection
ICCV 2023
Rank-DETR for High Quality Object Detection
Segment3D: Learning Fine-Grained Class-Agnostic 3D Segmentation without Manual Labels
arXiv 2023
Dynamic Perceiver for Efficient Visual Recognition
ICCV 2023
Towards All-in-one Pre-training via Maximizing Multi-modal Mutual Information
CVPR 2023
Deep Incubation: Training Large Models by Divide-and-Conquering
ICCV 2023
Pseudo-Q: Generating Pseudo Language Queries for Visual Grounding
CVPR 2022
Domain Adaptation via Prompt Learning
arXiv 2022
A Mixture of Surprises for Unsupervised Reinforcement Learning
arXiv 2022
EfficientTrain: Exploring Generalized Curriculum Learning for Training Visual Backbones
ICCV 2023
SePiCo: Semantic-Guided Pixel Contrast for Domain Adaptive Semantic Segmentation
arXiv 2022
Generalized Domain Conditioned Adaptation Network
arXiv 2021
AdaFocus V2: End-to-End Training of Spatial Dynamic Networks for Video Recognition
CVPR 2022
Rethinking the Value of Network Pruning
Densely Connected Convolutional Networks
Deep Networks with Stochastic Depth
arXiv 2016
Frequent co-authors
10 from 51 papers