Gao Huang

Papers
51

Attribution

Affiliations & profile
Semantic Scholar

Authored papers

51

The Flexibility Trap: Why Arbitrary Order Limits Reasoning Potential in Diffusion Language Models

arXiv 2026

2026

InsightTok: Improving Text and Face Fidelity in Discrete Tokenization for Autoregressive Image Generation

arXiv 2026

2026

Refinement via Regeneration: Enlarging Modification Space Boosts Image Refinement in Unified Multimodal Models

arXiv 2026

2026

Linear-Time Global Visual Modeling without Explicit Attention

arXiv 2026

2026

Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model?

arXiv 2025

2025

4D LangSplat: 4D Language Gaussian Splatting via Multimodal Large Language Models

CVPR 2025

2025

EchoWorld: Learning Motion-Aware World Models for Echocardiography Probe Guidance

CVPR 2025

2025

CheXWorld: Exploring Image World Modeling for Radiograph Representation Learning

CVPR 2025

2025

DyDiT++: Dynamic Diffusion Transformers for Efficient Visual Generation

arXiv 2025

2025

Few-Step Distillation for Text-to-Image Generation: A Practical Guide

arXiv 2025

2025

MOVE: A Simple Motion-Based Data Collection Paradigm for Spatial Generalization in Robotic Manipulation

arXiv 2025

2025

IMG: Calibrating Diffusion Models via Implicit Multimodal Guidance

arXiv 2025

2025

SenseNova-MARS: Empowering Multimodal Agentic Reasoning and Search via Reinforcement Learning

arXiv 2025

2025

Differential Transformer

arXiv 2024

2024

Frequency-aware Feature Fusion for Dense Image Prediction

arXiv 2024

2024

Dynamic Tuning Towards Parameter and Inference Efficiency for ViT Adaptation

arXiv 2024

2024

DeeR-VLA: Dynamic Inference of Multimodal Large Language Models for Efficient Robot Execution

arXiv 2024

2024

DyFADet: Dynamic Feature Aggregation for Temporal Action Detection

arXiv 2024

2024

COVE: Unleashing the Diffusion Feature Correspondence for Consistent Video Editing

arXiv 2024

2024

DiveR-CT: Diversity-enhanced Red Teaming Large Language Model Assistants with Relaxing Constraints

arXiv 2024

2024

ConvLLaVA: Hierarchical Backbones as Visual Encoder for Large Multimodal Models

arXiv 2024

2024

Bridging the Divide: Reconsidering Softmax and Linear Attention

arXiv 2024

2024

Efficient Diffusion Transformer with Step-wise Dynamic Attention Mediators

arXiv 2024

2024

Everything to the Synthetic: Diffusion-driven Test-time Adaptation via Synthetic-Domain Alignment

CVPR 2025

2024

Model Surgery: Modulating LLM's Behavior Via Simple Parameter Editing

arXiv 2024

2024

ENAT: Rethinking Spatial-temporal Interactions in Token-based Image Synthesis

arXiv 2024

2024

ExpeL: LLM Agents Are Experiential Learners

arXiv 2023

2023

Ghost in the Minecraft: Generally Capable Agents for Open-World Environments via Large Language Models with Text-based Knowledge and Memory

arXiv 2023

2023

DAT++: Spatially Dynamic Vision Transformer with Deformable Attention

arXiv 2023

2023

Agent Attention: On the Integration of Softmax and Linear Attention

arXiv 2023

2023

Prompt-Free Diffusion: Taking "Text" out of Text-to-Image Diffusion Models

CVPR 2024

2023

Smooth Diffusion: Crafting Smooth Latent Spaces in Diffusion Models

CVPR 2024

2023

Train Once, Get a Family: State-Adaptive Balances for Offline-to-Online Reinforcement Learning

2023

ADDP: Learning General Representations for Image Recognition and Generation with Alternating Denoising Diffusion Process

arXiv 2023

2023

FLatten Transformer: Vision Transformer using Focused Linear Attention

ICCV 2023

2023

Adaptive Rotated Convolution for Rotated Object Detection

ICCV 2023

2023

Rank-DETR for High Quality Object Detection

2023

Segment3D: Learning Fine-Grained Class-Agnostic 3D Segmentation without Manual Labels

arXiv 2023

2023

Dynamic Perceiver for Efficient Visual Recognition

ICCV 2023

2023

Towards All-in-one Pre-training via Maximizing Multi-modal Mutual Information

CVPR 2023

2022

Deep Incubation: Training Large Models by Divide-and-Conquering

ICCV 2023

2022

Pseudo-Q: Generating Pseudo Language Queries for Visual Grounding

CVPR 2022

2022

Domain Adaptation via Prompt Learning

arXiv 2022

2022

A Mixture of Surprises for Unsupervised Reinforcement Learning

arXiv 2022

2022

EfficientTrain: Exploring Generalized Curriculum Learning for Training Visual Backbones

ICCV 2023

2022

SePiCo: Semantic-Guided Pixel Contrast for Domain Adaptive Semantic Segmentation

arXiv 2022

2022

Generalized Domain Conditioned Adaptation Network

arXiv 2021

2021

AdaFocus V2: End-to-End Training of Spatial Dynamic Networks for Video Recognition

CVPR 2022

2021

Rethinking the Value of Network Pruning

2018

Densely Connected Convolutional Networks

2016

Deep Networks with Stochastic Depth

arXiv 2016

2016

Affiliations

No known affiliations.

Frequent co-authors

10

from 51 papers