Han Zhang
Authored papers (29)
Masked Depth Modeling for Spatial Perception
arXiv 2026
Conditional Memory via Scalable Lookup: A New Axis of Sparsity for Large Language Models
arXiv 2026
PromptRL: Prompt Matters in RL for Flow-Based Image Generation
arXiv 2026
ARLArena: A Unified Framework for Stable Agentic Reinforcement Learning
arXiv 2026
Image Diffusion Preview with Consistency Solver
arXiv 2025
Step-Audio 2 Technical Report
arXiv 2025
CHURRO: Making History Readable with an Open-Weight Large Vision-Language Model for High-Accuracy, Low-Cost Historical Text Recognition
arXiv 2025
EssayJudge: A Multi-Granular Benchmark for Assessing Automated Essay Scoring Capabilities of Multimodal Large Language Models
arXiv 2025
ChatDiT: A Training-Free Baseline for Task-Agnostic Free-Form Chatting with Diffusion Transformers
arXiv 2024
FluoroSAM: A Language-promptable Foundation Model for Flexible X-ray Image Segmentation
arXiv 2024
ALTER: Augmentation for Large-Table-Based Reasoning
arXiv 2024
A Multi-Level Framework for Accelerating Training Transformer Models
arXiv 2024
BatteryML: An Open-source platform for Machine Learning on Battery Degradation
arXiv 2023
StoryBench: A Multifaceted Benchmark for Continuous Story Visualization
MAXIM: Multi-Axis MLP for Image Processing
CVPR 2022
MAGVIT: Masked Generative Video Transformer
CVPR 2023
MaskGIT: Masked Generative Image Transformer
CVPR 2022
GLOBEM Dataset: Multi-Year Datasets for Longitudinal Human Behavior Modeling Generalization
arXiv 2022
FineDance: A Fine-grained Choreography Dataset for 3D Full Body Dance Generation
ICCV 2023
MaxViT: Multi-Axis Vision Transformer
arXiv 2022
MAGE: MAsked Generative Encoder to Unify Representation Learning and Image Synthesis
CVPR 2023
DualPrompt: Complementary Prompting for Rehearsal-free Continual Learning
arXiv 2022
Joint Learning of Deep Retrieval Model and Product Quantization based Embedding Index
arXiv 2021
ViTGAN: Training GANs with Vision Transformers
Nested Hierarchical Transformer: Towards Accurate, Data-Efficient and Interpretable Visual Understanding
arXiv 2021
ERNIE-Gram: Pre-Training with Explicitly N-Gram Masked Language Modeling for Natural Language Understanding
NAACL 2021
ERNIE: Enhanced Representation through Knowledge Integration
arXiv 2019
Self-Attention Generative Adversarial Networks
StackGAN++: Realistic Image Synthesis with Stacked Generative Adversarial Networks
arXiv 2017