
Han Zhang

Papers
29


Authored papers

29

Masked Depth Modeling for Spatial Perception

arXiv 2026

2026

Conditional Memory via Scalable Lookup: A New Axis of Sparsity for Large Language Models

arXiv 2026

2026

PromptRL: Prompt Matters in RL for Flow-Based Image Generation

arXiv 2026

2026

ARLArena: A Unified Framework for Stable Agentic Reinforcement Learning

arXiv 2026

2026

Image Diffusion Preview with Consistency Solver

arXiv 2025

2025

Step-Audio 2 Technical Report

arXiv 2025

2025

CHURRO: Making History Readable with an Open-Weight Large Vision-Language Model for High-Accuracy, Low-Cost Historical Text Recognition

arXiv 2025

2025

EssayJudge: A Multi-Granular Benchmark for Assessing Automated Essay Scoring Capabilities of Multimodal Large Language Models

arXiv 2025

2025

ChatDiT: A Training-Free Baseline for Task-Agnostic Free-Form Chatting with Diffusion Transformers

arXiv 2024

2024

FluoroSAM: A Language-promptable Foundation Model for Flexible X-ray Image Segmentation

arXiv 2024

2024

ALTER: Augmentation for Large-Table-Based Reasoning

arXiv 2024

2024

A Multi-Level Framework for Accelerating Training Transformer Models

arXiv 2024

2024

BatteryML: An Open-source platform for Machine Learning on Battery Degradation

arXiv 2023

2023

StoryBench: A Multifaceted Benchmark for Continuous Story Visualization


2023

MAXIM: Multi-Axis MLP for Image Processing

CVPR 2022

2022

MAGVIT: Masked Generative Video Transformer

CVPR 2023

2022

MaskGIT: Masked Generative Image Transformer

CVPR 2022

2022

GLOBEM Dataset: Multi-Year Datasets for Longitudinal Human Behavior Modeling Generalization

arXiv 2022

2022

FineDance: A Fine-grained Choreography Dataset for 3D Full Body Dance Generation

ICCV 2023

2022

MaxViT: Multi-Axis Vision Transformer

arXiv 2022

2022

MAGE: MAsked Generative Encoder to Unify Representation Learning and Image Synthesis

CVPR 2023

2022

DualPrompt: Complementary Prompting for Rehearsal-free Continual Learning

arXiv 2022

2022

Joint Learning of Deep Retrieval Model and Product Quantization based Embedding Index

arXiv 2021

2021

ViTGAN: Training GANs with Vision Transformers


2021

Nested Hierarchical Transformer: Towards Accurate, Data-Efficient and Interpretable Visual Understanding

arXiv 2021

2021

ERNIE-Gram: Pre-Training with Explicitly N-Gram Masked Language Modeling for Natural Language Understanding

NAACL 2021

2020

ERNIE: Enhanced Representation through Knowledge Integration

arXiv 2019

2019

Self-Attention Generative Adversarial Networks


2018

StackGAN++: Realistic Image Synthesis with Stacked Generative Adversarial Networks

arXiv 2017

2017

Affiliations

No known affiliations.

Frequent co-authors

10

from 29 papers