Hang Zhang

Papers: 25

Attribution

Affiliations & profile: Semantic Scholar

Authored papers

Sat3DGen: Comprehensive Street-Level 3D Scene Generation from Single Satellite Image

arXiv 2026

2026

Qwen-Image Technical Report

arXiv 2025

2025

Qwen2.5-VL Technical Report

arXiv 2025

2025

Qwen3-VL Technical Report

arXiv 2025

2025

VideoLLaMA 3: Frontier Multimodal Foundation Models for Image and Video Understanding

arXiv 2025

2025

The Hitchhiker's Guide to Program Analysis, Part II: Deep Thoughts by LLMs

arXiv 2025

2025

SVGenius: Benchmarking LLMs in SVG Understanding, Editing and Generation

arXiv 2025

2025

MMR1: Enhancing Multimodal Reasoning with Variance-Aware Sampling and Open Resources

arXiv 2025

2025

Embodied-Reasoner: Synergizing Visual Search, Reasoning, and Action for Embodied Interactive Tasks

arXiv 2025

2025

Think Twice, Click Once: Enhancing GUI Grounding via Fast and Slow Systems

arXiv 2025

2025

VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs

arXiv 2024

2024

Breaking the Memory Barrier: Near Infinite Batch Size Scaling for Contrastive Loss

arXiv 2024

2024

Stabilize the Latent Space for Image Autoregressive Modeling: A Unified Perspective

arXiv 2024

2024

VideoRefer Suite: Advancing Spatial-Temporal Object Understanding with Video LLM

CVPR 2025

2024

SVIPTR: Fast and Efficient Scene Text Recognition with Vision Permutable Extractor

arXiv 2024

2024

The Curse of Multi-Modalities: Evaluating Hallucinations of Large Multimodal Models across Language, Visual, and Audio

arXiv 2024

2024

Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding

arXiv 2023

2023

SeaLLMs -- Large Language Models for Southeast Asia

arXiv 2023

2023

AnnoLLM: Making Large Language Models to Be Better Crowdsourced Annotators

arXiv 2023

2023

APOLLO: An Optimized Training Approach for Long-form Numerical Reasoning

arXiv 2022

2022

Open-Vocabulary Semantic Segmentation with Mask-adapted CLIP

CVPR 2023

2022

Adversarial Retriever-Ranker for Dense Text Retrieval

2021

AutoGluon-Tabular: Robust and Accurate AutoML for Structured Data

arXiv 2020

2020

ResNeSt: Split-Attention Networks

arXiv 2020

2020

Bag of Freebies for Training Object Detection Neural Networks

arXiv 2019

2019

Affiliations

No known affiliations.

Frequent co-authors

from 25 papers

Xin Li

Lidong Bing

Zesen Cheng

Deli Zhao

Sicong Leng

Wenqi Zhang

Yueting Zhuang

Zhiqiang Hu

Boqiang Zhang

Guanzheng Chen