Zilong Huang
Authored papers (23)
Let ViT Speak: Generative Language-Image Pre-training
arXiv 2026
EVATok: Adaptive Length Video Tokenization for Efficient Visual Autoregressive Generation
arXiv 2026
Mixture-of-Depths Attention
arXiv 2026
Mind-Brush: Integrating Agentic Cognitive Search and Reasoning into Image Generation
arXiv 2026
Pixel-SAIL: Single Transformer For Pixel-Grounded Understanding
arXiv 2025
Sa2VA: Marrying SAM2 with LLaVA for Dense Grounded Understanding of Images and Videos
arXiv 2025
Seed1.5-VL Technical Report
arXiv 2025
The Scalability of Simplicity: Empirical Analysis of Vision-Language Learning with a Single Transformer
ICCV 2025
GPT-ImgEval: A Comprehensive Benchmark for Diagnosing GPT4o in Image Generation
arXiv 2025
GigaTok: Scaling Visual Tokenizers to 3 Billion Parameters for Autoregressive Image Generation
ICCV 2025
RealGen: Photorealistic Text-to-Image Generation via Detector-Guided Rewards
arXiv 2025
ThinkGen: Generalized Thinking for Visual Generation
arXiv 2025
MajutsuCity: Language-driven Aesthetic-adaptive City Generation with Controllable 3D Assets and Layouts
arXiv 2025
Traceable Evidence Enhanced Visual Grounded Reasoning: Evaluation and Methodology
arXiv 2025
DenseWorld-1M: Towards Detailed Dense Grounded Caption in the Real World
arXiv 2025
Grasp Any Region: Towards Precise, Contextual Pixel Understanding for Multimodal LLMs
arXiv 2025
Echo-4o: Harnessing the Power of GPT-4o Synthetic Images for Improved Image Generation
arXiv 2025
Scene4U: Hierarchical Layered 3D Scene Reconstruction from Single Panoramic Image for Your Immerse Exploration
CVPR 2025
Depth Anything V2
arXiv 2024
Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data
CVPR 2024
Classification Done Right for Vision-Language Pre-Training
arXiv 2024
LOKI: A Comprehensive Synthetic Data Detection Benchmark using Large Multimodal Models
arXiv 2024
CCNet: Criss-Cross Attention for Semantic Segmentation