Wenyi Hong
- Papers: 14
Authored papers: 14
GLM-5V-Turbo: Toward a Native Foundation Model for Multimodal Agents
arXiv 2026
GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning
arXiv 2025
UI2Code^N: A Visual Language Model for Test-Time Scalable Interactive UI-to-Code Generation
arXiv 2025
WebVIA: A Web-based Vision-Language Agentic Framework for Interactive and Verifiable UI-to-Code Generation
arXiv 2025
CogVideoX: Text-to-Video Diffusion Models with An Expert Transformer
arXiv 2024
CogVLM2: Visual Language Models for Image and Video Understanding
arXiv 2024
Inf-DiT: Upsampling Any-Resolution Image with Memory-Efficient Diffusion Transformer
arXiv 2024
VisualAgentBench: Towards Large Multimodal Models as Visual Foundation Agents
arXiv 2024
CogCoM: Train Large Vision-Language Models Diving into Details through Chain of Manipulations
arXiv 2024
LVBench: An Extreme Long Video Understanding Benchmark
ICCV 2025
CogAgent: A Visual Language Model for GUI Agents
CVPR 2024
Relay Diffusion: Unifying diffusion process across resolutions for image synthesis
CogView2: Faster and Better Text-to-Image Generation via Hierarchical Transformers
arXiv 2022
CogView: Mastering Text-to-Image Generation via Transformers
NeurIPS 2021
Frequent co-authors
10 from 14 papers