Wanrong Zhu
- Papers: 13
Authored papers
GUI-AIMA: Aligning Intrinsic Multimodal Attention with a Context Anchor for GUI Grounding
arXiv 2025
List Items One by One: A New Data Source and Learning Paradigm for Multimodal LLMs
arXiv 2024
MMSci: A Dataset for Graduate-Level Multi-Discipline Multimodal Scientific Understanding
arXiv 2024
MMWorld: Towards Multi-discipline Multi-faceted World Model Evaluation in Videos
arXiv 2024
OpenFlamingo: An Open-Source Framework for Training Large Autoregressive Vision-Language Models
arXiv 2023
LayoutGPT: Compositional Visual Planning and Generation with Large Language Models
NeurIPS 2023 11
Multimodal Procedural Planning via Dual Text-Image Prompting
arXiv 2023
GPT-4V in Wonderland: Large Multimodal Models for Zero-Shot Smartphone GUI Navigation
arXiv 2023
Multimodal C4: An Open, Billion-scale Corpus of Images Interleaved with Text
VisIT-Bench: A Benchmark for Vision-Language Instruction Following Inspired by Real-World Use
arXiv 2023
VELMA: Verbalization Embodiment of LLM Agents for Vision and Language Navigation in Street View
arXiv 2023
Imagination-Augmented Natural Language Understanding
NAACL 2022 7
Text Infilling
arXiv 2019
Affiliations
Frequent co-authors
10 from 13 papers