Wanrong Zhu

Papers: 13

Attribution

Affiliations & profile: Semantic Scholar

Authored papers

GUI-AIMA: Aligning Intrinsic Multimodal Attention with a Context Anchor for GUI Grounding

arXiv 2025

2025

MMWorld: Towards Multi-discipline Multi-faceted World Model Evaluation in Videos

arXiv 2024

2024

List Items One by One: A New Data Source and Learning Paradigm for Multimodal LLMs

arXiv 2024

2024

MMSci: A Dataset for Graduate-Level Multi-Discipline Multimodal Scientific Understanding

arXiv 2024

2024

OpenFlamingo: An Open-Source Framework for Training Large Autoregressive Vision-Language Models

arXiv 2023

2023

Multimodal C4: An Open, Billion-scale Corpus of Images Interleaved with Text

2023

LayoutGPT: Compositional Visual Planning and Generation with Large Language Models

NeurIPS 2023 11

2023

Multimodal Procedural Planning via Dual Text-Image Prompting

arXiv 2023

2023

VELMA: Verbalization Embodiment of LLM Agents for Vision and Language Navigation in Street View

arXiv 2023

2023

GPT-4V in Wonderland: Large Multimodal Models for Zero-Shot Smartphone GUI Navigation

arXiv 2023

2023

VisIT-Bench: A Benchmark for Vision-Language Instruction Following Inspired by Real-World Use

arXiv 2023

2023

Imagination-Augmented Natural Language Understanding

NAACL 2022 7

2022

Text Infilling

arXiv 2019

2019

Affiliations

No known affiliations.

Frequent co-authors

from 13 papers

William Yang Wang

7 shared papers

Xin Eric Wang

4 shared papers

Anas Awadalla

3 shared papers

Jack Hessel

researcher

JianFeng Wang

Kevin Lin

Lijuan Wang

Linjie Li

Ludwig Schmidt

professor

3 shared papers

Weixi Feng

3 shared papers