Hexiang Hu
7 papers
Authored papers
Can Long-Context Language Models Subsume Retrieval, RAG, SQL, and More?
arXiv 2024
MagicLens: Self-Supervised Image Retrieval with Open-Ended Instructions
arXiv 2024
MEGA-Bench: Scaling Multimodal Evaluation to over 500 Real-World Tasks
arXiv 2024
Can Pre-trained Vision and Language Models Answer Visual Information-Seeking Questions?
arXiv 2023
From Pixels to UI Actions: Learning to Follow Instructions via Graphical User Interfaces
Open-domain Visual Entity Recognition: Towards Recognizing Millions of Wikipedia Entities
ICCV 2023
Pix2Struct: Screenshot Parsing as Pretraining for Visual Language Understanding
arXiv 2022
Affiliations
No known affiliations.
Frequent co-authors
10 from 7 papers