Dongxu Li
- Papers: 11
Authored papers
GTA1: GUI Test-time Scaling Agent
arXiv 2025
ProBench: Judging Multimodal Foundation Models on Open-ended Multi-domain Expert Tasks
arXiv 2025
Moonshot: Towards Controllable Video Generation and Editing with Multimodal Conditions
arXiv 2024
VideoAutoArena: An Automated Arena for Evaluating Large Multimodal Models in Video Analysis through User Simulation
CVPR 2025
Aria: An Open Multimodal Native Mixture-of-Experts Model
arXiv 2024
Aria-UI: Visual Grounding for GUI Instructions
arXiv 2024
LongVideoBench: A Benchmark for Long-context Interleaved Video-Language Understanding
arXiv 2024
cosFormer: Rethinking Softmax in Attention
The Devil in Linear Transformer
arXiv 2022
Align and Prompt: Video-and-Language Pre-training with Entity Prompts
CVPR 2022
Word-level Deep Sign Language Recognition from Video: A New Large-scale Dataset and Methods Comparison
arXiv 2019