Weijie Su
- Papers: 15
Authored papers
Multimodal OCR: Parse Anything from Documents
arXiv 2026
Symmetry-Compatible Principle for Optimizer Design: Embeddings, LM Heads, SwiGLU MLPs, and MoE Routers
arXiv 2026
InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models
arXiv 2025
ZeroGUI: Automating Online GUI Learning at Zero Human Cost
arXiv 2025
InternVL3.5: Advancing Open-Source Multimodal Models in Versatility, Reasoning, and Efficiency
arXiv 2025
MMBench-GUI: Hierarchical Multi-Platform Evaluation Framework for GUI Agents
arXiv 2025
Vlaser: Vision-Language-Action Model with Synergistic Embodied Reasoning
arXiv 2025
NaViL: Rethinking Scaling Properties of Native Multimodal Large Language Models under Data Constraints
arXiv 2025
CoMemo: LVLMs Need Image Context with Image Memory
arXiv 2025
PVC: Progressive Visual Token Compression for Unified Image and Video Processing in Large Vision-Language Models
CVPR 2025
Vision Model Pre-training on Interleaved Image-Text Data via Latent Compression Learning
arXiv 2024
Ghost in the Minecraft: Generally Capable Agents for Open-World Environments via Large Language Models with Text-based Knowledge and Memory
arXiv 2023
Towards All-in-one Pre-training via Maximizing Multi-modal Mutual Information
CVPR 2023
Deformable DETR: Deformable Transformers for End-to-End Object Detection
VL-BERT: Pre-training of Generic Visual-Linguistic Representations
ICLR 2020
Frequent co-authors
10 from 15 papers