Weijie Su

Papers: 15

Attribution

Affiliations & profile: Semantic Scholar

Authored papers

Symmetry-Compatible Principle for Optimizer Design: Embeddings, LM Heads, SwiGLU MLPs, and MoE Routers

arXiv 2026

2026

Multimodal OCR: Parse Anything from Documents

arXiv 2026

2026

InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models

arXiv 2025

2025

ZeroGUI: Automating Online GUI Learning at Zero Human Cost

arXiv 2025

2025

InternVL3.5: Advancing Open-Source Multimodal Models in Versatility, Reasoning, and Efficiency

arXiv 2025

2025

MMBench-GUI: Hierarchical Multi-Platform Evaluation Framework for GUI Agents

arXiv 2025

2025

Vlaser: Vision-Language-Action Model with Synergistic Embodied Reasoning

arXiv 2025

2025

NaViL: Rethinking Scaling Properties of Native Multimodal Large Language Models under Data Constraints

arXiv 2025

2025

CoMemo: LVLMs Need Image Context with Image Memory

arXiv 2025

2025

PVC: Progressive Visual Token Compression for Unified Image and Video Processing in Large Vision-Language Models

CVPR 2025

2024

Vision Model Pre-training on Interleaved Image-Text Data via Latent Compression Learning

arXiv 2024

2024

Ghost in the Minecraft: Generally Capable Agents for Open-World Environments via Large Language Models with Text-based Knowledge and Memory

arXiv 2023

2023

Towards All-in-one Pre-training via Maximizing Multi-modal Mutual Information

CVPR 2023

2022

Deformable DETR: Deformable Transformers for End-to-End Object Detection

2020

VL-BERT: Pre-training of Generic Visual-Linguistic Representations

ICLR 2020

2019

Affiliations

No known affiliations.

Frequent co-authors

from 15 papers

Jifeng Dai

Xizhou Zhu

Wenhai Wang

Lewei Lu

Yu Qiao

Bin Li

Chenyu Yang

Jinguo Zhu

Weiyun Wang

Xuan Dong