Lewei Lu
- Papers: 38
Authored papers (38)
SenseNova-U1: Unifying Multimodal Understanding and Generation with NEO-unify Architecture
arXiv 2026
EVA: Efficient Reinforcement Learning for End-to-End Video Agent
arXiv 2026
OpenMobile: Building Open Mobile Agents with Task and Trajectory Synthesis
arXiv 2026
VisualPRM: An Effective Process Reward Model for Multimodal Reasoning
arXiv 2025
InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models
arXiv 2025
Visual Embodied Brain: Let Multimodal Large Language Models See, Think, and Control in Spaces
arXiv 2025
VisuLogic: A Benchmark for Evaluating Visual Reasoning in Multi-modal Large Language Models
arXiv 2025
From Pixels to Words -- Towards Native Vision-Language Primitives at Scale
arXiv 2025
Scaling Spatial Intelligence with Multimodal Foundation Models
arXiv 2025
NaViL: Rethinking Scaling Properties of Native Multimodal Large Language Models under Data Constraints
arXiv 2025
ELV-Halluc: Benchmarking Semantic Aggregation Hallucinations in Long Video Understanding
arXiv 2025
Visual Jigsaw Post-Training Improves MLLMs
arXiv 2025
GUI-Reflection: Empowering Multimodal GUI Models with Self-Reflection Behavior
arXiv 2025
MaskGWM: A Generalizable Driving World Model with Video Mask Reconstruction
CVPR 2025
Streamline Without Sacrifice -- Squeeze out Computation Redundancy in LMM
arXiv 2025
Parameter-Inverted Image Pyramid Networks for Visual Perception and Multimodal Understanding
arXiv 2025
SenseNova-MARS: Empowering Multimodal Agentic Reasoning and Search via Reinforcement Learning
arXiv 2025
Vision-RWKV: Efficient and Scalable Visual Perception with RWKV-Like Architectures
arXiv 2024
Needle In A Multimodal Haystack
arXiv 2024
Efficient Deformable ConvNets: Rethinking Dynamic and Sparse Operator for Vision Applications
CVPR 2024
MM-Interleaved: Interleaved Image-Text Generative Modeling via Multi-modal Feature Synchronizer
arXiv 2024
The All-Seeing Project V2: Towards General Relation Comprehension of the Open World
arXiv 2024
VisionLLM v2: An End-to-End Generalist Multimodal Large Language Model for Hundreds of Vision-Language Tasks
arXiv 2024
OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text
arXiv 2024
PVC: Progressive Visual Token Compression for Unified Image and Video Processing in Large Vision-Language Models
CVPR 2025
Vision Model Pre-training on Interleaved Image-Text Data via Latent Compression Learning
arXiv 2024
MMInstruct: A High-Quality Multi-Modal Instruction Tuning Dataset with Extensive Diversity
arXiv 2024
Scene as Occupancy
ICCV 2023
Ghost in the Minecraft: Generally Capable Agents for Open-World Environments via Large Language Models with Text-based Knowledge and Memory
arXiv 2023
ControlLLM: Augment Language Models with Tools by Searching on Graphs
arXiv 2023
DriveMLM: Aligning Multi-Modal Large Language Models with Behavioral Planning States for Autonomous Driving
arXiv 2023
ADDP: Learning General Representations for Image Recognition and Generation with Alternating Denoising Diffusion Process
arXiv 2023
Planning-oriented Autonomous Driving
CVPR 2023
InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions
CVPR 2023
Towards All-in-one Pre-training via Maximizing Multi-modal Mutual Information
CVPR 2023
Demystify Transformers & Convolutions in Modern Image Deep Networks
arXiv 2022
Deformable DETR: Deformable Transformers for End-to-End Object Detection
ICLR 2021
VL-BERT: Pre-training of Generic Visual-Linguistic Representations
ICLR 2020
Frequent co-authors: 10 (from 38 papers)