Lewei Lu

Papers
38

Attribution

Affiliations & profile
Semantic Scholar

Authored papers

38

SenseNova-U1: Unifying Multimodal Understanding and Generation with NEO-unify Architecture

arXiv 2026

2026

EVA: Efficient Reinforcement Learning for End-to-End Video Agent

arXiv 2026

2026

OpenMobile: Building Open Mobile Agents with Task and Trajectory Synthesis

arXiv 2026

2026

VisualPRM: An Effective Process Reward Model for Multimodal Reasoning

arXiv 2025

2025

InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models

arXiv 2025

2025

Visual Embodied Brain: Let Multimodal Large Language Models See, Think, and Control in Spaces

arXiv 2025

2025

VisuLogic: A Benchmark for Evaluating Visual Reasoning in Multi-modal Large Language Models

arXiv 2025

2025

From Pixels to Words -- Towards Native Vision-Language Primitives at Scale

arXiv 2025

2025

Scaling Spatial Intelligence with Multimodal Foundation Models

arXiv 2025

2025

NaViL: Rethinking Scaling Properties of Native Multimodal Large Language Models under Data Constraints

arXiv 2025

2025

ELV-Halluc: Benchmarking Semantic Aggregation Hallucinations in Long Video Understanding

arXiv 2025

2025

Visual Jigsaw Post-Training Improves MLLMs

arXiv 2025

2025

GUI-Reflection: Empowering Multimodal GUI Models with Self-Reflection Behavior

arXiv 2025

2025

MaskGWM: A Generalizable Driving World Model with Video Mask Reconstruction

CVPR 2025

2025

Streamline Without Sacrifice -- Squeeze out Computation Redundancy in LMM

arXiv 2025

2025

Parameter-Inverted Image Pyramid Networks for Visual Perception and Multimodal Understanding

arXiv 2025

2025

SenseNova-MARS: Empowering Multimodal Agentic Reasoning and Search via Reinforcement Learning

arXiv 2025

2025

Vision-RWKV: Efficient and Scalable Visual Perception with RWKV-Like Architectures

arXiv 2024

2024

Needle In A Multimodal Haystack

arXiv 2024

2024

Efficient Deformable ConvNets: Rethinking Dynamic and Sparse Operator for Vision Applications

CVPR 2024

2024

MM-Interleaved: Interleaved Image-Text Generative Modeling via Multi-modal Feature Synchronizer

arXiv 2024

2024

The All-Seeing Project V2: Towards General Relation Comprehension of the Open World

arXiv 2024

2024

VisionLLM v2: An End-to-End Generalist Multimodal Large Language Model for Hundreds of Vision-Language Tasks

arXiv 2024

2024

OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text

arXiv 2024

2024

PVC: Progressive Visual Token Compression for Unified Image and Video Processing in Large Vision-Language Models

CVPR 2025

2024

Vision Model Pre-training on Interleaved Image-Text Data via Latent Compression Learning

arXiv 2024

2024

MMInstruct: A High-Quality Multi-Modal Instruction Tuning Dataset with Extensive Diversity

arXiv 2024

2024

Scene as Occupancy

ICCV 2023

2023

Ghost in the Minecraft: Generally Capable Agents for Open-World Environments via Large Language Models with Text-based Knowledge and Memory

arXiv 2023

2023

ControlLLM: Augment Language Models with Tools by Searching on Graphs

arXiv 2023

2023

DriveMLM: Aligning Multi-Modal Large Language Models with Behavioral Planning States for Autonomous Driving

arXiv 2023

2023

ADDP: Learning General Representations for Image Recognition and Generation with Alternating Denoising Diffusion Process

arXiv 2023

2023

Planning-oriented Autonomous Driving

CVPR 2023

2022

InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions

CVPR 2023

2022

Towards All-in-one Pre-training via Maximizing Multi-modal Mutual Information

CVPR 2023

2022

Demystify Transformers & Convolutions in Modern Image Deep Networks

arXiv 2022

2022

Deformable DETR: Deformable Transformers for End-to-End Object Detection

ICLR 2021

2020

VL-BERT: Pre-training of Generic Visual-Linguistic Representations

ICLR 2020

2019

Affiliations

No known affiliations.

Frequent co-authors

10

from 38 papers