Tong Lu

Papers: 26

Authored papers

Towards Multimodal Lifelong Understanding: A Dataset and Agentic Baseline

arXiv 2026

2026

InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models

arXiv 2025

2025

Eagle 2.5: Boosting Long-Context Post-Training for Frontier Vision-Language Models

arXiv 2025

2025

InternVL3.5: Advancing Open-Source Multimodal Models in Versatility, Reasoning, and Efficiency

arXiv 2025

2025

Sequential Diffusion Language Models

arXiv 2025

2025

Bridging Perspectives: A Survey on Cross-view Collaborative Intelligence with Egocentric-Exocentric Vision

arXiv 2025

2025

AV-Reasoner: Improving and Benchmarking Clue-Grounded Audio-Visual Counting for MLLMs

arXiv 2025

2025

Video Mamba Suite: State Space Model as a Versatile Alternative for Video Understanding

arXiv 2024

2024

Vision-RWKV: Efficient and Scalable Visual Perception with RWKV-Like Architectures

arXiv 2024

2024

Efficient Deformable ConvNets: Rethinking Dynamic and Sparse Operator for Vision Applications

CVPR 2024 1

2024

MM-Interleaved: Interleaved Image-Text Generative Modeling via Multi-modal Feature Synchronizer

arXiv 2024

2024

VisionLLM v2: An End-to-End Generalist Multimodal Large Language Model for Hundreds of Vision-Language Tasks

arXiv 2024

2024

OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text

arXiv 2024

2024

MMInstruct: A High-Quality Multi-Modal Instruction Tuning Dataset with Extensive Diversity

arXiv 2024

2024

MMFuser: Multimodal Multi-Layer Feature Fuser for Fine-Grained Vision-Language Understanding

arXiv 2024

2024

FB-BEV: BEV Representation from Forward-Backward View Transformations

ICCV 2023 1

2023

VisionLLM: Large Language Model is also an Open-Ended Decoder for Vision-Centric Tasks

NeurIPS 2023 11

2023

DDP: Diffusion Model for Dense Visual Prediction

ICCV 2023 1

2023

Memory-and-Anticipation Transformer for Online Action Understanding

ICCV 2023 1

2023

GridFormer: Residual Dense Transformer with Grid Structure for Image Restoration in Adverse Weather Conditions

arXiv 2023

2023

InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions

CVPR 2023 1

2022

Deep Face Restoration: A Survey

arXiv 2022

2022

Ultra-High-Definition Low-Light Image Enhancement: A Benchmark and Transformer-Based Method

arXiv 2022

2022

PVT v2: Improved Baselines with Pyramid Vision Transformer

arXiv 2021

2021

Panoptic SegFormer: Delving Deeper into Panoptic Segmentation with Transformers

CVPR 2022 1

2021

FAST: Faster Arbitrarily-Shaped Text Detector with Minimalist Kernel Representation

arXiv 2021

2021

Affiliations

No known affiliations.

Frequent co-authors: 10 (from 26 papers)