Tong Lu
- Papers: 26
Authored papers (26)
Towards Multimodal Lifelong Understanding: A Dataset and Agentic Baseline
arXiv 2026
InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models
arXiv 2025
Eagle 2.5: Boosting Long-Context Post-Training for Frontier Vision-Language Models
arXiv 2025
InternVL3.5: Advancing Open-Source Multimodal Models in Versatility, Reasoning, and Efficiency
arXiv 2025
Sequential Diffusion Language Models
arXiv 2025
Bridging Perspectives: A Survey on Cross-view Collaborative Intelligence with Egocentric-Exocentric Vision
arXiv 2025
AV-Reasoner: Improving and Benchmarking Clue-Grounded Audio-Visual Counting for MLLMs
arXiv 2025
Video Mamba Suite: State Space Model as a Versatile Alternative for Video Understanding
arXiv 2024
Vision-RWKV: Efficient and Scalable Visual Perception with RWKV-Like Architectures
arXiv 2024
Efficient Deformable ConvNets: Rethinking Dynamic and Sparse Operator for Vision Applications
CVPR 2024 1
MM-Interleaved: Interleaved Image-Text Generative Modeling via Multi-modal Feature Synchronizer
arXiv 2024
VisionLLM v2: An End-to-End Generalist Multimodal Large Language Model for Hundreds of Vision-Language Tasks
arXiv 2024
OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text
arXiv 2024
MMInstruct: A High-Quality Multi-Modal Instruction Tuning Dataset with Extensive Diversity
arXiv 2024
MMFuser: Multimodal Multi-Layer Feature Fuser for Fine-Grained Vision-Language Understanding
arXiv 2024
FB-BEV: BEV Representation from Forward-Backward View Transformations
ICCV 2023 1
VisionLLM: Large Language Model is also an Open-Ended Decoder for Vision-Centric Tasks
NeurIPS 2023 11
DDP: Diffusion Model for Dense Visual Prediction
ICCV 2023 1
Memory-and-Anticipation Transformer for Online Action Understanding
ICCV 2023 1
GridFormer: Residual Dense Transformer with Grid Structure for Image Restoration in Adverse Weather Conditions
arXiv 2023
InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions
CVPR 2023 1
Deep Face Restoration: A Survey
arXiv 2022
Ultra-High-Definition Low-Light Image Enhancement: A Benchmark and Transformer-Based Method
arXiv 2022
PVT v2: Improved Baselines with Pyramid Vision Transformer
arXiv 2021
Panoptic SegFormer: Delving Deeper into Panoptic Segmentation with Transformers
CVPR 2022 1
FAST: Faster Arbitrarily-Shaped Text Detector with Minimalist Kernel Representation
arXiv 2021
Frequent co-authors
10 from 26 papers