Lin Ma
Papers: 32
Authored papers
Gen3R: 3D Scene Generation Meets Feed-Forward Reconstruction
arXiv 2026
Stereo World Model: Camera-Guided Stereo Video Generation
arXiv 2026
OCRVerse: Towards Holistic OCR in End-to-End Vision-Language Models
arXiv 2026
ViGoR-Bench: How Far Are Visual Generative Models From Zero-Shot Visual Reasoners?
arXiv 2026
FlexVAR: Flexible Visual Autoregressive Modeling without Residual Prediction
arXiv 2025
DisTime: Distribution-based Time Representation for Video Large Language Models
ICCV 2025
UItron: Foundational GUI Agent with Advanced Perception and Planning
arXiv 2025
UniToken: Harmonizing Multimodal Understanding and Generation through Unified Visual Encoding
arXiv 2025
Metis-RISE: RL Incentivizes and SFT Enhances Multimodal Reasoning Model Learning
arXiv 2025
DataPlatter: Boosting Robotic Manipulation Generalization with Minimal Costly Data
arXiv 2025
3D Weakly Supervised Semantic Segmentation with 2D Vision-Language Guidance
arXiv 2024
LESS: Label-Efficient and Single-Stage Referring 3D Segmentation
arXiv 2024
A Multimodal In-Context Tuning Approach for E-Commerce Product Description Generation
arXiv 2024
Uni-MoE: Scaling Unified Multimodal LLMs with Mixture of Experts
arXiv 2024
OV-DINO: Unified Open-Vocabulary Detection with Language-Aware Selective Fusion
arXiv 2024
MLLM-Tool: A Multimodal Large Language Model For Tool Agent Learning
arXiv 2024
DriveMM: All-in-One Large Multimodal Model for Autonomous Driving
arXiv 2024
UniMD: Towards Unifying Moment Retrieval and Temporal Action Detection
arXiv 2024
Open-Vocabulary Semantic Segmentation with Decoupled One-Pass Network
ICCV 2023
E2E-LOAD: End-to-End Long-form Online Action Detection
ICCV 2023
A Multi-Modal Context Reasoning Approach for Conditional Inference on Joint Textual and Visual Clues
arXiv 2023
A Neural Divide-and-Conquer Reasoning Framework for Image Retrieval from Linguistically Complex Text
arXiv 2023
SoccerNet 2023 Challenges Results
arXiv 2023
DreamSpace: Dreaming Your Room Space with Text-Driven Panoramic Texture Propagation
arXiv 2023
LMEye: An Interactive Perception Network for Large Language Models
arXiv 2023
TriDet: Temporal Action Detection with Relative Boundary Modeling
CVPR 2023
Expansion and Shrinkage of Localization for Weakly-Supervised Semantic Segmentation
arXiv 2022
Weakly Supervised Semantic Segmentation via Progressive Patch Learning
arXiv 2022
SoccerNet 2022 Challenges Results
arXiv 2022
PromptDet: Towards Open-vocabulary Detection using Uncurated Images
arXiv 2022
ReAct: Temporal Action Detection with Relational Queries
arXiv 2022
Similarity Reasoning and Filtration for Image-Text Matching
arXiv 2021
Frequent co-authors: 10 (from 32 papers)