Lin Ma

Papers
32

Authored papers

32

Gen3R: 3D Scene Generation Meets Feed-Forward Reconstruction

arXiv 2026

2026

Stereo World Model: Camera-Guided Stereo Video Generation

arXiv 2026

2026

OCRVerse: Towards Holistic OCR in End-to-End Vision-Language Models

arXiv 2026

2026

ViGoR-Bench: How Far Are Visual Generative Models From Zero-Shot Visual Reasoners?

arXiv 2026

2026

FlexVAR: Flexible Visual Autoregressive Modeling without Residual Prediction

arXiv 2025

2025

DisTime: Distribution-based Time Representation for Video Large Language Models

ICCV 2025

2025

UItron: Foundational GUI Agent with Advanced Perception and Planning

arXiv 2025

2025

UniToken: Harmonizing Multimodal Understanding and Generation through Unified Visual Encoding

arXiv 2025

2025

Metis-RISE: RL Incentivizes and SFT Enhances Multimodal Reasoning Model Learning

arXiv 2025

2025

DataPlatter: Boosting Robotic Manipulation Generalization with Minimal Costly Data

arXiv 2025

2025

3D Weakly Supervised Semantic Segmentation with 2D Vision-Language Guidance

arXiv 2024

2024

LESS: Label-Efficient and Single-Stage Referring 3D Segmentation

arXiv 2024

2024

A Multimodal In-Context Tuning Approach for E-Commerce Product Description Generation

arXiv 2024

2024

Uni-MoE: Scaling Unified Multimodal LLMs with Mixture of Experts

arXiv 2024

2024

OV-DINO: Unified Open-Vocabulary Detection with Language-Aware Selective Fusion

arXiv 2024

2024

MLLM-Tool: A Multimodal Large Language Model For Tool Agent Learning

arXiv 2024

2024

DriveMM: All-in-One Large Multimodal Model for Autonomous Driving

arXiv 2024

2024

UniMD: Towards Unifying Moment Retrieval and Temporal Action Detection

arXiv 2024

2024

Open-Vocabulary Semantic Segmentation with Decoupled One-Pass Network

ICCV 2023

2023

E2E-LOAD: End-to-End Long-form Online Action Detection

ICCV 2023

2023

A Multi-Modal Context Reasoning Approach for Conditional Inference on Joint Textual and Visual Clues

arXiv 2023

2023

A Neural Divide-and-Conquer Reasoning Framework for Image Retrieval from Linguistically Complex Text

arXiv 2023

2023

SoccerNet 2023 Challenges Results

arXiv 2023

2023

DreamSpace: Dreaming Your Room Space with Text-Driven Panoramic Texture Propagation

arXiv 2023

2023

LMEye: An Interactive Perception Network for Large Language Models

arXiv 2023

2023

TriDet: Temporal Action Detection with Relative Boundary Modeling

CVPR 2023

2023

Expansion and Shrinkage of Localization for Weakly-Supervised Semantic Segmentation

arXiv 2022

2022

Weakly Supervised Semantic Segmentation via Progressive Patch Learning

arXiv 2022

2022

SoccerNet 2022 Challenges Results

arXiv 2022

2022

PromptDet: Towards Open-vocabulary Detection using Uncurated Images

arXiv 2022

2022

ReAct: Temporal Action Detection with Relational Queries

arXiv 2022

2022

Similarity Reasoning and Filtration for Image-Text Matching

arXiv 2021

2021

Affiliations

No known affiliations.

Frequent co-authors

10

from 32 papers