Feilong Tang

Thinking in Uncertainty: Mitigating Hallucinations in MLRMs with Latent Entropy-Aware Decoding

arXiv 2026

OneVision-Encoder: Codec-Aligned Sparsity as a Foundational Principle for Multimodal Intelligence

arXiv 2026

Project Imaging-X: A Survey of 1000+ Open-Access Medical Imaging Datasets for Foundation Model Development

arXiv 2026

A Survey of Scientific Large Language Models: From Data Foundations to Agent Frontiers

arXiv 2025

MMRC: A Large-Scale Benchmark for Understanding Multimodal Large Language Model in Real-World Conversation

arXiv 2025

Unified Diffusion VLA: Vision-Language-Action Model via Joint Discrete Denoising Diffusion Process

arXiv 2025

Exploring CLIP's Dense Knowledge for Weakly Supervised Semantic Segmentation

CVPR 2025

TAGS: A Test-Time Generalist-Specialist Framework with Retrieval-Augmented Reasoning and Verification

arXiv 2025

VideoGen-of-Thought: A Collaborative Framework for Multi-Shot Video Generation

arXiv 2024

OphCLIP: Hierarchical Retrieval-Augmented Learning for Ophthalmic Surgical Video-Language Pretraining

ICCV 2025

OphNet: A Large-Scale Video Benchmark for Ophthalmic Surgical Workflow Understanding

arXiv 2024

Unveiling the Ignorance of MLLMs: Seeing Clearly, Answering Incorrectly

CVPR 2025

Generalizing to Unseen Domains in Diabetic Retinopathy with Disentangled Representations

arXiv 2024