
Ming Hu

Papers
24


Authored papers

LLaVA-OneVision-2: Towards Next-Generation Perceptual Intelligence

arXiv 2026

2026

Thinking in Uncertainty: Mitigating Hallucinations in MLRMs with Latent Entropy-Aware Decoding

arXiv 2026

2026

OneVision-Encoder: Codec-Aligned Sparsity as a Foundational Principle for Multimodal Intelligence

arXiv 2026

2026

Project Imaging-X: A Survey of 1000+ Open-Access Medical Imaging Datasets for Foundation Model Development

arXiv 2026

2026

A General Model for Retinal Segmentation and Quantification

arXiv 2026

2026

Lumina-DiMOO: An Omni Diffusion Large Language Model for Multi-Modal Generation and Understanding

arXiv 2025

2025

A Survey of Scientific Large Language Models: From Data Foundations to Agent Frontiers

arXiv 2025

2025

MAKE: Multi-Aspect Knowledge-Enhanced Vision-Language Pretraining for Zero-shot Dermatological Assessment

arXiv 2025

2025

ClaimGen-CN: A Large-scale Chinese Dataset for Legal Claim Generation

arXiv 2025

2025

GMAI-VL-R1: Harnessing Reinforcement Learning for Multimodal Medical Reasoning

arXiv 2025

2025

Ophora: A Large-Scale Data-Driven Text-Guided Ophthalmic Surgical Video Generation Model

arXiv 2025

2025

MMRC: A Large-Scale Benchmark for Understanding Multimodal Large Language Model in Real-World Conversation

arXiv 2025

2025

UniMedVL: Unifying Medical Multimodal Understanding And Generation Through Observation-Knowledge-Analysis

arXiv 2025

2025

Video Reality Test: Can AI-Generated ASMR Videos fool VLMs and Humans?

arXiv 2025

2025

SciReasoner: Laying the Scientific Reasoning Ground Across Disciplines

arXiv 2025

2025

Derm1M: A Million-scale Vision-Language Dataset Aligned with Clinical Ontology Knowledge for Dermatology

ICCV 2025

2025

TAGS: A Test-Time Generalist-Specialist Framework with Retrieval-Augmented Reasoning and Verification

arXiv 2025

2025

Probing Scientific General Intelligence of LLMs with Scientist-Aligned Workflows

arXiv 2025

2025

OphCLIP: Hierarchical Retrieval-Augmented Learning for Ophthalmic Surgical Video-Language Pretraining

ICCV 2025

2024

GMAI-VL & GMAI-VL-5.5M: A Large Vision-Language Model and A Comprehensive Multimodal Dataset Towards General Medical AI

arXiv 2024

2024

OphNet: A Large-Scale Video Benchmark for Ophthalmic Surgical Workflow Understanding

arXiv 2024

2024

Generalizing to Unseen Domains in Diabetic Retinopathy with Disentangled Representations

arXiv 2024

2024

LMPT: Prompt Tuning with Class-Specific Embedding Loss for Long-tailed Multi-Label Visual Recognition

arXiv 2023

2023

HGCLIP: Exploring Vision-Language Models with Graph Representations for Hierarchical Understanding

2023

Affiliations

No known affiliations.

Frequent co-authors

10, from 24 papers