Ming Hu
- Papers: 24
Authored papers (24)
LLaVA-OneVision-2: Towards Next-Generation Perceptual Intelligence
arXiv 2026
Thinking in Uncertainty: Mitigating Hallucinations in MLRMs with Latent Entropy-Aware Decoding
arXiv 2026
OneVision-Encoder: Codec-Aligned Sparsity as a Foundational Principle for Multimodal Intelligence
arXiv 2026
Project Imaging-X: A Survey of 1000+ Open-Access Medical Imaging Datasets for Foundation Model Development
arXiv 2026
A General Model for Retinal Segmentation and Quantification
arXiv 2026
Lumina-DiMOO: An Omni Diffusion Large Language Model for Multi-Modal Generation and Understanding
arXiv 2025
A Survey of Scientific Large Language Models: From Data Foundations to Agent Frontiers
arXiv 2025
MAKE: Multi-Aspect Knowledge-Enhanced Vision-Language Pretraining for Zero-shot Dermatological Assessment
arXiv 2025
ClaimGen-CN: A Large-scale Chinese Dataset for Legal Claim Generation
arXiv 2025
GMAI-VL-R1: Harnessing Reinforcement Learning for Multimodal Medical Reasoning
arXiv 2025
Ophora: A Large-Scale Data-Driven Text-Guided Ophthalmic Surgical Video Generation Model
arXiv 2025
MMRC: A Large-Scale Benchmark for Understanding Multimodal Large Language Model in Real-World Conversation
arXiv 2025
UniMedVL: Unifying Medical Multimodal Understanding and Generation Through Observation-Knowledge-Analysis
arXiv 2025
Video Reality Test: Can AI-Generated ASMR Videos fool VLMs and Humans?
arXiv 2025
SciReasoner: Laying the Scientific Reasoning Ground Across Disciplines
arXiv 2025
Derm1M: A Million-scale Vision-Language Dataset Aligned with Clinical Ontology Knowledge for Dermatology
ICCV 2025
TAGS: A Test-Time Generalist-Specialist Framework with Retrieval-Augmented Reasoning and Verification
arXiv 2025
Probing Scientific General Intelligence of LLMs with Scientist-Aligned Workflows
arXiv 2025
OphCLIP: Hierarchical Retrieval-Augmented Learning for Ophthalmic Surgical Video-Language Pretraining
ICCV 2025
GMAI-VL & GMAI-VL-5.5M: A Large Vision-Language Model and A Comprehensive Multimodal Dataset Towards General Medical AI
arXiv 2024
OphNet: A Large-Scale Video Benchmark for Ophthalmic Surgical Workflow Understanding
arXiv 2024
Generalizing to Unseen Domains in Diabetic Retinopathy with Disentangled Representations
arXiv 2024
LMPT: Prompt Tuning with Class-Specific Embedding Loss for Long-tailed Multi-Label Visual Recognition
arXiv 2023
HGCLIP: Exploring Vision-Language Models with Graph Representations for Hierarchical Understanding