Yanfeng Wang
Papers: 59
Authored papers
Innovator-VL: A Multimodal Large Language Model for Scientific Discovery
arXiv 2026
EvoMaster: A Foundational Agent Framework for Building Evolving Autonomous Scientific Agents at Scale
arXiv 2026
Eliciting Medical Reasoning with Knowledge-enhanced Data Synthesis: A Semi-Supervised Reinforcement Learning Approach
arXiv 2026
AgentEHR: Advancing Autonomous Clinical Decision-Making via Retrospective Summarization
arXiv 2026
SpotSound: Enhancing Large Audio-Language Models with Fine-Grained Temporal Grounding
arXiv 2026
Scientific Image Synthesis: Benchmarking, Methodologies, and Downstream Utility
arXiv 2026
Multi-Agent System for Comprehensive Soccer Understanding
arXiv 2025
ChestX-Reasoner: Advancing Radiology Foundation Models with Reasoning through Step-by-Step Verification
arXiv 2025
SpatialScore: Towards Unified Evaluation for Multimodal Spatial Understanding
arXiv 2025
One-Step Diffusion Transformer for Controllable Real-World Image Super-Resolution
arXiv 2025
GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning
arXiv 2025
Evolving Diagnostic Agents in a Virtual Clinical Environment
arXiv 2025
AWorld: Orchestrating the Training Recipe for Agentic AI
arXiv 2025
SciMaster: Towards General-Purpose Scientific AI Agents, Part I. X-Master as Foundation: Can We Lead on Humanity's Last Exam?
arXiv 2025
EHR-R1: A Reasoning-Enhanced Foundational Language Model for Electronic Health Record Analysis
arXiv 2025
End-to-End Agentic RAG System Training for Traceable Diagnostic Reasoning
arXiv 2025
Rethinking Whole-Body CT Image Interpretation: An Abnormality-Centric Approach
arXiv 2025
RARE: Retrieval-Augmented Reasoning Modeling
arXiv 2025
VocalNet: Speech LLM with Multi-Token Prediction for Faster and High-Quality Generation
arXiv 2025
MedS$^3$: Towards Medical Small Language Models with Self-Evolved Slow Thinking
arXiv 2025
WanJuanSiLu: A High-Quality Open-Source Webtext Dataset for Low-Resource Languages
arXiv 2025
FedMABench: Benchmarking Mobile Agents on Decentralized Heterogeneous User Data
arXiv 2025
VocalBench: Benchmarking the Vocal Conversational Abilities for Speech Interaction Models
arXiv 2025
MatchTime: Towards Automatic Soccer Game Commentary Generation
arXiv 2024
RaTEScore: A Metric for Radiology Report Generation
arXiv 2024
Towards Universal Soccer Video Understanding
CVPR 2025
LamRA: Large Multimodal Model as Your Advanced Retrieval Assistant
CVPR 2025
Editable Scene Simulation for Autonomous Driving via Collaborative LLM-Agents
CVPR 2024
MegaFusion: Extend Diffusion Models towards Higher-resolution Image Generation without Further Tuning
arXiv 2024
An Extensible Framework for Open Heterogeneous Collaborative Perception
arXiv 2024
Towards Evaluating and Building Versatile Large Language Models for Medicine
arXiv 2024
MedCare: Advancing Medical LLMs through Decoupling Clinical Alignment and Knowledge Aggregation
arXiv 2024
Underwater Camouflaged Object Tracking Meets Vision-Language SAM2
arXiv 2024
MING-MOE: Enhancing Medical Multi-Task Learning in Large Language Models with Sparse Mixture of Low-Rank Adapter Experts
arXiv 2024
A Knowledge-enhanced Pathology Vision-language Foundation Model for Cancer Diagnosis
arXiv 2024
Towards Building Multilingual Language Model for Medicine
arXiv 2024
ReMamber: Referring Image Segmentation with Mamba Twister
arXiv 2024
MRGen: Diffusion-based Controllable Data Engine for MRI Segmentation towards Unannotated Modalities
ICCV 2025
Low-Rank Knowledge Decomposition for Medical Foundation Models
CVPR 2024
HarmoDT: Harmony Multi-Task Decision Transformer for Offline Reinforcement Learning
arXiv 2024
CliMedBench: A Large-Scale Chinese Benchmark for Evaluating Medical Large Language Models in Clinical Scenarios
arXiv 2024
Ethical-Lens: Curbing Malicious Usages of Open-Source Text-to-Image Models
arXiv 2024
MM-SAP: A Comprehensive Benchmark for Assessing Self-Awareness of Multimodal Large Language Models in Perception
arXiv 2024
Towards Generalist Foundation Model for Radiology by Leveraging Web-scale 2D&3D Medical Data
arXiv 2023
PMC-LLaMA: Towards Building Open-source Language Models for Medicine
arXiv 2023
PMC-VQA: Visual Instruction Tuning for Medical Visual Question Answering
arXiv 2023
One Model to Rule them All: Towards Universal Segmentation for Medical Images with Text Prompts
arXiv 2023
DR2: Diffusion-based Robust Degradation Remover for Blind Face Restoration
CVPR 2023
FedDisco: Federated Learning with Discrepancy-Aware Collaboration
arXiv 2023
Zero-shot Composed Text-Image Retrieval
arXiv 2023
AttrSeg: Open-Vocabulary Semantic Segmentation via Attribute Decomposition-Aggregation
Intelligent Grimm -- Open-ended Visual Storytelling via Latent Diffusion Models
arXiv 2023
PMC-CLIP: Contrastive Language-Image Pre-training using Biomedical Documents
arXiv 2023
LibriSQA: A Novel Dataset and Framework for Spoken Question Answering with Large Language Models
arXiv 2023
Auxiliary Tasks Benefit 3D Skeleton-based Human Motion Prediction
ICCV 2023
Joint-Relation Transformer for Multi-Person Motion Prediction
ICCV 2023
Boost Video Frame Interpolation via Motion Adaptation
arXiv 2023
K-Space Transformer for Undersampled MRI Reconstruction
arXiv 2022
Open-vocabulary Semantic Segmentation with Frozen Vision-Language Models
arXiv 2022
Frequent co-authors
10 from 59 papers