Serena Yeung-Levy

Papers
24

Attribution

Affiliations & profile
Semantic Scholar

Authored papers

24

V-GRPO: Online Reinforcement Learning for Denoising Generative Models Is Easier than You Think

arXiv 2026

2026

Project Imaging-X: A Survey of 1000+ Open-Access Medical Imaging Datasets for Foundation Model Development

arXiv 2026

2026

PaperSearchQA: Learning to Search and Reason over Scientific Papers with RLVR

arXiv 2026

2026

WildTableBench: Benchmarking Multimodal Foundation Models on Table Understanding In the Wild

arXiv 2026

2026

Tool Verification for Test-Time Reinforcement Learning

arXiv 2026

2026

Video Action Differencing

arXiv 2025

2025

Temporal Preference Optimization for Long-Form Video Understanding

arXiv 2025

2025

Downscaling Intelligence: Exploring Perception and Reasoning Bottlenecks in Small Multimodal Models

arXiv 2025

2025

SciVideoBench: Benchmarking Scientific Video Reasoning in Large Multimodal Models

arXiv 2025

2025

TTRV: Test-Time Reinforcement Learning for Vision Language Models

arXiv 2025

2025

No Tokens Wasted: Leveraging Long Context in Biomedical Vision-Language Models

arXiv 2025

2025

Automated Generation of Challenging Multiple-Choice Questions for Vision Language Model Evaluation

CVPR 2025 1

2025

BIOMEDICA: An Open Biomedical Image-Caption Archive, Dataset, and Vision-Language Models Derived from Scientific Literature

CVPR 2025 1

2025

MicroVQA: A Multimodal Reasoning Benchmark for Microscopy-Based Scientific Research

CVPR 2025 1

2025

VisualOverload: Probing Visual Understanding of VLMs in Really Dense Scenes

arXiv 2025

2025

Transductive Visual Programming: Evolving Tool Libraries from Experience for Spatial Reasoning

arXiv 2025

2025

Video-STaR: Self-Training Enables Video Instruction Tuning with Any Supervision

arXiv 2024

2024

Towards a clinically accessible radiology foundation model: open-access and lightweight, with automated evaluation

arXiv 2024

2024

Connect, Collapse, Corrupt: Learning Cross-Modal Tasks with Uni-Modal Data

arXiv 2024

2024

Why are Visually-Grounded Language Models Bad at Image Classification?

arXiv 2024

2024

Revisiting Active Learning in the Era of Vision Foundation Models

arXiv 2024

2024

μ-Bench: A Vision-Language Benchmark for Microscopy Understanding

arXiv 2024

2024

Describing Differences in Image Sets with Natural Language

CVPR 2024 1

2023

Viewpoint Textual Inversion: Discovering Scene Representations and 3D View Control in 2D Diffusion Models

arXiv 2023

2023

Affiliations

No known affiliations.

Frequent co-authors

10

from 24 papers