Serena Yeung-Levy
- Papers: 24
Authored papers
V-GRPO: Online Reinforcement Learning for Denoising Generative Models Is Easier than You Think
arXiv 2026
Project Imaging-X: A Survey of 1000+ Open-Access Medical Imaging Datasets for Foundation Model Development
arXiv 2026
PaperSearchQA: Learning to Search and Reason over Scientific Papers with RLVR
arXiv 2026
WildTableBench: Benchmarking Multimodal Foundation Models on Table Understanding In the Wild
arXiv 2026
Tool Verification for Test-Time Reinforcement Learning
arXiv 2026
Video Action Differencing
arXiv 2025
Temporal Preference Optimization for Long-Form Video Understanding
arXiv 2025
Downscaling Intelligence: Exploring Perception and Reasoning Bottlenecks in Small Multimodal Models
arXiv 2025
SciVideoBench: Benchmarking Scientific Video Reasoning in Large Multimodal Models
arXiv 2025
TTRV: Test-Time Reinforcement Learning for Vision Language Models
arXiv 2025
No Tokens Wasted: Leveraging Long Context in Biomedical Vision-Language Models
arXiv 2025
Automated Generation of Challenging Multiple-Choice Questions for Vision Language Model Evaluation
CVPR 2025
BIOMEDICA: An Open Biomedical Image-Caption Archive, Dataset, and Vision-Language Models Derived from Scientific Literature
CVPR 2025
MicroVQA: A Multimodal Reasoning Benchmark for Microscopy-Based Scientific Research
CVPR 2025
VisualOverload: Probing Visual Understanding of VLMs in Really Dense Scenes
arXiv 2025
Transductive Visual Programming: Evolving Tool Libraries from Experience for Spatial Reasoning
arXiv 2025
Video-STaR: Self-Training Enables Video Instruction Tuning with Any Supervision
arXiv 2024
Towards a clinically accessible radiology foundation model: open-access and lightweight, with automated evaluation
arXiv 2024
Connect, Collapse, Corrupt: Learning Cross-Modal Tasks with Uni-Modal Data
arXiv 2024
Why are Visually-Grounded Language Models Bad at Image Classification?
arXiv 2024
Revisiting Active Learning in the Era of Vision Foundation Models
arXiv 2024
μ-Bench: A Vision-Language Benchmark for Microscopy Understanding
arXiv 2024
Describing Differences in Image Sets with Natural Language
CVPR 2024
Viewpoint Textual Inversion: Discovering Scene Representations and 3D View Control in 2D Diffusion Models
arXiv 2023