Yong Jae Lee
- Papers: 24
Authored papers
Your Embedding Model is SMARTer Than You Think
arXiv 2026
Exploration and Exploitation Errors Are Measurable for Language Model Agents
arXiv 2026
MuRF: Unlocking the Multi-Scale Potential of Vision Foundation Models
arXiv 2026
Reasoning-Augmented Representations for Multimodal Retrieval
arXiv 2026
Unified Spatio-Temporal Token Scoring for Efficient Video VLMs
arXiv 2026
Relational Visual Similarity
arXiv 2025
Can World Simulators Reason? Gen-ViRe: A Generative Visual Reasoning Benchmark
arXiv 2025
See, Hear, and Understand: Benchmarking Audiovisual Human Speech Understanding in Multimodal Large Language Models
arXiv 2025
UniTalk: Towards Universal Active Speaker Detection in Real World Scenarios
arXiv 2025
CuRe: Cultural Gaps in the Long Tail of Text-to-Image Systems
ICCV 2025
LLM Inference Unveiled: Survey and Roofline Model Insights
arXiv 2024
Yo'LLaVA: Your Personalized Language and Vision Assistant
arXiv 2024
Matryoshka Multimodal Models
arXiv 2024
LLaRA: Supercharging Robot Learning Data for Vision-Language Policy
arXiv 2024
TemporalBench: Benchmarking Fine-grained Temporal Understanding for Multimodal Video Models
arXiv 2024
CounterCurate: Enhancing Physical and Semantic Visio-Linguistic Compositional Reasoning via Counterfactual Examples
arXiv 2024
VGBench: Evaluating Large Language Models on Vector Graphics Understanding and Generation
arXiv 2024
Vinoground: Scrutinizing LMMs over Dense Temporal Reasoning with Short Videos
GLIGEN: Open-Set Grounded Text-to-Image Generation
CVPR 2023
Interfacing Foundation Models' Embeddings
arXiv 2023
Visual Instruction Inversion: Image Editing via Visual Prompting
arXiv 2023
Generalized Decoding for Pixel, Image, and Language
CVPR 2023
Progressive Temporal Feature Alignment Network for Video Inpainting
CVPR 2021
YOLACT++: Better Real-time Instance Segmentation
arXiv 2019
Affiliations
Frequent co-authors
10 from 24 papers