Yong Jae Lee

Papers
24

Authored papers

24

Your Embedding Model is SMARTer Than You Think

arXiv 2026

2026

Exploration and Exploitation Errors Are Measurable for Language Model Agents

arXiv 2026

2026

MuRF: Unlocking the Multi-Scale Potential of Vision Foundation Models

arXiv 2026

2026

Reasoning-Augmented Representations for Multimodal Retrieval

arXiv 2026

2026

Unified Spatio-Temporal Token Scoring for Efficient Video VLMs

arXiv 2026

2026

Relational Visual Similarity

arXiv 2025

2025

Can World Simulators Reason? Gen-ViRe: A Generative Visual Reasoning Benchmark

arXiv 2025

2025

See, Hear, and Understand: Benchmarking Audiovisual Human Speech Understanding in Multimodal Large Language Models

arXiv 2025

2025

UniTalk: Towards Universal Active Speaker Detection in Real World Scenarios

arXiv 2025

2025

CuRe: Cultural Gaps in the Long Tail of Text-to-Image Systems

ICCV 2025

2025

LLM Inference Unveiled: Survey and Roofline Model Insights

arXiv 2024

2024

Yo'LLaVA: Your Personalized Language and Vision Assistant

arXiv 2024

2024

Matryoshka Multimodal Models

arXiv 2024

2024

LLaRA: Supercharging Robot Learning Data for Vision-Language Policy

arXiv 2024

2024

TemporalBench: Benchmarking Fine-grained Temporal Understanding for Multimodal Video Models

arXiv 2024

2024

CounterCurate: Enhancing Physical and Semantic Visio-Linguistic Compositional Reasoning via Counterfactual Examples

arXiv 2024

2024

VGBench: Evaluating Large Language Models on Vector Graphics Understanding and Generation

arXiv 2024

2024

Vinoground: Scrutinizing LMMs over Dense Temporal Reasoning with Short Videos

2024

GLIGEN: Open-Set Grounded Text-to-Image Generation

CVPR 2023

2023

Interfacing Foundation Models' Embeddings

arXiv 2023

2023

Visual Instruction Inversion: Image Editing via Visual Prompting

arXiv 2023

2023

Generalized Decoding for Pixel, Image, and Language

CVPR 2023

2022

Progressive Temporal Feature Alignment Network for Video Inpainting

CVPR 2021

2021

YOLACT++: Better Real-time Instance Segmentation

arXiv 2019

2019

Affiliations

No known affiliations.

Frequent co-authors

10

from 24 papers