Youngjae Yu

Papers
33

Authored papers

33

MAVL: A Multilingual Audio-Video Lyrics Dataset for Animated Song Translation

arXiv 2025

2026

Don't Look Only Once: Towards Multimodal Interactive Reasoning with Selective Visual Revisitation

arXiv 2025

2025

D2E: Scaling Vision-Action Pretraining on Desktop Data for Transfer to Embodied AI

arXiv 2025

2025

Representation Bending for Large Language Model Safety

arXiv 2025

2025

Verifying the Verifiers: Unveiling Pitfalls and Potentials in Fact Verifiers

arXiv 2025

2025

VisEscape: A Benchmark for Evaluating Exploration-driven Decision-making in Virtual Escape Rooms

arXiv 2025

2025

Persona Dynamics: Unveiling the Impact of Personality Traits on Agents in Text-Based Games

arXiv 2025

2025

G-FOCUS: Towards a Robust Method for Assessing UI Design Persuasiveness

arXiv 2025

2025

Are Any-to-Any Models More Consistent Across Modality Transfers Than Specialists?

arXiv 2025

2025

Cactus: Towards Psychological Counseling Conversations using Cognitive Behavioral Theory

arXiv 2024

2024

DisCoRD: Discrete Tokens to Continuous Motion via Rectified Flow Decoding

ICCV 2025

2024

CANVAS: Commonsense-Aware Navigation System for Intuitive Human-Robot Interaction

arXiv 2024

2024

TIPO: Text to Image with Text Presampling for Prompt Optimization

arXiv 2024

2024

Tuning Large Multimodal Models for Videos using Reinforcement Learning from AI Feedback

arXiv 2024

2024

ISR-DPO: Aligning Large Multimodal Models for Videos by Iterative Self-Retrospective DPO

arXiv 2024

2024

Selective Vision is the Challenge for Visual Reasoning: A Benchmark for Visual Argument Understanding

arXiv 2024

2024

ActionSwitch: Class-agnostic Detection of Simultaneous Actions in Streaming Videos

arXiv 2024

2024

Aligning Large Language Models by On-Policy Self-Judgment

arXiv 2024

2024

Scalp Diagnostic System With Label-Free Segmentation and Training-Free Image Translation

arXiv 2024

2024

Pearl: A Review-driven Persona-Knowledge Grounded Conversational Recommendation Dataset

arXiv 2024

2024

Can visual language models resolve textual ambiguity with visual cues? Let visual puns tell you!

arXiv 2024

2024

Multimodal C4: An Open, Billion-scale Corpus of Images Interleaved with Text

2023

CHAMPAGNE: Learning Real-world Conversation from Large-Scale Web Videos

ICCV 2023

2023

SMILE: Multimodal Dataset for Understanding Laughter in Video with Language Models

arXiv 2023

2023

Symbolic Chain-of-Thought Distillation: Small Models Can Also "Think" Step-by-Step

arXiv 2023

2023

Dialogue Chain-of-Thought Distillation for Commonsense-aware Conversational Agents

arXiv 2023

2023

CLARA: Classifying and Disambiguating User Commands for Reliable Interactive Robotic Agents

arXiv 2023

2023

Localized Symbolic Knowledge Distillation for Visual Commonsense Models

2023

ProsocialDialog: A Prosocial Backbone for Conversational Agents

arXiv 2022

2022

Multimodal Knowledge Alignment with Reinforcement Learning

arXiv 2022

2022

SODA: Million-scale Dialogue Distillation with Social Commonsense Contextualization

arXiv 2022

2022

NeuroLogic A*esque Decoding: Constrained Text Generation with Lookahead Heuristics

NAACL 2022

2021

Connecting the Dots between Audio and Text without Parallel Data through Visual Knowledge Transfer

arXiv 2021

2021

Affiliations

No known affiliations.

Frequent co-authors

10

from 33 papers