Youngjae Yu
Papers: 33
Authored papers (33)
MAVL: A Multilingual Audio-Video Lyrics Dataset for Animated Song Translation
arXiv 2025
Don't Look Only Once: Towards Multimodal Interactive Reasoning with Selective Visual Revisitation
arXiv 2025
D2E: Scaling Vision-Action Pretraining on Desktop Data for Transfer to Embodied AI
arXiv 2025
Representation Bending for Large Language Model Safety
arXiv 2025
Verifying the Verifiers: Unveiling Pitfalls and Potentials in Fact Verifiers
arXiv 2025
VisEscape: A Benchmark for Evaluating Exploration-driven Decision-making in Virtual Escape Rooms
arXiv 2025
Persona Dynamics: Unveiling the Impact of Personality Traits on Agents in Text-Based Games
arXiv 2025
G-FOCUS: Towards a Robust Method for Assessing UI Design Persuasiveness
arXiv 2025
Are Any-to-Any Models More Consistent Across Modality Transfers Than Specialists?
arXiv 2025
Cactus: Towards Psychological Counseling Conversations using Cognitive Behavioral Theory
arXiv 2024
DisCoRD: Discrete Tokens to Continuous Motion via Rectified Flow Decoding
ICCV 2025
CANVAS: Commonsense-Aware Navigation System for Intuitive Human-Robot Interaction
arXiv 2024
TIPO: Text to Image with Text Presampling for Prompt Optimization
arXiv 2024
Tuning Large Multimodal Models for Videos using Reinforcement Learning from AI Feedback
arXiv 2024
ISR-DPO: Aligning Large Multimodal Models for Videos by Iterative Self-Retrospective DPO
arXiv 2024
Selective Vision is the Challenge for Visual Reasoning: A Benchmark for Visual Argument Understanding
arXiv 2024
ActionSwitch: Class-agnostic Detection of Simultaneous Actions in Streaming Videos
arXiv 2024
Aligning Large Language Models by On-Policy Self-Judgment
arXiv 2024
Scalp Diagnostic System With Label-Free Segmentation and Training-Free Image Translation
arXiv 2024
Pearl: A Review-driven Persona-Knowledge Grounded Conversational Recommendation Dataset
arXiv 2024
Can visual language models resolve textual ambiguity with visual cues? Let visual puns tell you!
arXiv 2024
Multimodal C4: An Open, Billion-scale Corpus of Images Interleaved with Text
CHAMPAGNE: Learning Real-world Conversation from Large-Scale Web Videos
ICCV 2023
SMILE: Multimodal Dataset for Understanding Laughter in Video with Language Models
arXiv 2023
Symbolic Chain-of-Thought Distillation: Small Models Can Also "Think" Step-by-Step
arXiv 2023
Dialogue Chain-of-Thought Distillation for Commonsense-aware Conversational Agents
arXiv 2023
CLARA: Classifying and Disambiguating User Commands for Reliable Interactive Robotic Agents
arXiv 2023
Localized Symbolic Knowledge Distillation for Visual Commonsense Models
ProsocialDialog: A Prosocial Backbone for Conversational Agents
arXiv 2022
Multimodal Knowledge Alignment with Reinforcement Learning
arXiv 2022
SODA: Million-scale Dialogue Distillation with Social Commonsense Contextualization
arXiv 2022
NeuroLogic A*esque Decoding: Constrained Text Generation with Lookahead Heuristics
NAACL 2022
Connecting the Dots between Audio and Text without Parallel Data through Visual Knowledge Transfer
arXiv 2021