Yong Man Ro
- Papers: 20
Authored papers
MAD: Modality-Adaptive Decoding for Mitigating Cross-Modal Hallucinations in Multimodal Large Language Models
arXiv 2026
STRIDE: When to Speak Meets Sequence Denoising for Streaming Video Understanding
arXiv 2026
Zero-AVSR: Zero-Shot Audio-Visual Speech Recognition with LLMs by Learning Language-Agnostic Speech Representations
arXiv 2025
MMS-LLaMA: Efficient LLM-based Audio-Visual Speech Recognition with Minimal Multimodal Speech Tokens
Phantom of Latent for Large Language and Vision Models
arXiv 2024
CoLLaVO: Crayon Large Language and Vision mOdel
arXiv 2024
Long-Form Speech Generation with Spoken Language Models
arXiv 2024
Meteor: Mamba-based Traversal of Rationale for Large Language and Vision Models
arXiv 2024
TroL: Traversal of Layers for Large Language and Vision Models
arXiv 2024
What if...?: Thinking Counterfactual Keywords Helps to Mitigate Hallucination in Large Multi-modal Models
arXiv 2024
Where Visual Speech Meets Language: VSP-LLM Framework for Efficient and Context-Aware Visual Speech Processing
arXiv 2024
MoAI: Mixture of All Intelligence for Large Language and Vision Models
arXiv 2024
SPARK: Multi-Vision Sensor Perception and Reasoning Benchmark for Large-scale Vision-Language Models
arXiv 2024
Are Vision-Language Models Truly Understanding Multi-vision Sensor?
arXiv 2024
Mitigating Adversarial Vulnerability through Causal Parameter Estimation by Adversarial Double Machine Learning
ICCV 2023
Demystifying Causal Features on Adversarial Examples and Causal Inoculation for Robust Network by Adversarial Instrumental Variable Regression
CVPR 2023
DiffV2S: Diffusion-based Video-to-Speech Synthesis with Vision-guided Speaker Embedding
ICCV 2023
Causal Unsupervised Semantic Segmentation
arXiv 2023
Distilling Robust and Non-Robust Features in Adversarial Examples by Information Bottleneck
Masking Adversarial Damage: Finding Adversarial Saliency for Robust and Sparse Network
CVPR 2022
Frequent co-authors: 10 (from 20 papers)