
Yong Man Ro


Authored papers (20)

MAD: Modality-Adaptive Decoding for Mitigating Cross-Modal Hallucinations in Multimodal Large Language Models

arXiv 2026

2026

STRIDE: When to Speak Meets Sequence Denoising for Streaming Video Understanding

arXiv 2026

2026

Zero-AVSR: Zero-Shot Audio-Visual Speech Recognition with LLMs by Learning Language-Agnostic Speech Representations

arXiv 2025

2025

MMS-LLaMA: Efficient LLM-based Audio-Visual Speech Recognition with Minimal Multimodal Speech Tokens


2025

Phantom of Latent for Large Language and Vision Models

arXiv 2024

2024

CoLLaVO: Crayon Large Language and Vision mOdel

arXiv 2024

2024

Long-Form Speech Generation with Spoken Language Models

arXiv 2024

2024

Meteor: Mamba-based Traversal of Rationale for Large Language and Vision Models

arXiv 2024

2024

TroL: Traversal of Layers for Large Language and Vision Models

arXiv 2024

2024

What if...?: Thinking Counterfactual Keywords Helps to Mitigate Hallucination in Large Multi-modal Models

arXiv 2024

2024

Where Visual Speech Meets Language: VSP-LLM Framework for Efficient and Context-Aware Visual Speech Processing

arXiv 2024

2024

MoAI: Mixture of All Intelligence for Large Language and Vision Models

arXiv 2024

2024

SPARK: Multi-Vision Sensor Perception and Reasoning Benchmark for Large-scale Vision-Language Models

arXiv 2024

2024

Are Vision-Language Models Truly Understanding Multi-vision Sensor?

arXiv 2024

2024

Mitigating Adversarial Vulnerability through Causal Parameter Estimation by Adversarial Double Machine Learning

ICCV 2023

2023

Demystifying Causal Features on Adversarial Examples and Causal Inoculation for Robust Network by Adversarial Instrumental Variable Regression

CVPR 2023

2023

DiffV2S: Diffusion-based Video-to-Speech Synthesis with Vision-guided Speaker Embedding

ICCV 2023

2023

Causal Unsupervised Semantic Segmentation

arXiv 2023

2023

Distilling Robust and Non-Robust Features in Adversarial Examples by Information Bottleneck


2022

Masking Adversarial Damage: Finding Adversarial Saliency for Robust and Sparse Network

CVPR 2022

2022

Affiliations

No known affiliations.

Frequent co-authors: 10 (from 20 papers)