Attribution
MMS-LLaMA: Efficient LLM-based Audio-Visual Speech Recognition with Minimal Multimodal Speech Tokens
Long-Form Speech Generation with Spoken Language Models
arXiv 2024
from 2 papers
Yong Man Ro
Aren Jansen
Hyeongseop Rha
Jeong Hun Yeo
Julian Salazar
Keisuke Kinoshita
RJ Skerry-Ryan