Attribution
Vision-Speech Models: Teaching Speech Models to Converse about Images
arXiv 2025
High-Fidelity Simultaneous Speech-To-Speech Translation
Moshi: a speech-text foundation model for real-time dialogue
arXiv 2024
from 3 papers
Alexandre Défossez
Neil Zeghidour
Patrick Pérez
Amélie Royer
Edouard Grave
Gabriel de Marmiesse
Hervé Jégou
Manu Orsini
Moritz Böhle
Tom Labiausse