Let There Be Sound: Reconstructing High Quality Speech from Silent Videos

The goal of this work is to reconstruct high quality speech from lip motions alone, a task also known as lip-to-speech. A key challenge of lip-to-speech systems is the one-to-many mapping caused by (1) the existence of homophenes and (2) multiple speech variations, resulting in…

Year: 2023
ArXiv: arxiv.org/abs/2308.15256
URL: arxiv.org/abs/2308.15256v2
Hosting: External source (license unknown)

Attribution

Abstract & full text: arxiv.org/abs/2308.15256v2
TL;DR: Semantic Scholar
