Interactive In-Meeting Speaker Correction with Human Feedback

Most automatic speech processing systems operate in ``open loop'' mode, without user feedback about who said what, yet human-in-the-loop workflows could enable higher accuracy. We propose an LLM-assisted in-meeting speaker correction system that lets users fix speaker attribution errors through brief corrective feedback. After performing streaming ASR and diarization, the system presents concise LLM-generated summaries to help users identify important speaker errors, and it incorporates user feedback by updating the speaker-attributed transcript and adding online speaker enrollments. To keep this workflow effective despite errors in speech processing, LLM analysis, and user feedback, we developed several mechanisms to identify the intended correction more precisely. Further, we built an LLM-driven user feedback simulation to evaluate the workflow reproducibly and at scale. Applied to the AMI headset test set, our system reduces the diarization error rate (DER) of a streaming baseline (Google ASR + ECAPA) by 31.99% and the speaker substitution error by 52.68%.
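
The feedback-incorporation step described above (relabeling a segment in the speaker-attributed transcript and adding an online enrollment for the corrected speaker) can be illustrated with a minimal sketch. This is not the system's actual implementation: the names `Segment`, `Enrollments`, and `apply_correction` are hypothetical, and representing an enrollment as a running mean of segment embeddings is an assumption made only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Segment:
    """One speaker-attributed transcript segment (hypothetical structure)."""
    seg_id: int
    speaker: str
    text: str
    embedding: list[float]  # speaker embedding for this segment (toy values here)


@dataclass
class Enrollments:
    """Per-speaker running mean of embeddings, updated online (assumed design)."""
    sums: dict[str, list[float]] = field(default_factory=dict)
    counts: dict[str, int] = field(default_factory=dict)

    def add(self, speaker: str, embedding: list[float]) -> None:
        # Create an empty accumulator the first time a speaker is enrolled.
        if speaker not in self.sums:
            self.sums[speaker] = [0.0] * len(embedding)
            self.counts[speaker] = 0
        self.sums[speaker] = [s + e for s, e in zip(self.sums[speaker], embedding)]
        self.counts[speaker] += 1

    def centroid(self, speaker: str) -> list[float]:
        # Mean of all embeddings enrolled for this speaker so far.
        n = self.counts[speaker]
        return [s / n for s in self.sums[speaker]]


def apply_correction(segments: list[Segment], enrollments: Enrollments,
                     seg_id: int, correct_speaker: str) -> bool:
    """Relabel one segment and enroll its embedding under the corrected speaker."""
    for seg in segments:
        if seg.seg_id == seg_id:
            seg.speaker = correct_speaker
            enrollments.add(correct_speaker, seg.embedding)
            return True
    return False  # the feedback referred to an unknown segment; nothing changes


# Usage: the user reports that segment 1 was actually spoken by "spk2".
segments = [
    Segment(0, "spk1", "Let's review the budget.", [1.0, 0.0]),
    Segment(1, "spk1", "I can send the numbers.", [0.0, 1.0]),
]
enrollments = Enrollments()
applied = apply_correction(segments, enrollments, seg_id=1, correct_speaker="spk2")
```

In a streaming setting, the updated centroids could then be compared against the embeddings of later segments, so that a single correction also influences subsequent speaker attribution; how the actual system uses its enrollments is not specified here.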