Simulstream: Open-Source Toolkit for Evaluation and Demonstration of Streaming Speech-to-Text Translation Systems

Streaming Speech-to-Text Translation (StreamST) requires producing translations concurrently with incoming speech under strict latency constraints, demanding models that balance low latency with high translation quality. Despite rapid progress, evaluation remains fragmented across existing frameworks, which make different assumptions about how systems operate: for example, whether they process continuous speech or short pre-segmented audio, and whether they allow output revision during decoding (retranslation) or not (incremental). As a result, comparing systems fairly and consistently across studies remains challenging. SimulEval, the most widely used framework, reflects these limitations: it supports only incremental decoding, assumes short segmented inputs, and lacks native support for system demonstrations. More broadly, existing alternatives address only subsets of evaluation and deployment needs, leaving no unified solution for benchmarking and interactive demonstration. To address this gap, we introduce simulstream, the first open-source framework for StreamST evaluation and demonstration. It supports both incremental and retranslation decoding on long-form speech, provides fine-grained logging for quality and latency evaluation, and includes an interactive web interface for real-time visualization and comparison.

