AI benchmarking: Speech-to-Text

A tool designed for simplicity, automation and minimal preparation of test data, developed to meet the needs of broadcasters.

In response to demands from broadcasters and other organizations that process large volumes of A/V content, the EBU has launched and coordinates a framework for benchmarking Artificial Intelligence (AI) and Machine Learning (ML) services. The project is spearheaded by ‘BenchmarkSTT’, a tool designed to facilitate the benchmarking of speech-to-text systems and services.



  • Development of a facial recognition system for video
  • Development of metrics to evaluate facial recognition systems for video
  • Deploy an Open API for facial recognition to benchmark state-of-the-art systems
  • Write a report on best practices and the state of the art in facial recognition




Unlike tools used by ML experts in academic settings, BenchmarkSTT targets non-specialists in production environments. It does not require meticulous preparation of test data, and it prioritises simplicity, automation and relative ranking over scientific precision and absolute scores.

With a single command, the tool calculates the accuracy of Automatic Speech Recognition (ASR) transcripts against a reference. Optionally, the user can apply normalization rules to remove non-significant differences such as case or punctuation. Supporting multiple languages and user-defined normalizations, this CLI tool can be integrated into production workflows to perform real-time benchmarking.
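To make the idea concrete, the sketch below scores a transcript against a reference using word error rate (WER), one common accuracy measure for ASR, before and after a simple normalization step. It is an illustration only and not BenchmarkSTT's own code or command-line interface: the function names `normalize` and `word_error_rate` are hypothetical, and the normalization (lowercasing and stripping ASCII punctuation) is an assumed example of the kind of rule a user might apply.

```python
import string


def normalize(text: str) -> list[str]:
    """Hypothetical normalization: lowercase, drop ASCII punctuation, split into words."""
    text = text.lower()
    text = text.translate(str.maketrans("", "", string.punctuation))
    return text.split()


def word_error_rate(reference: list[str], hypothesis: list[str]) -> float:
    """Word-level edit distance divided by the number of reference words."""
    if not reference:
        raise ValueError("reference must contain at least one word")
    # previous[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    previous = list(range(len(hypothesis) + 1))
    for i, ref_word in enumerate(reference, start=1):
        current = [i]
        for j, hyp_word in enumerate(hypothesis, start=1):
            substitution = previous[j - 1] + (ref_word != hyp_word)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1] / len(reference)


reference_text = "Hello, world! This is a test."
hypothesis_text = "hello world this is test"

# Without normalization, case and punctuation count as errors.
raw_wer = word_error_rate(reference_text.split(), hypothesis_text.split())
# With normalization, only the missing word "a" remains.
normalized_wer = word_error_rate(normalize(reference_text), normalize(hypothesis_text))

print(f"raw WER: {raw_wer:.3f}")
print(f"normalized WER: {normalized_wer:.3f}")
```

In this example the unnormalized comparison counts five errors against six reference words, while the normalized comparison counts one, which shows why removing non-significant differences matters when ranking systems against each other.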

Open Source

This collaborative project is open source.


Contributors and users of the open-source 'STT Benchmarking' tool explain its principles, useful metrics and applications.

The second part of the webinar addresses developers and provides an overview of the code and guidance for its integration. 


Related topics


The EBU helps Members to enhance and enrich their media by integrating their data from commissioning to distribution, taking advantage of advances in Artificial Intelligence.

Products Kaizen

Broadcasters deliver their content in digital products. Crafting these products is cross-disciplinary work where content, technology, design and an understanding of the user's needs come together.


This topic covers how radio, TV and other media services can be consumed on a range of new platforms and hybrid connected devices. It focuses on the development of applications above layer 3 (application, presentation) as well as supporting services to deliver the best experience to the end user on any device and at any time.