AI Benchmarking

Development of frameworks and tools for the evaluation of AI/ML applications.

This project group started with BenchmarkSTT, a tool to facilitate benchmarking of speech-to-text systems and services. The working group is now developing an open-source pipeline for facial recognition in videos. For comparative evaluation, the team has built one of the largest annotated video datasets for machine learning, covering about 700 celebrities across 78 hours of TV content.

We are now experimenting with LLMs and generative AI for media applications.

2025

  • Develop a proof of concept (POC) for Retrieval-Augmented Generation (RAG) to explore state-of-the-art applications in news production. Write a report of findings and recommendations for the News and Technical Committees.

2024 

  • Evaluation of, and experimentation with, LLMs, RAG and agents for question answering (Q&A) on owned content

2023 

  • Development of an open-source facial recognition system for video
  • Development of metrics to evaluate facial recognition systems for video
  • Organise a workshop on facial recognition evaluation strategy for broadcasters
  • Develop an annotated dataset to evaluate face recognition systems on TV programmes

2022

  • Development of a facial recognition system for images
  • Definition of metrics to evaluate facial recognition systems for video
  • Development of a semi-automatic annotation pipeline on AWS
  • Start writing a report on best practices and the state of the art in facial recognition

2021

  • Publish BenchmarkSTT tool 1.1
  • Start the development of a facial recognition system for video

2020

  • Add the Levenshtein distance to the STT benchmarking code
  • Test the STT benchmarking API and Docker image
  • Publish STT benchmarking release 1.0.0 on PyPI
  • Update the STT benchmarking documentation on Read the Docs
  • Organise a webinar
  • Develop new STT benchmarking metrics for release 1.1 on GitHub
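The Levenshtein distance added to the STT benchmarking code in 2020 is the minimum number of single-element insertions, deletions and substitutions needed to turn one sequence into another. The following is a minimal Python sketch of the standard dynamic-programming algorithm, shown only to illustrate the metric; it is not taken from the BenchmarkSTT code base, and the function name `levenshtein` is hypothetical.

```python
def levenshtein(a, b):
    """Minimum number of insertions, deletions and substitutions needed to
    turn sequence a into sequence b. Works on strings (characters) or on
    lists (for example, lists of words)."""
    # At the start of iteration i, previous[j] is the distance between
    # a[:i-1] and b[:j]; row 0 is the cost of building b[:j] from nothing.
    previous = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        current = [i]  # turning a[:i] into an empty sequence costs i deletions
        for j, y in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,             # delete x
                current[j - 1] + 1,          # insert y
                previous[j - 1] + (x != y),  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]


print(levenshtein("kitten", "sitting"))  # 3
print(levenshtein("speech", "speech"))   # 0
```

Keeping only the previous row of the table reduces memory from O(len(a) × len(b)) to O(len(b)) without changing the result.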

 

BenchmarkSTT

Unlike tools used by ML experts in academic settings, BenchmarkSTT targets non-specialists in production environments. It does not require meticulous preparation of test data, and it prioritises simplicity, automation and relative ranking over scientific precision and absolute scores.

With a single command, the tool calculates the accuracy of Automatic Speech Recognition (ASR) transcripts against a reference. Optionally, the user can apply normalization rules to remove non-significant differences such as case or punctuation. Supporting multiple languages and user-defined normalizations, this CLI tool can be integrated into production workflows to perform real-time benchmarking.
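To illustrate what such a comparison computes, here is a minimal Python sketch of a common ASR accuracy metric, the word error rate (WER): the word-level edit distance between a hypothesis and the reference, divided by the number of reference words. This is not BenchmarkSTT's implementation or API. The function names, the example transcripts and the normalization rules (lowercasing and stripping ASCII punctuation) are assumptions chosen for illustration.

```python
import string

# Hypothetical normalization rules, for illustration only: lowercase the text
# and strip ASCII punctuation so that "Dog." and "dog" count as the same word.
_PUNCTUATION = str.maketrans("", "", string.punctuation)


def tokenize(text, normalize=True):
    """Split a transcript into words, optionally normalizing it first."""
    if normalize:
        text = text.lower().translate(_PUNCTUATION)
    return text.split()


def word_error_rate(reference, hypothesis, normalize=True):
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = tokenize(reference, normalize)
    hyp = tokenize(hypothesis, normalize)
    if not ref:
        raise ValueError("reference transcript is empty")
    # Word-level Levenshtein distance, keeping only the previous DP row.
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            current.append(min(
                previous[j] + 1,                           # deletion
                current[j - 1] + 1,                        # insertion
                previous[j - 1] + (ref_word != hyp_word),  # substitution
            ))
        previous = current
    return previous[-1] / len(ref)


reference = "The quick brown fox jumps over the lazy dog."
hypotheses = {
    "system_a": "the quick brown fox jumped over a lazy dog",
    "system_b": "the quick brown fox jumps over the lazy dog",
}

# Without normalization, case and punctuation differences count as errors.
print(round(word_error_rate(reference, hypotheses["system_a"], normalize=False), 3))  # 0.444
print(round(word_error_rate(reference, hypotheses["system_a"]), 3))  # 0.222

# Relative ranking: a lower WER is better.
ranking = sorted(hypotheses, key=lambda name: word_error_rate(reference, hypotheses[name]))
print(ranking)  # ['system_b', 'system_a']
```

The example shows why normalization matters for relative ranking: without it, "The" versus "the" and "dog." versus "dog" double system_a's error count, even though neither difference is significant for most production uses.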

Open Source

This collaborative project is open source.

Webinar

Contributors to and users of the open-source 'STT Benchmarking' tool explain its principles, useful metrics and applications.

The second part of the webinar is aimed at developers and provides an overview of the code and guidance on integrating it.

 

Related topics

Data

The EBU helps Members enhance and enrich their media by integrating their data from commissioning to distribution, taking advantage of advances in Artificial Intelligence.

Video Systems & Workflows

The main group in the video domain, where many specifications are created and maintained. It looks after codecs, video monitors, HDR, aspect ratios, etc.

Media Cloud & Microservices

The Media Cloud and Microservices Architecture (MCMA) work provides code and best practices for the integration of processes in production workflows, including Artificial Intelligence in the cloud.