ML Data Pool

Building collective datasets for Machine Learning


  • Contact the EBU Members and identify their potential contributions (Q4 2020, done)
  • Identify any legal aspects and the project organisation (Q4 2020, done)
  • Build the project plan (Q4 2020, done)

Machine Learning (ML) projects require large amounts of data for training. However, individual broadcasters’ libraries are usually too small. By pooling their data and contextual knowledge, EBU Members can collectively build sufficiently large, and more diverse, datasets. This benefits all participants.


  • Building collective datasets to train Machine Learning Models
  • Building collective datasets to benchmark AI applications
  • Sharing resources and software to perform the data cleaning
  • Sharing the models trained on collective datasets
  • Building a database of benchmarking results 


EBU Members can join this group to contribute or keep in touch.

Contributions may include:

1. sharing data from your archives
2. helping to gather data in other ways, such as web scraping
3. contributing information, such as a list of local public personalities
4. helping to develop software to normalize and clean the collected data
5. helping to define the database’s architecture

Project organisation

This project is chaired by Swiss-French public service broadcaster RTS. The EBU hosts the project, data and software.

Related topics

AI benchmarking