January 2021 saw the official launch of the Eurovision News Monitoring tool, the outcome of an innovative collaborative digital news project that gives journalists near-real-time access, in their own language, to news published across the EBU membership. Sébastien Noir, Head of Software Engineering in the EBU Technology & Innovation Department, describes the main technical building blocks of the tool.
We designed the news monitoring tool in close consultation with editors and journalists, incorporating their feedback directly in the development. The goal is a tool that is easy to use, requires no training, and provides immediate value to the user.
Flexibility is an essential quality of the project. Every EBU Member has its own tools and workflows, so we designed the pilot to accommodate them all: content can be pushed to the tool’s API at the time of publication, or the tool can retrieve the newest articles every few minutes from the Member’s REST API.
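The pull option can be sketched as a polling step that keeps only articles not seen before. This is a minimal illustration, not the tool's actual code: the `poll_once` function, the `fetch_latest` callable and the `id` field are all hypothetical names, and the real call to a Member's REST API is replaced by whatever callable is passed in.

```python
from typing import Callable, Dict, Iterable, List, Set

# Hypothetical shape of a raw article as returned by a Member's feed.
RawArticle = Dict[str, str]


def poll_once(
    fetch_latest: Callable[[], Iterable[RawArticle]],
    seen_ids: Set[str],
) -> List[RawArticle]:
    """Fetch the newest articles and return only those not ingested yet.

    `fetch_latest` stands in for a request to a Member's REST API.
    `seen_ids` is updated in place so that repeated polls skip duplicates.
    """
    new_articles: List[RawArticle] = []
    for article in fetch_latest():
        article_id = article["id"]  # assumed unique identifier field
        if article_id in seen_ids:
            continue
        seen_ids.add(article_id)
        new_articles.append(article)
    return new_articles
```

A scheduler would call `poll_once` every few minutes for each Member feed, while Members that push content would bypass this step and post directly to the ingest API.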
Once articles are ingested into the system, they are converted to a standard format: every article is represented as a list of sections (title, lead, headings, paragraphs, images, media, quotes). The text component of every section can then be translated. The audio and video files are downloaded and sent to EuroVOX for transcription and translation.
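One way to picture this standard format is as a small data structure. The sketch below is an assumption for illustration only: the class names, field names and section kind labels are hypothetical and do not describe the tool's real schema.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# Section kinds taken from the list in the text; the exact labels are assumed.
SECTION_KINDS = {"title", "lead", "heading", "paragraph", "image", "media", "quote"}


@dataclass
class Section:
    kind: str
    text: Optional[str] = None       # text component, if the section has one
    media_url: Optional[str] = None  # location of an image, audio or video file

    def __post_init__(self) -> None:
        if self.kind not in SECTION_KINDS:
            raise ValueError(f"unknown section kind: {self.kind}")


@dataclass
class Article:
    source_language: str
    sections: List[Section] = field(default_factory=list)


def translatable_texts(article: Article) -> List[Tuple[int, str]]:
    """Return (section index, text) pairs for every section that carries text."""
    return [(i, s.text) for i, s in enumerate(article.sections) if s.text]


def media_to_transcribe(article: Article) -> List[str]:
    """Return the URLs of audio/video sections that need transcription."""
    return [s.media_url for s in article.sections if s.kind == "media" and s.media_url]
```

Keeping the section index alongside each text makes it straightforward to put translated text back in the position it came from.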
As soon as the text translation is available, the article becomes visible in the web application. This process takes a matter of minutes, so journalists have access to the news almost in real time.
The translation of content is handled by EuroVOX. Developed by the EBU and its Members, it is an open toolbox for transcription, translation and revoicing. It provides a single unified interface to language tools from the major cloud platforms (AWS, Microsoft Azure, Google, Speechmatics, DeepL, etc.) alongside the ability to plug in smaller vendors and custom models. It also provides a web application that enables easy transcription and translation of content.
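The idea of a single interface in front of interchangeable vendors can be illustrated with a simple adapter pattern. This sketch does not reproduce the EuroVOX API: `Translator`, `TranslationHub` and `EchoTranslator` are hypothetical names, and the stand-in backend only tags the text instead of calling a real translation service.

```python
from abc import ABC, abstractmethod
from typing import Dict


class Translator(ABC):
    """Common interface that every vendor adapter would implement."""

    @abstractmethod
    def translate(self, text: str, source: str, target: str) -> str:
        ...


class EchoTranslator(Translator):
    """Stand-in backend: labels the text rather than translating it."""

    def translate(self, text: str, source: str, target: str) -> str:
        return f"[{source}->{target}] {text}"


class TranslationHub:
    """Routes requests to a registered backend chosen by name."""

    def __init__(self) -> None:
        self._backends: Dict[str, Translator] = {}

    def register(self, name: str, backend: Translator) -> None:
        self._backends[name] = backend

    def translate(self, text: str, source: str, target: str, vendor: str) -> str:
        try:
            backend = self._backends[vendor]
        except KeyError:
            raise ValueError(f"unknown vendor: {vendor}") from None
        return backend.translate(text, source, target)
```

With this shape, adding a smaller vendor or a custom model means writing one more `Translator` subclass and registering it, while callers keep using the same `translate` call.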
Journalists and editors can review each step: both the text and the timings of a transcription can be corrected to make the content more accessible. A corrected transcription in turn yields a better translation, and the translation itself can also be edited if required. Finally, a video can be rendered from the translation using a synthetically voiced presenter, optionally with either burned-in or embedded subtitles.
PEACH, also developed by the EBU and its Members, uses machine learning and AI to provide ‘related article’ recommendations. It works by converting each article into a vector representation using pre-trained deep-learning language models. The goal is that similar articles (semantically similar, even if different words are used) would be represented by vectors that are similar.
The vectors used may have more than 100 dimensions to capture the semantics of sentences. To find articles similar to a given article, we compare its vector with the vectors of all the other articles in the hub and select the closest ones in the multi-dimensional space.
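The lookup described above can be sketched as a brute-force nearest-neighbour search. The text does not say which distance measure PEACH uses, so cosine similarity is an assumption here, and the function names are hypothetical. The article vectors are taken as given; producing them with a language model is outside this sketch.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(
    query_id: str,
    vectors: Dict[str, Sequence[float]],
    k: int = 3,
) -> List[Tuple[str, float]]:
    """Compare one article against all others and return the k closest.

    `vectors` maps an article id to its embedding (assumed precomputed).
    """
    query = vectors[query_id]
    scored = [
        (other_id, cosine_similarity(query, vec))
        for other_id, vec in vectors.items()
        if other_id != query_id
    ]
    # Highest similarity first.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]
```

Comparing against every stored vector is the simplest approach and matches the description in the text; a larger collection would typically call for an approximate nearest-neighbour index instead.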
PEACH can also make use of user data; approaches such as collaborative filtering will be adopted later, when the content is shown to end users.
The next iterations of the tool will provide automatic extraction of topics from the articles. These will be used to surface trending topics in real time, so that journalists can quickly identify what is happening now, from the perspective of EBU newsrooms. The user experience will also evolve to permit a more focused exploration of the diverse content produced by European PSM.
This article was first published in issue 47 of tech-i magazine.