To mark the opening of nominations for the EBU Technology & Innovation Awards 2021, we're looking back at the projects honoured in 2020. NPO's Autotitling Live project was one of three runners-up for the T&I Award 2020. This article first appeared in Issue 46 of tech-i magazine.
Erik Buitinga (NPO)
NPO Access Services provides accessibility services, including SDH (subtitles for the deaf or hard of hearing) for seven television channels and the NPO Start streaming platform. We have quite a history when it comes to applying speech technology to subtitle production. The 2002 Winter Olympics marked the full-scale introduction of speech recognition software for live subtitling and the kick-off for re-speaking as the single mode of production for both live and pre-prepared content.
In 2009 NPO participated in the NEON project. The aim was to run video through speech-to-text, speaker diarization and alignment processes and present the outcome in a UI as editable subtitles. As a demo, NEON was a success; as a production tool it was nowhere near competing with traditional modes of subtitling and their throughput times. But from 2012 onwards we deemed automatic speech recognition (ASR) technology mature enough to deploy ‘assisted subtitling’ – ASR-based subtitle production – on applicable content only.
Until recently NPO employed the now defunct eCaption service. It provided a web-based authoring platform for ASR-generated subtitles and a fully automated subtitle workflow for pre-prepared content servicing political news channel NPO Politiek. (This was presented at the EBU MDN Workshop 2018.)
When NPO put out a tender for Autotitling in 2018 we knew that, as a channel, NPO Politiek had the potential to take the next step, to fully automated subtitle delivery. The content was very ASR-friendly: excellent audio quality and professional speakers to boost transcription quality, predictable subject matter allowing lexicon customization, and large amounts of training data for machine learning (ML) purposes.
We also knew this next step would be as controversial as it was ambitious. The accessibility community questions automatic subtitling, as it clearly fails to meet professional standards at times. On the consumer side, surveys indicated our target audience could tolerate automatic subtitling for some time, but not all the time, and only if it had sufficient quality (i.e. few word errors and in sync with the audio). So, we were in for a treat.
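To make "low on word errors" concrete: a common way to quantify it is the word error rate (WER), the number of word substitutions, deletions and insertions needed to turn the ASR output into a reference transcript, divided by the length of the reference. The sketch below is illustrative only. It assumes WER as the metric, which this article does not state was the measure used in the project, and the function name and sample sentences are made up for the example.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from output
            insertion = curr[j - 1] + 1  # extra word in output
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Hypothetical example: one word dropped from a seven-word sentence.
    reference = "de kamer stemt vandaag over de motie"
    hypothesis = "de kamer stemt vandaag over motie"
    print(f"WER: {word_error_rate(reference, hypothesis):.3f}")
```

WER says nothing about timing, so synchronisation with the audio has to be judged separately.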
We involved both groups, editors and audience, in validating and ranking examples from different vendors at different stages of the project. Their voice was not decisive, but as a preference it was authoritative. Acceptance was the keyword. However, all solutions offered were unacceptable. Only after introducing a fixed five-second delay on the broadcast signal, enabling in-sync subtitle delivery, were results rated as adequate.
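Why a fixed delay helps can be shown with a little arithmetic. Speech recognition needs some time to produce a subtitle after the words are spoken; if the broadcast signal is held back by at least that long, the subtitle can be shown at the moment the delayed audio goes to air. The sketch below is a simplified model, not a description of the production system: the `Cue` structure, the function names and the latency figures are assumptions for illustration, and only the five-second delay comes from the project.

```python
from dataclasses import dataclass

BROADCAST_DELAY_S = 5.0  # fixed delay on the broadcast signal


@dataclass
class Cue:
    text: str
    spoken_at: float  # seconds on the undelayed input timeline
    ready_at: float   # when the ASR output becomes available (same timeline)


def subtitle_on_air_time(cue: Cue, delay: float) -> float:
    """Earliest moment the subtitle can be shown without preceding the audio."""
    audio_on_air = cue.spoken_at + delay
    # The subtitle cannot be shown before it exists; otherwise it waits
    # for the delayed audio.
    return max(audio_on_air, cue.ready_at)


def subtitle_lag(cue: Cue, delay: float) -> float:
    """Seconds by which the subtitle trails the audio the viewer hears."""
    return subtitle_on_air_time(cue, delay) - (cue.spoken_at + delay)


if __name__ == "__main__":
    # Hypothetical cue: spoken at t=10.0 s, recognised 3.2 s later.
    cue = Cue("Voorzitter, dank u wel.", spoken_at=10.0, ready_at=13.2)
    print(f"lag without delay: {subtitle_lag(cue, 0.0):.1f} s")
    print(f"lag with delay:    {subtitle_lag(cue, BROADCAST_DELAY_S):.1f} s")
```

In this model the lag drops to zero whenever the recognition latency stays below the delay; anything slower would still arrive late by the difference.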
Eventually, the Spraaklab solution came out as the ‘winner’, offering 24/7 automated subtitling on NPO Politiek. It runs as a redundant system in our media gateway hosted by local service provider MyBit. Input is SDI video; output is subtitles in the Cyclone, Newfor and EBU TT-live protocols. The system is tunable.
Since its implementation in October 2019, the system has been enhanced with a punctuation model, and an ML loop has been set up using our own datasets.
What can we say after one year of service? It is not flawless, but it is pretty neat, and it will only get better.