LLM Benchmarking Strategies

This workshop is organised by the EBU AI Benchmarking Group as part of the data-related activities of the Smart Media Production Strategic Programme.

Large language models (LLMs), which form the basis of conversational tools, have attracted a lot of attention since the commercialisation of ChatGPT. How to evaluate and benchmark them remains an open question and an active area of research. A key difficulty is that LLMs are trained on a pretext task, such as predicting hidden words in a text. Performance on that task does not necessarily reflect performance on more complex tasks such as content annotation, summarisation and natural language generation.

In this workshop, we will discuss experiments conducted by broadcasters on LLMs and focus on the methodology for evaluating these models.

Presentations

The societal and ethical implications of Large Language Models

Overview of some LLM benchmarks

LLM use cases at BBC

LLM finetuning and benchmarking – a few examples

Upcoming EBU Events

Multi-armed bandits for recommender optimization at NRK, VRT and ZDF

Data Technology Seminar 2025

Speech Intelligibility in Broadcasting