AI Evaluation

Structured evaluation of AI technologies, with a 2026 focus on agentic AI.

Scope

The AI Evaluation group conducts structured evaluations of AI technologies for media applications, develops proof-of-concept projects for practical validation, and publishes reports covering technology scouting, assessment criteria, use cases, limitations, and operational and business impact. Our work is grounded in real media production contexts. We began with BenchmarkSTT, an open-source tool for evaluating speech-to-text systems, and developed one of the largest annotated video datasets for machine-learning-based face recognition. In 2026, our dedicated focus is agentic AI and its implications for media workflows.

Objectives

We organise regular knowledge-sharing meetings throughout the year to keep Members current on AI developments. We publish structured reports on specific AI technology domains, run proof-of-concept projects that Members can learn from and adapt, and develop open-source evaluation tools that the wider community can use. Our 2026 focus on agentic AI includes a report on the state of the art and a proof of concept on agentic AI for ingest workflows.

Deliverables

  • Organise fortnightly knowledge-sharing meetings throughout 2026 with active Member participation
  • Publish a structured report on agentic AI for media applications
  • Develop a proof of concept on agentic AI for ingest workflows
  • Maintain and extend open-source AI evaluation tools including BenchmarkSTT