Bridging the operational gap in AI for efficient historical archive documentation

Alberto Messina and Maurizio Montagnuolo (Rai CRITS*)

Artificial intelligence (AI) technologies are becoming increasingly pervasive. The media industry is also undergoing profound change, progressively adopting AI in its content creation, archiving and annotation processes. However, despite these advances, several challenges remain. Many organizations rely on operational processes and technological stacks consolidated over years, which were not initially designed to support rapid evolution, modularity, or continuous updates. As a result, introducing new AI-driven components – especially those with significant functional requirements or demanding nonfunctional properties such as performance, scalability or reliability – can strain existing architectures.

In this context, introducing, modifying or replacing even a single component often impacts entire workflows, resulting in costly and time-consuming interventions. The complexity of integrating AI solutions therefore concerns not only what the system must do, but also how easily it can be adopted and how well it operates within existing environments. We call this phenomenon the ‘operational gap’, i.e., the barrier between what AI technologies can natively do and what the target business process requires.

Identity recognition

An emblematic example is that of personality annotation in archival films. On the one hand, AI-based facial recognition has demonstrated superior efficiency and accuracy compared to humans. On the other hand, AI systems perform poorly on historical figures who are relatively obscure at the time annotation takes place and who are underrepresented in modern digital media, making them difficult to recognize. This implies that the deployment of a facial recognition model alone is insufficient for ensuring the operational effectiveness of the annotation task. It is additionally necessary to define a robust process for the creation, management and continuous update of a gallery of reference personalities.
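To make the idea of a managed reference gallery concrete, the following is a minimal sketch rather than a description of any production system. It assumes that a separate face recognition model has already turned each face into a numeric embedding; the class name `ReferenceGallery`, the cosine-similarity matching rule and the 0.8 acceptance threshold are all illustrative assumptions.

```python
import math
from dataclasses import dataclass, field


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


@dataclass
class ReferenceGallery:
    """Hypothetical gallery mapping a person's name to reference embeddings."""
    threshold: float = 0.8  # assumed acceptance threshold, to be tuned on real data
    entries: dict = field(default_factory=dict)

    def enroll(self, name, embedding):
        # Continuous update: new reference samples can be added at any time.
        self.entries.setdefault(name, []).append(list(embedding))

    def identify(self, embedding):
        # Score each person by their best-matching reference sample.
        best_name, best_score = None, -1.0
        for name, refs in self.entries.items():
            score = max(cosine(embedding, ref) for ref in refs)
            if score > best_score:
                best_name, best_score = name, score
        # Below the threshold the face is reported as unknown (None),
        # which is the expected outcome for people missing from the gallery.
        if best_score >= self.threshold:
            return best_name, best_score
        return None, best_score


gallery = ReferenceGallery()
gallery.enroll("Person A", [1.0, 0.0, 0.0])        # toy embedding, not real data
known, _ = gallery.identify([0.9, 0.1, 0.0])       # close to the enrolled sample
unknown, _ = gallery.identify([0.0, 1.0, 0.0])     # unrelated to any enrolled sample
```

In this sketch an obscure historical figure is simply returned as unknown until someone enrolls a reference sample, which is why the upkeep of the gallery matters as much as the recognition model itself.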

The limitations observed in face recognition become even more pronounced when annotating speaking individuals. In fact, speaker recognition requires vocal reference galleries that are far more difficult to build, since voiceprints are highly sensitive to contextual factors such as the characteristics of recording devices, environmental noise, and the presence of overlapping speech. Furthermore, while face recognition algorithms have achieved a high level of maturity, speaker recognition models are less effective and robust, especially when dealing with short audio samples.

Semantic enrichment

Visual scene description is another critical field. Although the development of multimodal large language models (LLMs) has significantly advanced this topic, current technology is still not mature enough to produce reliable visual scene descriptions, for several reasons. First, the narrative of a video scene is inherently difficult to capture, represent and generalize with a neural model in a consistent and content-agnostic manner. Second, entity-aware descriptions – i.e., descriptive text containing explicit mentions of identified entities such as landmarks and persons in the scene – remain challenging to create, since most systems lack the ability to generate textual outputs that explicitly and reliably identify those entities.

As a final example, let us consider the problem of credit identification and reconciliation. This is a very important operation, especially for old, digitized material for which little or no metadata are available. In such cases, reconstructing credits such as the names of interviewees or speakers is possible only by exploiting superimposed graphics or scene text. For this reason, we employed OCR (optical character recognition) as a basis to heuristically infer these details, enriching the content with the roles and names of the individuals taking part in the scene.
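The article does not spell out the heuristics applied to the OCR output, so the following is only a hypothetical illustration of the general idea. It assumes that OCR has already returned the text lines of a single on-screen caption, and it applies one simple invented rule: the first line made of two to four capitalized alphabetic words is taken as the name, and the line after it as the role. The function names and the sample caption are made up for the example.

```python
def looks_like_name(line):
    """Assumed rule: 2 to 4 words, each capitalized and alphabetic
    (apostrophes and hyphens inside a word are tolerated)."""
    tokens = line.split()
    if not 2 <= len(tokens) <= 4:
        return False
    for token in tokens:
        letters = token.replace("'", "").replace("-", "")
        if not (letters.isalpha() and token[0].isupper()):
            return False
    return True


def parse_caption(ocr_lines):
    """Return {'name': ..., 'role': ...} for one caption, or None if no name is found."""
    # Collapse runs of whitespace and drop empty lines left over by OCR.
    cleaned = [" ".join(line.split()) for line in ocr_lines if line.strip()]
    for index, line in enumerate(cleaned):
        if looks_like_name(line):
            # The role is taken from the next line, if there is one.
            role = cleaned[index + 1] if index + 1 < len(cleaned) else None
            return {"name": line, "role": role}
    return None


# Invented caption text, for illustration only.
credit = parse_caption(["  Mario   Rossi ", "film director"])
no_credit = parse_caption(["12:30", "news"])
```

A rule this simple would misread many real captions, for instance a capitalized programme title, so in practice the inferred names would still need to be reconciled against a trusted list of people.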

Overall, these examples illustrate the breadth of the operational gap, and motivate the efforts needed to develop solutions that are not only algorithmically accurate and innovative, but also practically applicable, sustainable, and economically feasible in real environments.

This article first appeared in the March 2026 issue of tech-i magazine.

* Rai Centre for Research, Technological Innovation and Experimentation

 

 
