Giacomo Alliata, doctoral assistant in the Laboratory of Experimental Museology eM+, has participated in the DHCH@ISR 2022 event in Rome, Italy, in June 20221.
Giacomo has given a presentation on “Exploring Large Audiovisual Archives through Embodied Experiences in Immersive Environments”2. We report here the abstract of the presentation.
Audiovisual archives are the mnemonic archives of the 21st century, with widespread platforms of video sharing like Youtube or TikTok on one hand, and important cultural institutions increasingly digitizing their video collections, such as the BBC with its 1 million hours of footage or the Italian national television Rai with its multimedia archive RaiTeche, that has already digitized the entire production of its three main channels since 1999. However, these large archives remain mostly inaccessible, due to the sheer amount of content combined with the lack of a compelling system to explore them. Only 20% of the 200000 hours of the RTS audiovisual archive are accessible online for instance. Moreover, archival scholars have stressed the importance of innovative forms of engagement through compelling frameworks for the exploration of these large collections. This presentation will argue how the idea of embodiment in immersive environments can be used to propose a compelling experience, revealing how narrative can emerge in such frameworks through the concept of immersive generosity. The focus will be on employing data as material for the creation of a virtual world and the social interactions that result from multi-user environments placing users as actors of the storytelling rather than mere spectators, with clear benefits in terms of enjoyment of the experience and understanding of the cultural aspect.
Giacomo Alliata, doctoral assistant in the Laboratory of Experimental Museology eM+, has participated in the graduate student symposium of the Creativity and Cognition 2022 conference in Venice, Italy, in June 2022, with a paper on “Redefining Access to Large Audiovisual Archives through Embodied Experiences in Immersive Environments: Creativity & Cognition 2022 – Graduate Student Symposium”1. We report here the abstract.
Audiovisual archives are the mnemonic archives of the 21st century, with important cultural institutions increasingly digitizing their video collections. However, these remain mostly inaccessible, due to the sheer amount of content combined with the lack of innovative forms of engagement through compelling frameworks for their exploration. The present research, therefore, aims at redefining access to large video collections through embodied experiences in immersive environments. The author claims that, once users are empowered to be actors of the experience rather than mere spectators, their creativity is stimulated and narrative can emerge.
Giacomo Alliata and Yuchen Yang, doctoral assistants in the Laboratory of Experimental Museology eM+, have participated in the DARIAH-EU annual event 2022 in Athens, Greece, in May 2022, with a poster on “Exploring large audiovisual archives through storytelling in an immersive environment: a conceptual framework”1. We report here the abstract.
Audiovisual archives are the mnemonic records of the 20th and 21st centuries, the immense complexity of the past happenings, preserving individual and collective histories, memories, feelings, cultures and aesthetics. These collections have, in the past decades, dramatically increased in size, with, on one hand, an emergence of online video sharing services like Youtube and Vimeo, and important institutions digitising their archives, such as the BBC with its 1 million hours of footage.
However, these large archives remain mostly inaccessible, due to copyright issues and to the sheer amount of content combined with the lack of a compelling system to explore them. Only 20% of the 200000 hours of the RTS audiovisual archive are accessible online for instance. Moreover, archival scholars have stressed the importance of innovative forms of engagement through compelling frameworks for the exploration of these large collections.
Within this context, the Sinergia project Narratives From the Long Tail: Transforming Access to Audiovisual Archives aims to reexamine the relationship between archives, memory institutions, and general audiences through cutting edge computational and immersive technologies. We argue that, faced with the extensive amount of content available, meaningful storytelling frameworks are necessary for understanding and exploring an audiovisual collection. Thus, this paper will examine the formation of such a conceptual framework on the archival content and digital interface level.
There is an increasing trend for transforming archives to be big data organisations through digitisation and state-of-art computational methods. The transformation not only enhances the management and accessibility for archives, but also unlocks the semantics in multimodal archival content as well as the potential use of domain knowledge. Such an upgrade should in theory surfacing the hidden structure, revealing and building connections between contents, allowing fast and effective curation of the archive to serve a variety of purposes. In this part of the paper, we will map different approaches used for current practises on digitally transforming archives for various storytelling purposes, and aim at identifying and addressing the opportunities, issues and challenges laying ahead brought by the methodological shift.
Similarly, at the interface level, various approaches are taken to propose an immersive installation in which users can explore a large collection in a compelling way, driving their own storytelling experience. In this section, the ideas of embodiment will be leveraged to review meaningful digital installations, revealing how narrative can emerge in such frameworks. Multiple interactions and visualisations approaches can be employed to explore the semantics discovered through computational methods, using data as a sculpting material in the creation of a virtual world. Furthermore, in multi-users environments, social interactions place users as actors of the storytelling rather than mere spectators, with clear benefits in terms of enjoyment of the experience and understanding of the cultural aspect.
In conclusion, this paper will propose a conceptual framework to explore a large collection of audiovisual items through the idea of storytelling in an immersive installation.
Eye Filmmuseum’s contribution to Narratives from the Long Tail is a selection of high-resolution digitized films from its Mutoscope & Biograph Collection. As explained elsewhere on this website, these remarkable early silent films were made between 1897 and 1903 by the American Mutoscope & Biograph Company, which was based in the United States, and by its short-lived international branches, primarily those in England, The Netherlands, France, and Germany. Shot on large format 68mm stock, these films—which all run approximately thirty seconds to one minute in length—contain depictions of theatrical entertainment, fictionalized comedic and dramatic scenes, and non-fiction views of daily rural and urban life, notable people and events, and industry, transportation, and nature.
In this post, I briefly introduce a few of the digitized films from Eye that will be used in the Narratives project. I also use this space to highlight the historical importance of the Biograph films—in terms of both the material specificity of the 68mm format and the company’s early cinematic exhibition practices—and how that importance can be reactivated today through novel digital interfaces and embodied spectatorial engagement.
The Biograph Films
Approximately seventy of the two hundred films in Eye’s Mutoscope & Biograph Collection will be used in the Narratives project. Among these seventy titles are numerous films shot around Europe (and to a lesser extent in the United States). Many of the titles are from The Netherlands; there are also a handful of films made in Italy, England, and France as well as two that were shot in Switzerland. While it is important to remember that filmmaking categories such as “non-fiction,” “documentary,” and “narrative fiction,” for example, were not fully established at the time that these films were made, the majority of the Biograph films included in this project can broadly be described as non-fiction records of various people, places, and real-life activities (however staged their production might actually have been). For example, there are films depicting: Pope Leo XIII in the Vatican; a multicycle race in Boston, Massachusetts; French military exercises; a procession of Capuchin monks; scenic views of the Venice canals; and Hiram Maxim demonstrating his invention—the rapid-fire machine gun—for the camera.
What follows are four personal favorites among the seventy Biograph films included in the Narratives project:
Prinsengracht (Nederlandsche Biograaf- en Mutoscope Maatschappij, 1899)
Prinsengracht is one of many “phantom ride” films in the Mutoscope & Biograph Collection in which the camera is placed at the front or back of a moving vehicle, creating an immersive experience for the viewer whose point of view is aligned with that of the camera. Here, the camera is placed on the front of a boat moving up Amsterdam’s eponymous canal, giving the spectator a glimpse of the urban space, and its quotidian pedestrian and water activities, at the turn of the twentieth century.
Les Parisiennes (American Mutoscope Company, 1887)
One of two hand-colored films in Eye’s Mutoscope & Biograph Collection, Les Parisiennes is a registration of a theatrical can-can number, and embodies early cinema’s interest in movement, motion, and dance. Like many of the films in this collection, behind-the-scenes credits and other contextual information, like the name of the Biograph camera operator, for example, remain unknown to us today. 1
Hondenkarren [Dog Pulls Cart in Front of Mill] (Nederlandsche Biograaf- en Mutoscope Maatschappij, British Mutoscope and Biograph Syndicate [?], 1898 [?])
Until roughly the end of 1898, the Biograph camera weighed approximately 260-300 pounds.2 Such a heavy machine thus had to ideally be placed in the best possible “vantage point” 3 for the unfolding action. In Hondenkarrren, which depicts dog carts crossing a makeshift bridge in the Dutch countryside, the camera is placed at a distance, almost level with the scene, likely in order to effectively capture the horizontal action and the surrounding area.
The Price of a Kiss (American Mutoscope Company, 1899)
Filmed at Biograph’s open-air studio in New York—notice the breeze blowing in the confined set—The Price of a Kiss consists of a short comic film the likes of which were very popular among early film audiences. It therefore exemplifies the more explicitly staged, comedic narratives or scenes orchestrated and captured by the company and its actors.
“The Show of Shows in the Movie Field”
As the crisp digitized moving images above illustrate, 68mm, which is unperforated (i.e., no sprocket holes) and has an image area of more than seven times that of the standard 35mm full-frame image, 4 offers incredibly clear and sharp visual detail. This large format film was also exposed at the rate of 30-40 frames per second, approximately double the speed used by other producers at the time, resulting in steadier images with reduced flicker. 5 As a result of its superior visual product, which was reportedly particularly spectacular on a large screen, the Biograph company quickly occupied, in the words of film historian Paul C. Spehr, “the front rank, becoming the show of shows in the movie field.” 6 By 1901, however, the large format film, which was more expensive for exhibitors, was no longer viewed as a unique visual draw and 35mm was rapidly becoming the industry standard. In the following years, the different Biograph branches outside of the United States either shut down or tried to adapt to the realities of the maturing industry, and the use of 68mm was discontinued beginning in 1903. 7
Upon entering the film archive in 1959, these 68mm Biograph films were thus rare technological artifacts, and constituted what archivist Mark van den Tempel called “a preservation challenge of the first rank,” 8 due to their large size and lack of sprocket holes and the loss of original specialized projection equipment and technical knowledge. During the first major restoration of these archival materials—an analog project carried out by Eye in the 1990s 9—the 68mm films were copied to 35mm, which reduced the size of the image and resulted in a loss of visual sharpness while simultaneously allowing these films to finally be projectable on a standard format. 10
Thanks to advances in film restoration practices in the digital era, however, Eye was able to return to the original 68mm materials and, in 2019-2020, the archive scanned approximately seventy titles from the collection at 8K resolution. 11 This digital restoration project produced high definition versions of the selected films, reconstituting for contemporary viewers the original large image size, visual sharpness, and clarity that defined the Biograph films approximately 120 years ago.
As such, these seventy digitized non-standard films from the Mutoscope & Biograph Collection are ideal for the purposes of the collaborative Narratives project, which aims, using large-scale immersive digital interfaces, to make the “long tail” of the audiovisual archive more accessible to various publics. Once the “show of shows” in the domain of early theatrical exhibition, the selected Biograph films can be reactivated and revitalized thanks to digital technologies and large-scale novel interfaces that will allow contemporary spectators to experience and engage with these dazzling high-resolution large-format images in immersive ways.
“A Rather Contingent Affair”
Not only are the high resolution digitized 68mm films ideal material for computational experimentation and various visualization frameworks. The Biograph company’s original exhibition practices for its 68mm films also reflect the exciting potential of diverse user-driven participatory engagement.
In addition to providing each exhibition venue with a special 68mm projection apparatus and projectionist, the Biograph company delivered a set curated program of films to screen. 12 Not only did these curated programs often generate various narrative, thematic, or symbolic links between the selected films, but the company also often re-exhibited titles in brand new curated programs and alongside different titles over time, based on the topicality of their content. 13 For example, film archivist Nico de Klerk has described how a film depicting U.S. President William McKinley at home was first shown in October 1896, when he was still a presidential candidate; it was retitled and brought back in Spring 1898 after he had been elected “on the occasion of, and subsequently amid [films] relating to, the Spanish-American War, a reminder perhaps of McKinley’s campaign promise to liberate Cuba from Spanish misgovernment.” 14 Thus, as a result of Biograph’s theatrical programming practices, the original exhibition of these 68mm films be understood as “a rather contingent affair.” 15
The Biograph company’s practice of recombining films, or “reframing” 16 as de Klerk called it, activated a flexibility, fluidity, and multivocality that is also key to contemporary user-driven and experiential digital museological practices. “[W]ith each change of position,” de Klerk has argued, a given Biograph film may have “functioned slightly differently” 17 as the unique “juxtaposition of films create[d] its own dynamics and affect[ed] interpretation and appreciation.” 18 This practice of reframing can also be revitalized by contemporary users, who could explore different ways to remix the films (in whole or in parts) and even experiment with creating musical accompaniment for them, which, much like historical silent film accompaniment, was also a contingent affair that affected interpretation and appreciation. Ultimately, through the Narratives project, these digitized films will be incorporated into scalable public-facing interfaces that will not only allow diverse users to explore and engage with silent films that historically have been juxtaposed and recombined; it will also make space for as-yet unimagined affective, aesthetic, and narrative connections and reframings.
Brown, Richard. “The Biograph Group as a Multinational Company.” Griffithiana 66-70 (1999/2000): 66-77.
de Klerk, Nico. “‘Pictures to be Shewn’: Programming the American Biograph.” In VisualDelights: Essays on the Popular and Projected Image in the 19th Century. Eds. Simon Popple and Vanessa Toulmin. England: Flicks Books, 2000. 204- 223.
— — —. “Programme of Programmes: The Palace Theatre of Varieties.” Griffithiana 66-70 (1999/2000): 241-7.
— — —. Showing and Telling: Film Heritage Institutes and Their Performance of Public Accountability. Wilmington, DE: Vernon Press, 2017.
Rossell, Deac. “The Biograph Large Format Technology.” Griffithiana 66-70 (1999/2000): 78- 115.
Saccone, Kate. “The Public (Re)Making of an Archival Film Collection: Eye Filmmuseum’s Mutoscope & Biograph Collection and Contemporary Silent Film Curating.” M.A.Thesis, University of Amsterdam, 2021.
Spehr, Paul C. “Filmmaking at the American Mutoscope and Biograph Company 1900-1906.” The Quarterly Journal of the Library of Congress vol. 37, no. 3/4 (Summer/Fall 1980): 413-421.
— — —. “Politics, Steam and Scopes: Marketing the Biograph.” In Networks of Entertainment:Early Film Distribution, 1895-1915. Ed. Frank Kessler. Bloomington, IN: Indiana University Press, 2007. 147-156.
— — —. “Throwing Pictures on a Screen: the Work of W.K.L. Dickson, FilmMaker.” Griffithiana 66-70 (1999/2000): 11-65.
van den Tempel, Mark. “Making Them Move Again: Preserving Mutoscope and Biograph.” Griffithiana 66-70 (1999/2000): 224-230.
Birhane et al. (2022) shared their review on the heated topic of AI ethics in the latest paper, and answered the question “How has recent AI Ethics literature addressed topics such as fairness and justice in the context of continued social and structural power asymmetries?”
How has recent AI Ethics literature addressed topics such as fairness and justice in the context of continued social and structural power asymmetries?
We trace both the historical roots and current landmark work that have been shaping the field and categorize these works under three broad umbrellas: (i) those grounded in Western canonical philosophy, (ii) mathematical and statistical methods, and (iii) those emerging from critical data/algorithm/information studies. We also survey the field and explore emerging trends by examining the rapidly growing body of literature that falls under the broad umbrella of AI Ethics. To that end, we read and annotated peer-reviewed papers published over the past four years in two premier conferences: FAccT and AIES.
We organize the literature based on an annotation scheme we developed according to three main dimensions: whether the paper deals with concrete applications, use-cases, and/or people’s lived experiences; to what extent it addresses harmed, threatened, or otherwise marginalized groups; and if so, whether it explicitly names such groups. We note that although the goals of the majority of FAccT and AIES papers were often commendable, their consideration of the negative impacts of AI on traditionally marginalized groups remained shallow.
Taken together, our conceptual analysis and the data from annotated papers indicate that the field would benefit from an increased focus on ethical analysis grounded in concrete use-cases, people’s experiences, and applications as well as from approaches that are sensitive to structural and historical power asymmetries.
Birhane, A., Ruane, E., Laurent, T., Brown, M. S., Flowers, J., Ventresque, A., & Dancy, C. L. (2022). The Forgotten Margins of AI Ethics. arXiv preprint arXiv:2205.04221.
The most recent work from Kaur et al. (2022) explores the interpretability and explainability of machine learning models through sensemaking theory.
Understanding how ML models work is a prerequisite for responsibly designing, deploying, and using ML-based systems. With interpretability approaches, ML can now offer explanations for its outputs to aid human understanding. Though these approaches rely on guidelines for how humans explain things to each other, they ultimately solve for improving the artefact — an explanation. In this paper, we propose an alternate framework for interpretability grounded in Weick’s sensemaking theory, which focuses on who the explanation is intended for. Recent work has advocated for the importance of understanding stakeholders’ needs—we build on this by providing concrete properties (e.g., identity, social context, environmental cues, etc.) that shape human understanding. We use an application of sensemaking in organizations as a template for discussing design guidelines for sensible AI, AI that factors in the nuances of human cognition when trying to explain itself.
Kaur, H., Nori, H., Jenkins, S., Caruana, R., Wallach, H., & Wortman Vaughan, J. (2020, April). Interpreting interpretability: understanding data scientists’ use of interpretability tools for machine learning. In Proceedings of the 2020 CHI conference on human factors in computing systems (pp. 1-14).
Digitisation, Digital transformation, Datafication
With the rise of computational methods for humanities research, as well as the increasing born-digital materials out there, archives around the globe are experiencing a digital turn. Compared to practices years ago where the focus is predominantly on digitising physical objects, nowadays practices cover all aspects of record-keeping – creation, capturing, organising and pluralising – through a combination of digitisation, digital transformation, and datafication (Gilliland, 2014).
A clear switch in mindset can be seen in the digital turn where archivists start to seek more holistic and future-oriented solutions for their existing and potentially new materials, not only trying to collect in the fullest manner possible, preserving natively to the complexity of their form but also bridging the gap between archive and end-users, critically attuned to the possibilities and perils that come with the use (Padilla, 2017).
More specifically, the digital turn happens on three major levels:
Infrastructure – Changes that happened on this level mostly focus on issues relating to accessibility and management of archives, such as digital access, appraisal, handling of sensitive information;
Coverage – This part includes attempts to foster inter-collection collaboration and compatibility, as well as methods and standards to archive new materials such as social media contents, aiming to create less biassed and more representative collections;
Content – This is where digitisation and most of the content understanding and description happen. The goal is to transform the material into the digital space for better accessibility and to generate structured descriptors that could eventually describe as much as possible the original from a range of perspectives possible.
The purpose of the digital turn is to ultimately enable a future-facing archive with greater accessibility, better coverage, and finer granularity.
With the primary interest of research using digital methods and archives remains textual (Fickers, 2018), numerous works of such can be seen in textual based collections such as manuscripts, legal documents, and newspapers. However, few have been done on audiovisual archives. To further examine the digital turn for audiovisual archives, we will be looking closely into three topics: the existing standards for audiovisual contents; a conceptual model for Understanding AV contents from the top-down; and practices with AV archives in the digital age.
Standards for AV contents description
Compared to structured or even unstructured textual contents, audiovisual ones are not particular-to to the default computational infrastructures to interpret and access. The semantics are well hidden under multimodality, creating barriers to managing and using such archives (Tesic, 2005). Although the recent advances in natural language processing and computer vision have enabled many content level extraction and analysis tools, offering the content and semantic level descriptors to the table, only a few are developed for or applied to audiovisual archives. Attempts of using such methods on audiovisual archives yield very preliminary results too – providing nothing much beyond what is available through standard full-text search (Brock, 2022). However, the importance of descriptors for audiovisual contents, especially at scale, remains.
Describing archival contents using different descriptors is not news. Various metadata or descriptor standards were created without considering the development of computational tools, and covering not only content level information. Whether the annotation is manually or automatically created, the goal of such standards is to regulate and standardise what to describe when it comes to an archive system. These descriptors, in essence, are not to replace the original, but to provide a vehicle for more digital and data-centric management and application.
Similarly to every other domain, it is not practical to have a single standard to address everything in the audiovisual world. Existing metadata standards were created with different focuses on their use and coverage (Smith, 2006).
As can be seen in Fig.1, MPEG-7 has a focus on content level description and addresses all media types from text to video. DMS-1, on the other hand, covers some content level stuff but is combined with production level metadata. It is very industry-focused, specifically designed for the TV & broadcast companies to use as an interchange layer between archives. P_Meta, similar to DMS-1 is designed specifically for the TV & broadcast industry, but unlike DMS-1, it does not contain the content level descriptors but mainly describes the overall workflow of the industry, covering not only the production but also content creation and lifecycle management.
In recent practices, more attempts for flexible and situated standards are built upon the foundation of these broad and large standards, mainly to cope with the ever-growing specific use cases that emerged from the day-to-day. For example, markup languages NewsML and SportsML are developed to enhance the exchangeability for broadcasting news content and sports features. Looking into the culture and heritage sector, although not focusing on audiovisual contents, there are attempts on creating case-specific standards, some through mapping MPEG-7 to CIDOC CRM (Angelopoulou, 2011), some merging Dublin Core to create a connection to culture and heritage specific contents (Kosch, 2002; Thornely, 2000).
Although these attempts show the potential of creating a situated standard specifically for audiovisual archives in the cultural and heritage sector, describing the content for the unique needs of memory institutions dealing with moving images, they mostly come from a technical or problem-solving perspective. The quest of constructing a standard, or in other words, a data model for audiovisual contents in the cultural and heritage sector is never examined holistically.
A conceptual model for understanding audiovisual contents from top-down
To decide what we can describe, we first need to understand what is an audiovisual archive and its contents in the cultural and heritage setting.
The content in question, from analogue films to clips on social media, from newsreels to feature films, is essentially anything that has the appearance of movement. And similar to any other material, such as a painting or a book, it serves its purpose to record and mirror the complexity of humanities’ past and present.
Previous studies look at contents in audiovisual archives from different perspectives, some treat contents as archival objects and talk about preservation and distribution, some explore rich semantic meanings and use them as references to study the past (Steinmetz, 2013; Wiltshire, 2017; Knight, 2012; Tadic, 2016). Roughly speaking, we could look at audiovisual contents from three broad dimensions – Archival; Affective & Aesthetic; Social & Historical.
In the archival dimension, every audiovisual content is regarded as an archival object, and descriptors corresponding to that would very much be focusing on the admin level, benefiting the record-keeping process including organising, management, and dissemination; Moving into the affective and aesthetic dimension, contents are regarded as some sort of artistic or cultural practice, and the focus is shifted to the audiovisual language, and its cultural or artistic indication; When the social and historical dimension is what in question, contents become narratives and pieces of evidence of past happenings and semantics (literal or not) out of these contents emerged to be the core.
It is not hard to further expand the model within each dimension, detailing different categories of descriptors, examples of descriptors from existing standard or innovative ones, as well as the corresponding methods for getting them (Fig. 2).
The purpose of summarising such a thinking model is to help draw the connection between lower-level descriptors (annotations, metadata, or whatever people call them), potential methods for acquiring such descriptors, and higher-level purposes built upon a single or a combination of audiovisual content dimensions. Ideally, the content descriptors should work as a middle ground, opening up the gate and guiding people with expertise in computational methods to link their works to the purposeful and meaningful interpretation of the contents, as well as people with curatorial objectives to break down their thoughts into different dimensions to engage more comfortably with different technologies out there possible to serve their goals.
Practices with AV archives in the digital age
Moving on to the practical level, there are some (although not as much as textual or image-based) projects working closely to the idea of the digital turn with a focus on audiovisual materials. The model constructed above is used as a guide to facilitate the process of pinpointing these attempts within the overall theoretical framework.
One major theme for such projects, broadly speaking, is to enhance accessibility and searchability
Some are doing so by working on the semantics, building pipelines for extracting semantic entities from the multimodality of the content or linking existing ones to a connected graph.
Compared to the traditional audiovisual archive experiences powered by basic metadata (production, format, keyword, etc.), these attempts improve the searchability of the archive by adding the textual search terms from the content level. These practices, adding descriptors from multiple dimensions, also enable the exploration in alternative organisations (graph-based for example) of an audiovisual archive for better connectivity and exploration of the content.
Several projects take the path of platform building. Indiana University (IU) Libraries, in collaboration with New York Public Library and digital consultant AVP, is developing an open-source software system AMP, that enables the efficient generation of metadata to support discovery. Using tools like OCR, and applause/instrument detection to add semantic descriptors to their data. Similarly, the CLARIAH (Common Lab Research Infrastructure for the Arts and Humanities) is a distributed research infrastructure for the humanities and social sciences in the Netherlands and provides annotation and automatic tools for enhancing semantics. The Media Ecology Project at Dartmouth also aims to provide a new infrastructure for access to archival data, as well as manual and automated methods for enhancing the annotation of audiovisual contents.
Some projects focus on building tools and methods. ELAN (Fig.3) from the language archive in the Netherlands is a manual annotation tool for audio and video recordings, which allows unlimited semantic annotations in the form of a sentence, word or gloss, a comment, translation or a description of any feature observed in the media. Automated methods are constantly put forward and groups like the Distant Viewing Lab (Fig.4) are busing aggregating them to the field of cultural industries, lowering the bar for recognition and segmentation tools, as well as finding potential use cases with experiments. With the recent development in deep learning, researchers have moved into other semantics to extract. For example the research from Stanford University and the University of California, Berkeley focused on pose extraction and its potential for comparative choreography analysis (Broadwell, 2021). Another group of researchers from Leiden University is looking at the possibility of automated sign language annotation (Fragkiadakis, 2021)
Other projects are busing utilizing semantic information to create graph-based interfaces for access. Jazz Luminaries (Fig.5 & video) benefit from the rich semantic connections between artists and scores from the rich archive of Montreux Jazz Festival and build an interactive experience based on that. The project invites audiences to explore and remix the archive using the network that is built to reorganise the archive. More recently, Videograph (Rossetto, 2021), a video retrieval system proposed by a team of researchers from the University of Zurich, will also explore the potential of accessing audiovisual materials through a semantic metadata powered and graph-based interface.
Some have taken a different direction, working on introducing often ignored perspectives to add to the descriptions.
Instead of focusing on the semantic entities in the text, audio, and moving images, these projects choose to work with the audiovisual language. Looking at the different filming and editing techniques and their presence in the audiovisual contents, the role of colour and audio features. Working on the intersection between audiovisual languages and their aesthetic and affective meanings, the aim of such projects is often to enhance the audiovisual archive experience from a more abstract and artistic perspective.
Some efforts were made on building controlled vocabularies for audiovisual languages for future annotation. The Columbia File Language Glossary is one of the early works under this category. As an educational tool for the study of film, it features professionally curated key terms, corresponding clips, as well as visual and audio commentaries. The Kinolab, as well as Critical Commons, are also similar attempts for annotating audiovisual materials using controlled or lose vocabularies in terms of its audiovisual languages, as well as techniques used.
Aside from the projects working on a dictionary of literal terms to use for the artistic language used, some projects focus on a specific aspect of the audiovisual language. The MovieBarcodes (Fig.6 Left) project suggests colour as an additional quantitative parameter for movie analysis and describes an information system that allows scholars to search for movies via their specific colour distribution. Also on the colour front, VIAN (Fig.6 Right) supported by the ERC Advanced Grant FilmColors research project, is a professional colour analysis tool that enables the streamlined processing for colourimetry on multiple levels. enables insights into the ‘stylistic, expressive and narrative dimensions of film colours’. Polifonia, a 3M€ project funded by the EU Horizon 2020 Programme focuses to recreate the connections between music, people, places and events from the sixteenth century to the modern-day, works on methods to extract information on music patterns and music objects intrinsic features. The Sensory Moving Image Archive (Fig.7) project works on multiple fronts, using colour, shape, texture and movement to connect all clips and create an alternative experience for accessing and exploring the audiovisual archive without searching for literal items, encouraging serendipity discoveries.
The other major theme is on exploring the use and application of an archive and its multidimensional descriptors
Different from the previous theme where the focus is on creating a more accessible and explorable archive, practices under this theme mostly aim at utilising the archive to create sense-making experiences to express and explain.
Some of these projects have very specific topics. BBC Northern Ireland’s Rewind Archive consists of 13,000 clips including news reports, documentaries and lifestyle programmes from the 1950s to the 1970s. To provide a glimpse of what life was like in the old times and an opportunity to relive the timeline with regard to the complexity of history, this archive is carefully curated annotated with periods, locations, and themes. In RED – a spatial film installation (Fig.8) hosted in the Deutsches Filmmuseum, the use of the colour red is examined through a curated collection of film clips covering topics such as protagonists, objects, spaces, effects, and direction. The exhibition, using a single element in the audiovisual languages, explained the role of the aesthetic and narrative elements in films.
On the other hand, several projects are on the quest of exploring complex and sometimes unexplainable topics through interactive experiences or some sort of synesthetic translation. Artist practices with audiovisual archives such as the Suprise Machine (Fig.9 Upper left), The third AI (Fig.9 Upper right), and the Machine Hallucination (Fig.9 Lower) apply deep learning models on contents and reduce them to latent representations. Some use Umap or equivalent to display contents clustered, so that audience and see the changes happening moving around. Others use generative models to create content with similar characteristics from the latent space. And the unexplainable – style, machine understanding – are revealed through the comparison of the originals to the new contents. On the research end, many studies also work on translating the subtle feelings and complex concepts to something easier to perceive – understanding gentrification through the change of urban sound samples (Martin, 2021), as well as translating the ambience of something visual into something we can hear through image sonification (Graham, 2020; Kramer, 2021).
Co-curating and co-creating are heated topics for current curatorial practices. The annotated archive is the foundation for interactive, free, and inviting experiments. These practices benefit from the higher granular and connected contents and embrace diverse and personal narratives in the archive. One classic work on this topic is the T_VISIONARIUM II (Fig.10), which uses computationally segmented scene clips as well as annotations of emotion, expression, physicality and scene structure, speed, gender, colour. This processed archive and the interactive experience allow the viewer to both search and recombine 500 simultaneously looping video streams, creating an emergent narrative of their own. The BBC story former lives in a browser. However, the same principle applies – benefiting from annotated segments of videos, as well as an interactive interface, the web application allows users to create flexible, multithreading, and responsive stories.
Different dimensions of descriptors are also used in more commercial settings, sometimes as a replacement for the original content, enabling faster and easier studies on the relationship between the content and the audience, providing insights for commercial problems like video recommendation and audience engagement. Storyline (Fig.11), a project between McKinsey & Company’s Consumer Tech and Media team and the MIT Media Lab explores the possibility of this project reducing a video story to a visual valence curve using a combination of audiovisual features. The curve is used for grouping content and studying the relationship between audience engagement and emotional responses. Similarly, researchers are also working on using visual features based on MPEG-7 standard to recreate the Mise-En-Scène for movies and trying to crack the recommendation and personalisation problem using such a set of features (Deldjoo, 2017).
It is also worth noticing that there are efforts on the validation and infrastructure fronts to facilitate audiovisual archives’ digital turn.
With the increasing application of computational methods for annotation of the audiovisual content in the archival settings, the validation and benchmarking of existing models and methods for related tasks emerged. MediaEval is a benchmarking initiated by media scholars, offering challenges in artificial intelligence on multimodal data and aiming to develop and evaluate methods dedicated to the retrieval, access, exploration and analysis. BenchmarkSTT, a tool offered by the European Broadcasting Union, aims to benchmark the performances of methods for speech-to-text solutions. Industry leaders like Nextflix work hard on providing a more efficient and flexible infrastructure for containing annotations in the digital era. The Netflix Media Database is a database design and implementation to “persist deeply technical and rich metadata about various media assets at Netflix and to serve queries in near real-time using a combination of lookups as well as runtime computation.”
From the perspective of Narrative from the long tail
Traditionally, scholars regard narratives as spatiotemporal sequences of events. However, it seems too native as it prioritised literal features over the ‘poetic’ ones, and eliminated almost all ambiguities (Simecek, 2015). The definition from David Bordwell seems more appropriate for the project: Narrative can be broadly considered a trans medium or preverbal phenomenon, something more basic and conceptual – a vehicle for making sense of anything we encounter (Bordwell, 2012). With that, creating a narrative is dissected into two elements: things we encounter, and sense-making processes.
The digital turn of audiovisual archives has provided everything we need for the “things we encounter” – with more than ever freedom of manipulation, depth of understanding, and degree of granularity thanks to numerous and diverse existing and potential descriptors. All these sliced, connected, and well-described contents formed the tiny universe to encounter.
What’s left is the sense-making processes, which, in the scope of the project, are the various museum experiences – constructed by experimental interfaces, as well as immersive interactions. With delicate control of the sense-making processes, various narratives can be constructed – traditional storytelling is formed by a linear experience with no interaction or alternative interfaces for a small curated collection of contents, whereas an emergent narrative to explore the plurality of the materials can be formed by simply providing interfaces and interactions with utmost freedom to explore the granular and connected contents. However, the analysis of interactions, interfaces, and experiences, although being another main focus for the Narrative from the long tail, is not stressed in this post.
Padilla, T., 2017. On a collections as data imperative.
Fickers, A., Snickars, P. and Williams, M.J., 2018. Editorial Special Issue Audiovisual Data in Digital Humanities. VIEW Journal of European Television History and Culture, 7(14), pp.1–4.
Brock, D. 2022. A MUSEUM’S EXPERIENCE WITH AI. Available at: https://computerhistory.org/blog/a-museums-experience-with-ai/ (Accessed: 20 Jan 2022).
Broadwell, P. and Tangherlini, T.R., 2021. Comparative K-Pop Choreography Analysis through Deep-Learning Pose Estimation across a Large Video Corpus. DHQ: Digital Humanities Quarterly, 15(1).
Fragkiadakis, M., Nyst, V.A.S. and van der Putten, P.W.H., 2021. Towards a user-friendly tool for automated sign annotation: identification and annotation of time slots, number of hands, and handshape. Digital Humanities Quarterly (DHQ), 15(1).
Rossetto, L., Baumgartner, M., Ashena, N., Ruosch, F., Pernisch, R., Heitz, L. and Bernstein, A., 2021, June. VideoGraph–Towards Using Knowledge Graphs for Interactive Video Retrieval. In International Conference on Multimedia Modeling (pp. 417-422). Springer, Cham.
Martin, A., 2021. Hearing Change in the Chocolate City: Computational Methods for Listening to Gentrification. DHQ: Digital Humanities Quarterly, 15(1).
Kramer, M., 2021. What Does A Photograph Sound Like? Digital Image Sonification As Synesthetic AudioVisual Digital Humanities. DHQ: Digital Humanities Quarterly, 15(1).
Graham, S. and Simons, J., 2020. Listening to Dura Europos: An Experiment in Archaeological Image Sonification. Internet Archaeology, (56).
Deldjoo, Y., Quadrana, M., Elahi, M. and Cremonesi, P., 2017. Using mise-en-sc\ene visual features based on mpeg-7 and deep learning for movie recommendation. arXiv preprint arXiv:1704.06109.
Colavizza, G., Blanke, T., Jeurgens, C. and Noordegraaf, J., 2021. Archives and AI: an overview of current debates and future perspectives. ACM Journal on Computing and Cultural Heritage (JOCCH), 15(1), pp.1-15.
Gilliland, A.J., 2014. Conceptualizing 21st-century archives. Society of American Archivists.
Angelopoulou, A., Tsinaraki, C. and Christodoulakis, S., 2011, September. Mapping MPEG-7 to CIDOC/CRM. In International Conference on Theory and Practice of Digital Libraries (pp. 40-51). Springer, Berlin, Heidelberg.
Deldjoo, Y., Quadrana, M., Elahi, M. and Cremonesi, P., 2017. Using mise-en-sc\ene visual features based on mpeg-7 and deep learning for movie recommendation. arXiv preprint arXiv:1704.06109.