Swiss Echoes – Navigating a Geolocated Broadcasting Archive

As part of the Narratives project, the archive of Radio Télévision Suisse (RTS), with more than 200,000 hours of broadcast footage documenting Swiss life across decades, served as the foundation for the immersive installation Swiss Echoes. Much of this vast collection remains hidden in the spoken word, inaccessible through traditional cataloguing and metadata. To unlock this layer of meaning, we developed a computational pipeline that automatically transcribes and analyses the audio, fragments long broadcasts into smaller clips, and extracts the names of Swiss locations mentioned. These locations are then geolocated on a map of Switzerland and enriched with thematic categories, allowing visitors to explore the archive spatially. Within the Panorama+ immersive system, audiences can “fly” over this landscape of audiovisual memories, encountering the country’s places and stories through the voices of its broadcast history.

Visitors flying over the map of Switzerland in Swiss Echoes.

Unlocking the spoken content in the RTS archive

The process begins by converting speech into text using WhisperX (Bain et al., 2023), a speech recognition pipeline built on the Whisper model (Radford et al., 2023). Each video is then segmented into speaker-specific clips, which become individual, searchable records. Within these transcripts, named locations are automatically identified using Named Entity Recognition (NER).
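The segmentation into speaker-specific clips can be illustrated with a minimal sketch. It assumes the transcription step yields a list of segments carrying a speaker label, start and end times, and text; the field names, the `Clip` class, and the sample sentences are hypothetical and chosen only for illustration, not taken from the actual pipeline.

```python
from dataclasses import dataclass


@dataclass
class Clip:
    speaker: str
    start: float  # seconds from the start of the broadcast
    end: float
    text: str


def split_by_speaker(segments):
    """Merge consecutive segments spoken by the same speaker into one clip."""
    clips = []
    for seg in segments:
        text = seg["text"].strip()
        if clips and clips[-1].speaker == seg["speaker"]:
            # Same speaker keeps talking: extend the current clip.
            last = clips[-1]
            last.end = seg["end"]
            last.text = f"{last.text} {text}"
        else:
            # Speaker change: start a new clip.
            clips.append(Clip(seg["speaker"], seg["start"], seg["end"], text))
    return clips


# Invented sample data, for illustration only.
segments = [
    {"speaker": "SPEAKER_00", "start": 0.0, "end": 4.2,
     "text": "Bonsoir, nous sommes à Lausanne."},
    {"speaker": "SPEAKER_00", "start": 4.2, "end": 7.9,
     "text": "Le marché vient d'ouvrir."},
    {"speaker": "SPEAKER_01", "start": 7.9, "end": 12.5,
     "text": "Merci, à vous Genève."},
]
clips = split_by_speaker(segments)
```

Each resulting clip keeps its own time range and transcript, which is what allows it to be indexed as a separate record.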

To situate these places on the map, each extracted location is cross-referenced with Wikidata, a large open knowledge base (Rudnik et al., 2019). This step attaches precise geographic coordinates and multilingual labels. In practice, this means that when a clip mentions “Lausanne,” the system can link it not just to the word, but to the real city, with its latitude, longitude, and contextual information.
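One possible shape for this lookup is sketched below. It is an assumption about how such a link could be made, not a description of the project's code: the query template targets the public Wikidata SPARQL vocabulary (`wdt:P17` for country, `wd:Q39` for Switzerland, `wdt:P625` for coordinate location), and the helper names `build_query` and `parse_wkt_point` are hypothetical. No network request is made here; the sketch only builds the query text and parses the coordinate literal, which Wikidata returns in longitude-latitude order.

```python
import re

# Illustrative query: places labelled `label` that are located in Switzerland
# and have a coordinate location.
QUERY_TEMPLATE = """
SELECT ?place ?coord WHERE {{
  ?place rdfs:label "{label}"@{lang} ;
         wdt:P17 wd:Q39 ;
         wdt:P625 ?coord .
}}
"""

_WKT_POINT = re.compile(
    r"^Point\((-?\d+(?:\.\d+)?) (-?\d+(?:\.\d+)?)\)$", re.IGNORECASE
)


def build_query(label, lang="fr"):
    """Return the SPARQL text for one place name (quotes are escaped)."""
    return QUERY_TEMPLATE.format(label=label.replace('"', '\\"'), lang=lang)


def parse_wkt_point(wkt):
    """Convert a 'Point(lon lat)' literal into a (lat, lon) tuple."""
    match = _WKT_POINT.match(wkt.strip())
    if match is None:
        raise ValueError(f"not a WKT point: {wkt!r}")
    lon, lat = float(match.group(1)), float(match.group(2))
    return lat, lon


# Approximate coordinates of Lausanne, used only as sample input.
lat, lon = parse_wkt_point("Point(6.6327 46.5198)")
```

Swapping the order while parsing avoids a common mistake, since most mapping code expects latitude first.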

Because Wikidata contains thousands of different kinds of places—ranging from municipalities to fountains—we refined the dataset to a curated set of categories most meaningful for exploration, such as cities, villages, regions, and natural landmarks. This allows visitors to browse the archive spatially, moving seamlessly from Switzerland’s largest cities to its smallest villages.
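The curation step can be pictured as a mapping from fine-grained place types to the small set of categories kept for exploration. The sketch below is a simplified assumption: the type names, the `TYPE_TO_CATEGORY` table, and the sample records are hypothetical, and the real selection of categories is not detailed in the text beyond cities, villages, regions, and natural landmarks.

```python
# Hypothetical mapping from fine-grained place types to curated categories.
TYPE_TO_CATEGORY = {
    "city": "city",
    "village": "village",
    "canton": "region",
    "mountain": "natural landmark",
    "lake": "natural landmark",
}


def curate(places):
    """Keep only places with at least one curated category."""
    curated = []
    for place in places:
        categories = sorted(
            {TYPE_TO_CATEGORY[t] for t in place["types"] if t in TYPE_TO_CATEGORY}
        )
        if categories:
            curated.append({**place, "categories": categories})
    return curated


# Invented sample records, for illustration only.
places = [
    {"name": "Lausanne", "types": ["city"]},
    {"name": "Some fountain", "types": ["fountain"]},
    {"name": "Matterhorn", "types": ["mountain"]},
]
curated = curate(places)
```

Places whose types fall outside the table, such as the fountain here, are simply dropped from the map.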

Geolocation of clips extracted from 24,000 hours of footage from the RTS archive based on locations mentioned in the spoken content.

Geography alone is not enough to capture the richness of the archive. To introduce a semantic layer, we worked with large language models (LLMs) to classify each clip into broad thematic categories such as politics, sports, or culture. This enables new ways of navigating: not only through space, but also through theme.
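A minimal sketch of such a classification step follows. The prompt wording, the helper names, and the three-item theme list (the only themes named in the text) are assumptions, and the model is represented by a plain callable so that the example stays independent of any particular LLM service. The parser discards any label that is not in the allowed list, which guards against free-form answers.

```python
THEMES = ["politics", "sports", "culture"]  # only the themes named in the text


def build_prompt(transcript, themes=THEMES):
    """Ask for a comma-separated subset of the allowed themes."""
    return (
        "Classify the following broadcast transcript. "
        f"Answer with a comma-separated subset of: {', '.join(themes)}.\n\n"
        f"Transcript:\n{transcript}"
    )


def parse_labels(response, themes=THEMES):
    """Keep only known themes, in the order given, without duplicates."""
    allowed = {t.lower(): t for t in themes}
    labels = []
    for part in response.split(","):
        key = part.strip().lower().rstrip(".")
        if key in allowed and allowed[key] not in labels:
            labels.append(allowed[key])
    return labels


def classify(transcript, llm, themes=THEMES):
    """`llm` is any callable that maps a prompt string to a response string."""
    return parse_labels(llm(build_prompt(transcript, themes)), themes)


def fake_llm(prompt):
    # Stand-in for a real model call; returns one label outside the list.
    return "Sports, culture, weather."


labels = classify("Le festival accueille aussi un tournoi de lutte.", fake_llm)
```

Because a clip may receive several labels, the same clip can later be shown under more than one theme.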

Distribution of the categories assigned to a sample of 24,000 hours of footage from the RTS archive.

In total, this pipeline processed nearly 24,000 hours of footage, yielding over 2,000 distinct Swiss locations enriched with coordinates and thematic categories. These computational processes provide the foundation for Swiss Echoes, an immersive installation where visitors can step into a spatial map of the RTS archive and explore its stories in new ways.

Description of the Installation

Technical Environment

Swiss Echoes is hosted in the Panorama+, a large-scale 360-degree stereoscopic projection environment. This immersive platform places embodiment at the core of the interaction, enabling visitors to physically situate themselves within a spatialised digital archive. By supporting both visitor-centred and object-centred perspectives, Panorama+ allows audiences to orient themselves in relation to the entities they encounter in the virtual world (Kenderdine, 2015).

The application was developed using Unreal Engine 5, with the nDisplay system distributing rendering across the Panorama+ computing cluster. Navigation is mediated through an HTC Vive controller, which provides intuitive gesture-based input. Visitors point in a chosen direction and press the trigger to glide forward, or aim at content elements to make selections.

Global Layout: Geolocated Content

The global structure of the installation is based on the computational pipeline described earlier. Each video clip is geolocated according to the place names referenced in its transcript and visualised on a 3D topographical model of Switzerland built using Swisstopo data.
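Placing a clip on the terrain requires converting latitude and longitude into map coordinates. The text does not say which projection the installation uses, so the sketch below substitutes a simple equirectangular approximation around an assumed origin near the geographic centre of Switzerland; the constants and the function name `to_map_xy` are hypothetical, and a production system working with Swisstopo data would more likely rely on the Swiss national coordinate system.

```python
import math

EARTH_RADIUS_M = 6_371_000.0
# Assumed map origin, roughly the geographic centre of Switzerland.
ORIGIN_LAT, ORIGIN_LON = 46.8, 8.23


def to_map_xy(lat, lon, lat0=ORIGIN_LAT, lon0=ORIGIN_LON):
    """Approximate east (x) and north (y) offsets from the origin, in metres."""
    x = math.radians(lon - lon0) * math.cos(math.radians(lat0)) * EARTH_RADIUS_M
    y = math.radians(lat - lat0) * EARTH_RADIUS_M
    return x, y


# Lausanne lies south-west of the origin, so both offsets are negative.
x, y = to_map_xy(46.52, 6.63)
```

Over a country the size of Switzerland this approximation introduces some distortion, so it should be read as a placeholder for a proper geodetic transform.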

At each location, corresponding clips are represented by floating cubes with slightly rounded corners that recall CRT televisions. These form rising spires that make areas rich in archival content visible from a distance. Floating labels display place names at ground level, further guiding navigation.
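The rising spires can be thought of as cubes stacked vertically above each place. The following sketch shows one way to compute such positions; the cube size, the gap between cubes, and the input format are assumptions made for illustration, since the text does not give the installation's actual dimensions.

```python
from collections import defaultdict


def build_spires(clips, cube_size=1.0, gap=0.25):
    """Stack clips that share a place into a vertical column.

    `clips` is an iterable of (clip_id, place, (x, y)) tuples.
    Returns a dict mapping clip_id to the (x, y, z) centre of its cube.
    """
    counts = defaultdict(int)  # number of cubes already stacked per place
    positions = {}
    for clip_id, place, (x, y) in clips:
        level = counts[place]
        z = level * (cube_size + gap) + cube_size / 2
        positions[clip_id] = (x, y, z)
        counts[place] += 1
    return positions


# Invented sample data, for illustration only.
positions = build_spires([
    ("a", "Lausanne", (0.0, 0.0)),
    ("b", "Lausanne", (0.0, 0.0)),
    ("c", "Sion", (5.0, 2.0)),
])
```

With this scheme the height of a spire grows linearly with the number of clips at that place, which is what makes content-rich areas stand out from afar.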

As visitors glide over this virtual map, they encounter spatialised audio excerpts from nearby clips. Thanks to the Panorama+ 32-speaker array, sound is positioned directionally, encouraging visitors to turn and re-orient themselves towards points of interest. Visual and aural cues thus work in tandem: topographical features and video spires guide the eye, while sound prompts serendipitous discovery through embodied listening.
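To make the idea of directional sound concrete, the sketch below computes the angle of a clip relative to the visitor and picks the closest loudspeaker. The physical layout of the 32 speakers is not described in the text, so the example assumes, purely for illustration, a horizontal ring of evenly spaced speakers with index 0 straight ahead and angles increasing counter-clockwise; the function names are hypothetical, and a real system would typically blend several speakers rather than select one.

```python
import math


def source_azimuth_deg(listener_xy, heading_deg, source_xy):
    """Angle of the source relative to the listener's heading, in [0, 360)."""
    dx = source_xy[0] - listener_xy[0]
    dy = source_xy[1] - listener_xy[1]
    bearing = math.degrees(math.atan2(dy, dx))
    return (bearing - heading_deg) % 360.0


def nearest_speaker(azimuth_deg, n_speakers=32):
    """Index of the closest speaker on an evenly spaced ring (assumed layout)."""
    step = 360.0 / n_speakers
    return round(azimuth_deg / step) % n_speakers


# A clip directly to the listener's left, ten units away.
azimuth = source_azimuth_deg((0.0, 0.0), 0.0, (0.0, 10.0))
speaker = nearest_speaker(azimuth)
```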

Local Layout: Thematic Cylinders

When a visitor selects a location, the system transitions into a local exploration mode, presenting a cylindrical wall of CRT televisions filled with video thumbnails. Here, content is arranged by broad thematic categories generated through large language model (LLM) classification. Each ring of screens corresponds to one theme, and clips can appear in multiple rings if they belong to more than one category.
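The arrangement described above, one ring per theme with clips repeated across rings, can be sketched as follows. The radius, ring height, and function names are assumptions for illustration; the text does not specify the actual geometry or how screens are spaced within a ring.

```python
import math


def group_by_theme(clip_themes):
    """Turn {clip_id: [themes]} into {theme: [clip_ids]}, one list per ring."""
    rings = {}
    for clip_id, themes in clip_themes.items():
        for theme in themes:
            rings.setdefault(theme, []).append(clip_id)
    return rings


def cylinder_layout(rings, radius=5.0, ring_height=1.5):
    """Spread each ring's clips evenly around a circle, one ring per level."""
    placements = []
    for level, (theme, clip_ids) in enumerate(rings.items()):
        z = level * ring_height
        for slot, clip_id in enumerate(clip_ids):
            angle = 2 * math.pi * slot / len(clip_ids)
            position = (radius * math.cos(angle), radius * math.sin(angle), z)
            placements.append((theme, clip_id, position))
    return placements


# Invented sample data: clip "c2" belongs to two themes.
rings = group_by_theme({
    "c1": ["politics"],
    "c2": ["politics", "culture"],
    "c3": ["sports"],
})
placements = cylinder_layout(rings)
```

A clip tagged with two themes yields two placements, one on each ring, which matches the behaviour described for multi-category clips.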

The cylindrical form supports embodied browsing: visitors physically turn to face different directions, and look upward or downward to move across thematic layers. In smaller locations, where fewer clips are available, this design asks visitors to actively seek out videos, reinforcing exploration through movement.

Interaction is simple and performative: pressing the controller’s main button activates playback. Up to ten clips can play simultaneously, though only the one under direct inspection emits sound, ensuring clarity and coherence. This design encourages an exploratory rhythm—scanning, listening, and curating one’s own audiovisual pathway through the archive.
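The playback rule (at most ten clips running, only the inspected one audible) can be expressed as a small state holder. This is a sketch under stated assumptions: the class and method names are hypothetical, and the choice to stop the oldest clip when an eleventh is started is a guess, since the text does not say what happens at the limit.

```python
from collections import OrderedDict


class PlaybackManager:
    def __init__(self, max_playing=10):
        self.max_playing = max_playing
        self._playing = OrderedDict()  # clip ids, oldest first
        self.focused = None  # clip currently under direct inspection

    @property
    def playing(self):
        return list(self._playing)

    def play(self, clip_id):
        """Start a clip; stop the oldest one if the limit is reached (assumed)."""
        if clip_id in self._playing:
            self._playing.move_to_end(clip_id)
            return
        if len(self._playing) >= self.max_playing:
            evicted, _ = self._playing.popitem(last=False)
            if evicted == self.focused:
                self.focused = None
        self._playing[clip_id] = None

    def focus(self, clip_id):
        """Mark the clip the visitor is pointing at, if it is playing."""
        self.focused = clip_id if clip_id in self._playing else None

    def volume(self, clip_id):
        """Only the focused clip is audible; all others are muted."""
        return 1.0 if clip_id == self.focused and clip_id in self._playing else 0.0


manager = PlaybackManager()
for i in range(11):
    manager.play(f"c{i}")
manager.focus("c5")
```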

Visitors browsing the videos associated with a given location in Swiss Echoes.

A Situated Experience

Through its two-level structure—global (map-based) and local (thematic cylinders)—Swiss Echoes invites visitors to build personal pathways across the RTS archive. Locations become geographic anchors and semantic containers, linking Switzerland’s places with the thematic layers of broadcast history. Rather than imposing a linear view of the archive, the installation offers an open-ended, embodied, and situated exploration of over two decades of audiovisual memory.

References

Bain, M., Huh, J., Han, T., & Zisserman, A. (2023). WhisperX: Time-Accurate Speech Transcription of Long-Form Audio. INTERSPEECH 2023.

Kenderdine, S. (2015). Embodiment, Entanglement, and Immersion in Digital Cultural Heritage. In A New Companion to Digital Humanities (pp. 22–41). John Wiley & Sons, Ltd. https://doi.org/10.1002/9781118680605.ch2

Radford, A., Kim, J. W., Xu, T., Brockman, G., McLeavey, C., & Sutskever, I. (2023). Robust speech recognition via large-scale weak supervision. International Conference on Machine Learning, 28492–28518.

Rudnik, C., Ehrhart, T., Ferret, O., Teyssou, D., Troncy, R., & Tannier, X. (2019). Searching news articles using an event knowledge graph leveraged by Wikidata. Companion Proceedings of the 2019 World Wide Web Conference, 1232–1239.