ABSTRACT

Important works of data journalism are disappearing from the web because they are too technologically complex to be captured or archived by libraries or current web archiving technologies. Research based on journalism depends on the existence of news archives. For the benefit of future scholars, it is imperative that libraries and newsrooms solve this problem. This research contends that dynamic web archiving of data journalism will require a new, emulation-based approach to capturing these works. This new approach in turn necessitates new web archiving tools and workflows to enable collaborative collection of the projects, because unlike in print-based archiving, the process will depend on detailed technical information sharing among stakeholders.

Toward this end, this article summarizes the results of a questionnaire that identified the most common frameworks, database technologies, and programming languages used to build 76 complex works of data journalism published between 2008 and 2017, as well as the ways these works are being maintained and stored. This information can inform the development of emulation-based archiving tools to capture and preserve these stories using methods that would fit within the workflow of news organizations. This research is a first step toward devising an automated solution for long-term preservation of data journalism projects.