
Project details

Data Extraction and Interactive Visualization of Unexplored Textual Datasets for Investigative Data-Driven Journalism


Initiative: Wissenschaft und Datenjournalismus (Science and Data Journalism)
Approved: 21 October 2015
Duration: 9 months


This project combines recent research in natural language processing and information visualization into an exploration tool that gives investigative data-driven journalists faster access to collections of text documents. Whereas data-driven journalism today is mostly based on structured data, i.e. data available in spreadsheets or databases, this project applies the principles of data journalism to unstructured text documents. Statistical language processing methods such as named entity recognition and keyphrase extraction identify important elements in these documents, and these elements are then visualized as a network. A journalist can browse this network to quickly grasp the content of the document collection, follow how groups of entities develop over time, annotate elements of the network with labels and comments to add a layer of interpretation to the data, and finally publish part of the network alongside an (online) article developed in a data-driven fashion from the findings in the collection. The project consortium brings together complementary expertise from "Der Spiegel" and TU Darmstadt: investigative journalism, information visualization and text mining.
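The description does not say how the entity network is constructed. One common approach, shown here only as a minimal sketch under that assumption, links two entities whenever they appear in the same document and weights the edge by the number of such documents. The `Document` class, the function names `build_cooccurrence_network` and `mentions_per_month`, and the sample data are hypothetical, and the named entity recognition / keyphrase extraction step is assumed to have already run upstream. The second helper counts mentions per month as one simple way to track how entities develop over time.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
from itertools import combinations
from typing import Dict, Iterable, List


@dataclass
class Document:
    # Hypothetical input format (assumption, not the project's data model).
    date: str            # ISO date "YYYY-MM-DD"
    entities: List[str]  # output of an upstream NER / keyphrase step (not shown)


def build_cooccurrence_network(docs: Iterable[Document]) -> Counter:
    """Weight each unordered entity pair by the number of documents mentioning both."""
    edges: Counter = Counter()
    for doc in docs:
        # Deduplicate and sort so each pair has one canonical order.
        entities = sorted(set(doc.entities))
        for pair in combinations(entities, 2):
            edges[pair] += 1
    return edges


def mentions_per_month(docs: Iterable[Document]) -> Dict[str, Counter]:
    """Count, per entity, how many documents mention it in each month ("YYYY-MM")."""
    timeline: Dict[str, Counter] = defaultdict(Counter)
    for doc in docs:
        month = doc.date[:7]
        for entity in set(doc.entities):
            timeline[entity][month] += 1
    return dict(timeline)


if __name__ == "__main__":
    corpus = [
        Document("2015-01-12", ["Company X", "Minister Y", "Bank Z"]),
        Document("2015-01-30", ["Company X", "Bank Z"]),
        Document("2015-03-02", ["Minister Y", "Company X"]),
    ]
    # Print edges from strongest to weakest co-occurrence.
    for (a, b), weight in build_cooccurrence_network(corpus).most_common():
        print(f"{a} -- {b}: {weight}")
    print(mentions_per_month(corpus)["Company X"])
```

Document-level co-occurrence is the simplest choice; a real tool might instead use sentence-level windows or statistical association scores to suppress links between entities that are merely frequent.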


Open access publications