Big? Smart? Clean? Messy? Data in the Humanities

Journal article by Christof Schöch

This paper is about data in the humanities. Most of my colleagues in literary and cultural studies would not necessarily speak of their objects of study as “data.” If you asked them what it is they are studying, they would rather speak of books, paintings and movies; of drama and crime fiction, of still lifes and action painting; of German expressionist movies and romantic comedy. They would mention Denis Diderot or Toni Morrison, Chardin or Jackson Pollock, Fritz Lang or Diane Keaton. Maybe they would talk about what they are studying as texts, images, and sounds. But rarely would they consider their objects of study to be “data.”

However, in the humanities just as in other areas of research, we are increasingly dealing with “data.” With digitization efforts in the private and public sectors going on around the world, more and more data relevant to our fields of study exists, and, if the data has been licensed appropriately, it is available for research. The digital humanities aim to rise to the challenge and realize the potential of this data for humanistic inquiry. As Christine Borgman has shown in her book Scholarship in the Digital Age, this is as much a theoretical, methodological and social issue as it is a technical issue. Indeed, the existence of all this data raises a host of questions, some of which I would like to address here. For example:

What is the relation between the data we have and our objects of study? Does data replace books, paintings and movies? In what way can data be said to be representations of them? What difference does it make to analyze the digital representation or version of a novel or a painting instead of the printed book, the manuscript, or the original painting?

What types of data are there in the humanities, and what difference does it make? I will argue that one can distinguish two types of data, “big” data and “smart” data. What, then, does it mean to deal with big data, or smart data, in the humanities?

What new ways of dealing with data do we need to adopt in the humanities? How are big data and smart data being dealt with in the process of scholarly knowledge generation, that is, when data is being created, enriched, analyzed and interpreted?