Oxford University Press, Digital Scholarship in the Humanities, 33(2), pp. 456–466, 2017
DOI: 10.1093/llc/fqx020
Although there has been a drive in the cultural heritage sector to provide large-scale, open data sets for researchers, we have not seen a commensurate rise in humanities researchers undertaking complex analysis of these datasets for their own research purposes. This paper reports on a pilot project at University College London (UCL), working in collaboration with the British Library, to scope out how High Performance Computing facilities can best be used to meet the needs of researchers in the humanities. Using institutional data-processing frameworks routinely employed to support scientific research, we assisted four humanities researchers in analysing 60,000 digitised books, and we present two resulting case studies here. This research allowed us to identify infrastructural and procedural barriers and to make recommendations on resource allocation to best support non-computational researchers undertaking “big data” research. We recommend that research software engineer capacity is best deployed in maintaining and supporting datasets, while librarians can provide an essential service in running initial, routine queries for humanities scholars. At present, there are too many technical hurdles for most individuals in the humanities to consider analysing these increasingly available open data sets at scale; by building on existing frameworks of support from research computing and library services, we can best help humanities scholars develop methods and approaches that take advantage of these research opportunities.