Enrichment by Elimination, or: How to turn HTML into simple TEI using Python

Published in 2014 by Christof Schöch
This paper is available in a repository.

Abstract

There are lots of full-text repositories of literary works out there, be it the venerable Project Gutenberg (founded in 1971, when the internet was just a few dozen computers), a pioneer like Gallica (with increasing amounts of plain text in the 90-95% correct OCR range), or a crowdsourced effort like Wikisource (with nifty quality indicators). Closer to my geographical location are initiatives like TextGrid's Digitale Bibliothek and the Deutsches Textarchiv (both very professional and acade.