Published in

SoutheastCon 2015

DOI: 10.1109/secon.2015.7132936

Clustering technical documents by stylistic features for authorship analysis

Journal article published in 2015 by Daniel Berry, Edward Sazonov
This paper is available in a repository.

Preprint: archiving allowed
Postprint: archiving allowed
Published version: archiving forbidden
Data provided by SHERPA/RoMEO

Abstract

While previous research has demonstrated the ability to discriminate between authors using purely stylistic features, the majority of studies have been conducted on large corpora of non-technical literature. We investigate the ability of unsupervised methods to recover the authorial structure of a collection of technical documents labeled by primary author. Experiments were conducted using 23 submitted conference and journal papers, together containing almost 100,000 words, from a local engineering research group; the papers were authored by both the Principal Investigator and graduate students. Stylistic information was extracted from the body of each text to form a feature vector representing the document. Spectral clustering was applied to the feature vectors, and the resulting clustering had an Adjusted Rand Index of 0.306, which is significantly better than chance (p < 0.05).
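
To make the evaluation step concrete, the following is a minimal sketch of how an Adjusted Rand Index between author labels and cluster assignments can be computed, together with one possible way to check it against chance. The abstract does not say how significance was assessed, so the label-shuffling permutation test here is an assumption, not the authors' procedure. The function names `adjusted_rand_index` and `permutation_p_value` and the toy labels are hypothetical and are not data from the paper.

```python
from collections import Counter
from math import comb
import random


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand Index (Hubert-Arabie form) between two labelings."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("labelings must have the same length")
    n = len(labels_true)
    total_pairs = comb(n, 2)
    if total_pairs == 0:
        return 1.0

    # Contingency counts: items sharing each (true, predicted) label pair.
    pair_counts = Counter(zip(labels_true, labels_pred))
    true_counts = Counter(labels_true)
    pred_counts = Counter(labels_pred)

    sum_cells = sum(comb(c, 2) for c in pair_counts.values())
    sum_true = sum(comb(c, 2) for c in true_counts.values())
    sum_pred = sum(comb(c, 2) for c in pred_counts.values())

    expected = sum_true * sum_pred / total_pairs  # agreement expected by chance
    max_index = (sum_true + sum_pred) / 2
    if max_index == expected:
        # Degenerate case: both labelings are one cluster, or all singletons.
        return 1.0
    return (sum_cells - expected) / (max_index - expected)


def permutation_p_value(labels_true, labels_pred, n_permutations=2000, seed=0):
    """One-sided p-value: how often shuffled cluster labels score at least
    as well as the observed clustering. (Assumed test, not from the paper.)"""
    rng = random.Random(seed)
    observed = adjusted_rand_index(labels_true, labels_pred)
    shuffled = list(labels_pred)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        if adjusted_rand_index(labels_true, shuffled) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_permutations + 1)


if __name__ == "__main__":
    # Hypothetical toy labels: primary author per document vs. cluster id.
    authors = ["PI", "PI", "PI", "A", "A", "A", "B", "B", "B"]
    clusters = [0, 0, 1, 1, 1, 1, 2, 2, 0]
    print("ARI:", adjusted_rand_index(authors, clusters))
    print("p-value:", permutation_p_value(authors, clusters))
```

An index of 1 means the clustering reproduces the author labels exactly (up to renaming of clusters), values near 0 are what random assignment would give, and negative values indicate agreement below chance.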