Clustering technical documents by stylistic features for authorship analysis

Berry, Daniel; Sazonov, Edward

Published in

SoutheastCon 2015

DOI: 10.1109/secon.2015.7132936

Journal article published in 2015 by Daniel Berry, Edward Sazonov

This paper is available in a repository.

Preprint: archiving allowed

Postprint: archiving allowed

Published version: archiving forbidden

Abstract

While previous research has demonstrated the ability to discriminate between authors using purely stylistic features, the majority of studies have been conducted on large corpora of non-technical literature. We investigate the ability of unsupervised methods to recover the authorial structure of a collection of technical documents labeled by primary author. Experiments were conducted on 23 submitted conference and journal papers, totaling almost 100,000 words, from a local engineering research group; the papers were authored both by the Principal Investigator and by graduate students. Stylistic information was extracted from the body of each text to form a feature vector representing the document. Spectral clustering was applied to the feature vectors, and the resulting clustering had an Adjusted Rand Index of 0.306, which is significantly better than chance (p < 0.05).
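The abstract reports a clustering scored with the Adjusted Rand Index (ARI) and tested against chance, but does not spell out either step. The following is a minimal Python sketch of that evaluation under stated assumptions: the feature set in the hypothetical `stylistic_features` function (rates of a small, illustrative list of function words plus mean sentence length and mean word length) is not the paper's feature set, and the significance test is assumed to be a label-permutation test, which the abstract does not confirm. The spectral clustering step itself is omitted; in practice the feature vectors could be passed to a library implementation such as scikit-learn's `sklearn.cluster.SpectralClustering`, and `sklearn.metrics.adjusted_rand_score` should agree with the hand-written ARI below.

```python
import math
import random
import re
from collections import Counter

# Hypothetical, illustrative feature set: the abstract does not list the paper's features.
FUNCTION_WORDS = ("the", "of", "and", "a", "to", "in", "is", "that", "we", "this")


def stylistic_features(text):
    """Return function-word rates, then mean sentence length and mean word length."""
    words = re.findall(r"[a-z']+", text.lower())
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    n = max(len(words), 1)
    counts = Counter(words)
    rates = [counts[w] / n for w in FUNCTION_WORDS]
    mean_sentence_len = len(words) / max(len(sentences), 1)
    mean_word_len = sum(len(w) for w in words) / n
    return rates + [mean_sentence_len, mean_word_len]


def adjusted_rand_index(labels_true, labels_pred):
    """Hubert-Arabie Adjusted Rand Index between two labelings of the same items."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("labelings must have equal length")
    n = len(labels_true)
    if n < 2:
        raise ValueError("need at least two items")
    contingency = Counter(zip(labels_true, labels_pred))
    sum_cells = sum(math.comb(c, 2) for c in contingency.values())
    sum_rows = sum(math.comb(c, 2) for c in Counter(labels_true).values())
    sum_cols = sum(math.comb(c, 2) for c in Counter(labels_pred).values())
    expected = sum_rows * sum_cols / math.comb(n, 2)
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:  # only when both labelings are trivially identical
        return 1.0
    return (sum_cells - expected) / (max_index - expected)


def permutation_p_value(labels_true, labels_pred, n_permutations=10000, seed=0):
    """Assumed test: share of shuffled clusterings whose ARI is >= the observed ARI."""
    rng = random.Random(seed)
    observed = adjusted_rand_index(labels_true, labels_pred)
    shuffled = list(labels_pred)
    at_least = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)  # keeps cluster sizes, breaks the link to authors
        if adjusted_rand_index(labels_true, shuffled) >= observed:
            at_least += 1
    # Add-one correction avoids reporting p = 0 from a finite number of shuffles.
    return observed, (at_least + 1) / (n_permutations + 1)


if __name__ == "__main__":
    authors = ["PI", "PI", "PI", "S1", "S1", "S2", "S2", "S2"]  # hypothetical labels
    clusters = [0, 0, 1, 1, 1, 2, 2, 2]  # hypothetical clustering output
    ari, p = permutation_p_value(authors, clusters, n_permutations=2000)
    print(f"ARI = {ari:.3f}, p = {p:.4f}")
```

With these toy labels the ARI is 13/21 (about 0.619); the paper's 0.306 comes from its own 23-document corpus, which this example does not reproduce. Shuffling the predicted labels preserves the cluster sizes, so the null distribution reflects chance agreement for a clustering of the same shape.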
