While previous research has demonstrated the ability to discriminate between authors using purely stylistic features, the majority of studies have been conducted on large corpora of non-technical literature. We investigate the ability of unsupervised methods to recover the authorial structure of a collection of technical documents labeled by primary author. Experiments were conducted on 23 submitted conference and journal papers, totaling almost 100,000 words, from a local engineering research group; the papers were authored by both the Principal Investigator and graduate students. Stylistic information was extracted from the body of each text, forming a feature vector representing the document. Spectral clustering was applied to the feature vectors, and the resulting clustering had an Adjusted Rand Index of 0.306, which is significantly better than chance (p < 0.05).
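For readers unfamiliar with the evaluation metric, the following is a minimal sketch of how an Adjusted Rand Index can be computed from two labelings of the same documents, using the standard pair-counting formula. The abstract does not state how the authors computed it, so the function name `adjusted_rand_index` and the toy labels are illustrative assumptions, not the study's code or data.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand Index between two labelings of the same items.

    Hypothetical helper for illustration; uses the standard
    pair-counting (Hubert-Arabie) formula.
    """
    if len(labels_true) != len(labels_pred):
        raise ValueError("both labelings must cover the same items")

    total_pairs = comb(len(labels_true), 2)
    if total_pairs == 0:
        return 1.0  # fewer than two items: the partitions trivially agree

    # Contingency counts: items sharing a (true label, predicted label) pair.
    cells = Counter(zip(labels_true, labels_pred))
    rows = Counter(labels_true)
    cols = Counter(labels_pred)

    # Number of item pairs placed together under each view.
    together_both = sum(comb(c, 2) for c in cells.values())
    together_true = sum(comb(c, 2) for c in rows.values())
    together_pred = sum(comb(c, 2) for c in cols.values())

    expected = together_true * together_pred / total_pairs
    maximum = (together_true + together_pred) / 2
    if maximum == expected:
        return 1.0  # degenerate case, e.g. both labelings put all items in one cluster
    return (together_both - expected) / (maximum - expected)


# Toy labels (not the study's data): author labels vs. cluster assignments.
authors = ["PI", "PI", "PI", "student", "student", "student"]
clusters = [0, 0, 1, 1, 2, 2]
print(round(adjusted_rand_index(authors, clusters), 3))
```

A value of 1 means the clustering reproduces the author labels exactly (up to renaming of clusters), while values near 0 indicate agreement no better than expected by chance, which is the baseline against which the reported 0.306 is compared.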