Oxford University Press (OUP), Bioinformatics, 9(17), p. 840-842
DOI: 10.1093/bioinformatics/17.9.840
SAGE (Serial Analysis of Gene Expression) data are obtained by sequencing short DNA tags. Because DNA sequencing is error-prone, SAGE data contain errors. We propose a new approach to identify tags whose abundance is biased by sequencing errors. The approach is based on the concept of a neighbourhood: an abundant tag can contaminate the counts of tags whose sequences are very close to its own. Applying the approach reveals that moderately abundant tags can be generated solely by sequencing errors. It also makes it possible to detect rare tags that are correct. AVAILABILITY: Software is available only to non-profit entities and for non-commercial purposes upon request.
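The neighbourhood idea can be illustrated with a minimal sketch. This is not the authors' software or their statistical model: it assumes that "very close" means exactly one base substitution, that each base is misread with a uniform probability `error_rate`, and that a tag is suspect when the reads expected from its neighbours reach a fraction `ratio` of its observed count. The function names, `error_rate`, and `ratio` are all hypothetical choices made for illustration.

```python
BASES = "ACGT"


def neighbours(tag):
    """Yield every sequence that differs from `tag` by exactly one substitution."""
    for i, base in enumerate(tag):
        for alt in BASES:
            if alt != base:
                yield tag[:i] + alt + tag[i + 1:]


def expected_contamination(counts, error_rate=0.01):
    """Estimate, for each observed tag, how many of its reads could come
    from single-substitution errors in neighbouring tags.

    `error_rate` is an assumed per-base substitution probability; a given
    wrong base at a given position then occurs with probability error_rate / 3.
    """
    expected = {}
    for tag in counts:
        total = 0.0
        for nb in neighbours(tag):
            total += counts.get(nb, 0) * error_rate / 3
        expected[tag] = total
    return expected


def flag_suspect_tags(counts, error_rate=0.01, ratio=0.5):
    """Return tags whose expected contamination is at least `ratio` times
    their observed count (an arbitrary illustrative threshold)."""
    expected = expected_contamination(counts, error_rate)
    return {tag for tag, n in counts.items() if expected[tag] >= ratio * n}
```

With made-up counts such as `{"AAAAAAAAAA": 3000, "AAAAAAAAAC": 8, "GGGGGGGGGG": 8}`, the sketch flags `AAAAAAAAAC`, which sits next to a very abundant tag, and leaves `GGGGGGGGGG` alone, since it has no observed neighbours. Two tags with the same count are thus treated differently depending on their neighbourhood.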