Springer Verlag, Lecture Notes in Computer Science, p. 311-325
DOI: 10.1007/978-3-662-44753-6_23
Full text: Download
In this paper we present an efficient external memory algorithm to compute the string graph from a collection of reads, which is a fundamental data representation used for sequence assembly. Our algorithm builds upon some recent results on lightweight Burrows-Wheeler Transform (BWT) and Longest Common Prefix (LCP) construction providing, as a by-product, an efficient procedure to extend intervals of the BWT that could be of independent interest. We have implemented our algorithm and compared its efficiency against SGA-the most advanced assembly string graph construction program.