Published in

Oxford University Press, Bioinformatics, 32(24), p. 3709-3716, 2016

DOI: 10.1093/bioinformatics/btw543

CSAM: Compressed SAM format

Journal article published in 2016 by Rodrigo Cánovas ORCID, Alistair Moffat, Andrew Turpin
This paper is made freely available by the publisher.

Abstract

Motivation: Next generation sequencing machines produce vast amounts of genomic data. For the data to be useful, it is essential that it can be stored and manipulated efficiently. This work responds to the combined challenge of compressing genomic data, while providing fast access to regions of interest, without necessitating decompression of whole files.

Results: We describe CSAM (Compressed SAM format), a compression approach offering lossless and lossy compression for SAM files. The structures and techniques proposed are suitable for representing SAM files, as well as supporting fast access to the compressed information. They generate more compact lossless representations than BAM, which is currently the preferred lossless compressed SAM-equivalent format; and are self-contained, that is, they do not depend on any external resources to compress or decompress SAM files.

Availability and Implementation: An implementation is available at https://github.com/rcanovas/libCSAM.

Contact: canovas-ba@lirmm.fr

Supplementary Information: Supplementary data is available at Bioinformatics online.