An Automated Infrastructure to Support High-throughput Bioinformatics

Cuccuru, Gianmauro; Leo, Simone; Lianas, Luca; Muggiri, Michele; Pinna, Andrea; Pireddu, Luca; Uva, Paolo; Angius, Andrea; Fotia, Giorgio; Zanetti, Gianluigi

Published in

2014 International Conference on High Performance Computing & Simulation (HPCS)

DOI: 10.1109/hpcsim.2014.6903742


Proceedings article published in 2014 by Gianmauro Cuccuru, Simone Leo, Luca Lianas, Michele Muggiri, Andrea Pinna, Luca Pireddu, Paolo Uva, Andrea Angius, Giorgio Fotia, Gianluigi Zanetti

Abstract

The number of domains affected by the big data phenomenon is constantly increasing, both in science and industry, with high-throughput DNA sequencers being among the most massive data producers. Building analysis frameworks that can keep up with such a high production rate, however, is only part of the problem. Current challenges also include dealing with articulated data repositories where objects are connected by multiple relationships, managing complex processing pipelines where each step depends on a large number of configuration parameters, and ensuring reproducibility, error control, and usability by non-technical staff. Here we describe an automated infrastructure built to address these issues in the context of analyzing the data produced by the CRS4 next-generation sequencing facility. The system integrates open source tools, either written by us or publicly available, into a framework that can handle the whole data transformation process, from raw sequencer output to primary analysis results.