Setup and Benchmarking of a New Scalable Sequence Alignment Service for UHTS Data

This paper is available in a repository.

Abstract

Aligning translated DNA sequence reads against a protein reference database is a highly demanding task in terms of computational resources. The reference software for this type of sequence alignment, currently installed and used on Vital-IT's HPC cluster, is NCBI's BLAST (Basic Local Alignment Search Tool; https://blast.ncbi.nlm.nih.gov/Blast.cgi). While BLAST performs well when aligning a limited number of reads, scaling up to tens or hundreds of thousands of reads becomes extremely time-consuming. A number of alternative software packages have been developed with the promise of performing DNA sequence alignments against protein databases much faster. However, these are not yet installed on Vital-IT's HPC cluster and, most importantly, they have not been independently benchmarked to determine whether they really deliver the claimed increase in performance. This project aims to install and evaluate, through benchmarks based on real use cases, two new sequence alignment tools: "SANSparallel" (http://ekhidna2.biocenter.helsinki.fi/sans) and "DIAMOND" (http://ab.inf.uni-tuebingen.de/software/diamond). Both claim to perform sequence alignments against protein databases much faster than the current aligner (NCBI BLAST).
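What makes this task heavier than plain DNA alignment is the translation step: a read of unknown strand and reading frame is typically translated in all six frames (three per strand), as BLAST's blastx program does, before the resulting peptides are compared with the protein database. The following is a minimal Python sketch of six-frame translation under the standard genetic code (NCBI translation table 1). The function names are illustrative, mapping ambiguous codons to `X` is an assumption, and this is not the code used by BLAST, DIAMOND or SANSparallel.

```python
# Standard genetic code (NCBI translation table 1), codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement; non-ACGT characters (e.g. N) are kept as-is."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq: str) -> str:
    """Translate complete codons from position 0; unknown codons become 'X' (assumption)."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frame_translation(read: str) -> dict[str, str]:
    """Translate a DNA read in frames +1..+3 (forward) and -1..-3 (reverse strand)."""
    read = read.upper()
    rc = reverse_complement(read)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(read[offset:])
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames


if __name__ == "__main__":
    for frame, peptide in six_frame_translation("ATGGCCATTGTAATGGGCCGC").items():
        print(frame, peptide)
```

Each of the six peptides would then be searched against the protein database, which is one reason translated searches cost several times more than a single-strand nucleotide search.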