Published in

Oxford University Press (OUP), Bioinformatics, 2(30), p. 287-288

DOI: 10.1093/bioinformatics/btt657

HPC-CLUST: distributed hierarchical clustering for large sets of nucleotide sequences

Journal article published in 2013 by João F. Matias Rodrigues and Christian von Mering
This paper is made freely available by the publisher.

Preprint: archiving allowed
Postprint: archiving allowed
Published version: archiving forbidden
Data provided by SHERPA/RoMEO

Abstract

Motivation: Nucleotide sequence data are being produced at an ever increasing rate. Clustering such sequences by similarity is often an essential first step in their analysis—intended to reduce redundancy, define gene families or suggest taxonomic units. Exact clustering algorithms, such as hierarchical clustering, scale relatively poorly in terms of run time and memory usage, yet they are desirable because heuristic shortcuts taken during clustering might have unintended consequences in later analysis steps.