Elsevier, Journal of Biomedical Informatics, 5(45), p. 922-930, 2012
DOI: 10.1016/j.jbi.2012.03.004
Reconstructing reliable Single Individual Haplotypes (SIHs) has become one of the core issues in whole-genome research, as previous research showed that haplotypes contain more information than individual Single Nucleotide Polymorphisms (SNPs). Although advances in high-throughput sequencing technologies have made sequence information easier to obtain in today's laboratories, sequences produced by current technologies inevitably contain sequencing errors and missing information. The SIH reconstruction problem can be formulated as bi-partitioning the input SNP fragment matrix into paternal and maternal sections so that the number of error corrections is minimized (the minimum error correction, MEC, criterion), a problem that has been proved to be NP-hard. Several heuristic or greedy algorithms have already been designed and implemented to solve this problem; most of them, however, (1) cannot handle data sets with high error rates and/or (2) can only handle binary input matrices. In this study, we introduce a Genetic Algorithm (GA) based method, named GAHap, to reconstruct SIHs with the lowest MEC. GAHap is equipped with a well-designed fitness function to obtain better reconstruction rates. GAHap is also compared with existing methods to show its ability to generate highly reliable solutions.
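To make the MEC criterion concrete, the following is a minimal Python sketch of how the error-correction count of one candidate bi-partition could be computed. It is an illustration only and is not GAHap's fitness function, which the abstract does not specify. The string encoding of fragments, the `-` symbol for missing entries, the majority-vote consensus, and the names `consensus` and `mec_score` are all assumptions made for this example.

```python
from collections import Counter

MISSING = "-"  # assumed symbol for a SNP site not covered by a fragment


def consensus(fragments):
    """Majority-vote allele per SNP column, ignoring missing entries."""
    if not fragments:
        return ""
    width = len(fragments[0])
    alleles = []
    for col in range(width):
        counts = Counter(f[col] for f in fragments if f[col] != MISSING)
        # A column with no observed allele stays missing in the consensus.
        alleles.append(counts.most_common(1)[0][0] if counts else MISSING)
    return "".join(alleles)


def mec_score(fragments, assignment):
    """Count the entries that must be corrected for a given bi-partition.

    fragments:  equal-length strings, one per row of the SNP fragment matrix
    assignment: 0 or 1 per fragment, naming the section it is placed in
    """
    groups = ([], [])
    for frag, side in zip(fragments, assignment):
        groups[side].append(frag)

    corrections = 0
    for group in groups:
        hap = consensus(group)
        for frag in group:
            # Only observed entries that disagree with the consensus count.
            corrections += sum(
                1 for a, h in zip(frag, hap) if a != MISSING and a != h
            )
    return corrections


# Toy fragment matrix: 5 fragments over 4 SNP sites (hypothetical data).
fragments = ["0101", "01-1", "1010", "1-10", "0111"]
good_partition = [0, 0, 1, 1, 0]
poor_partition = [0, 1, 0, 1, 0]

print(mec_score(fragments, good_partition))
print(mec_score(fragments, poor_partition))
```

On this toy input the first partition needs a single correction, while the second needs six. A search method such as a genetic algorithm would explore the space of `assignment` vectors looking for one with a low count. Because the sketch compares allele symbols rather than bits, it is not restricted to binary matrices.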