Published in

Oxford University Press, Bioinformatics, 11(36), p. 3314-3321, 2020

DOI: 10.1093/bioinformatics/btaa191

Oxford University Press, Bioinformatics, 17(36), p. 4673-4673, 2020

DOI: 10.1093/bioinformatics/btaa665

A Blind and Independent Benchmark Study for Detecting Differentially Methylated Regions in Plants

This paper is made freely available by the publisher.

Abstract

Motivation

Bisulfite sequencing (BS-seq) is a state-of-the-art technique for investigating DNA methylation to gain insight into epigenetic regulation. Several algorithms have been published for the identification of differentially methylated regions (DMRs). However, the performance of the individual methods remains unclear, and it is difficult to select the optimal algorithm in a given application setting.

Results

We analyzed BS-seq data from four plants covering three taxonomic groups. We first characterized the data using multiple summary statistics describing methylation levels, coverage, and noise, as well as the frequencies, magnitudes and lengths of methylated regions. We then created simulated datasets whose characteristics most closely matched the real experimental data. Seven different algorithms for DMR identification (metilene, methylKit, MOABS, DMRcate, Defiant, BSmooth, MethylSig) were applied and their performance was assessed. A blind and independent study design was chosen to reduce bias and to derive practical method selection guidelines. Overall, metilene had superior performance in most settings. Data attributes, such as coverage and the spread of the DMR lengths, were found to be useful for selecting the best method for DMR detection. A decision tree for selecting the optimal approach based on these data attributes is provided. The presented procedure might serve as a general strategy for deriving algorithm selection rules tailored to the demands of specific application settings.

Availability and implementation

Scripts that were used for the analyses and that can be used to predict the optimal algorithm are provided at https://github.com/kreutz-lab/DMR-DecisionTree. Simulated and experimental data are available at https://doi.org/10.6084/m9.figshare.11619045.

Contact

ckreutz@imbi.uni-freiburg.de

Supplementary information

Supplementary data are available at Bioinformatics online.