Geometry of the Sample Frequency Spectrum and the Perils of Demographic Inference

Rosen, Zvi; Bhaskar, Anand; Roch, Sebastien; Song, Yun S.

Published in

Oxford University Press, Genetics, 2(210), p. 665-682, 2018

DOI: 10.1534/genetics.118.300733

Tools

Export citation

Search in Google Scholar

Geometry of the Sample Frequency Spectrum and the Perils of Demographic Inference

Journal article published in 2018 by Zvi Rosen, Anand Bhaskar

, Sebastien Roch, Yun S. Song

This paper is made freely available by the publisher.

Full text: Download

Preprint: archiving allowed

Upload

Postprint: archiving restricted

Upload

Published version: archiving forbidden

Policy details

Data provided by

Abstract

Abstract Numerous studies in population genetics have been based on analyzing the sample frequency spectrum (SFS) summary statistic. Most SFS-based inference methods can display pathological behavior in optimization: some demographic model parameters can degenerate to 0... The sample frequency spectrum (SFS), which describes the distribution of mutant alleles in a sample of DNA sequences, is a widely used summary statistic in population genetics. The expected SFS has a strong dependence on the historical population demography and this property is exploited by popular statistical methods to infer complex demographic histories from DNA sequence data. Most, if not all, of these inference methods exhibit pathological behavior, however. Specifically, they often display runaway behavior in optimization, where the inferred population sizes and epoch durations can degenerate to zero or diverge to infinity, and show undesirable sensitivity to perturbations in the data. The goal of this article is to provide theoretical insights into why such problems arise. To this end, we characterize the geometry of the expected SFS for piecewise-constant demographies and use our results to show that the aforementioned pathological behavior of popular inference methods is intrinsic to the geometry of the expected SFS. We provide explicit descriptions and visualizations for a toy model, and generalize our intuition to arbitrary sample sizes using tools from convex and algebraic geometry. We also develop a universal characterization result which shows that the expected SFS of a sample of size n under an arbitrary population history can be recapitulated by a piecewise-constant demography with only κn epochs, where κn is between n/2 and 2n−1. The set of expected SFS for piecewise-constant demographies with fewer than κn epochs is open and nonconvex, which causes the above phenomena for inference from data.

Published in

Links

Tools

Geometry of the Sample Frequency Spectrum and the Perils of Demographic Inference

Abstract