The human microbiome has recently been shown to be associated with disease risk, with important implications for risk stratification and precision medicine. Because of the large number of taxa in the human body, microbiome data are high-dimensional and compositional. Dirichlet distributions and their generalizations are used to characterize the dependence structures of microbial data. Another existing approach fits a Gaussian graphical model to the data after a centered log-ratio (CLR) transformation. However, Dirichlet distributions cannot be used to infer networks or to estimate some extremely rare probabilities, and network analysis results based on the CLR are hard to interpret. Furthermore, data analysis shows that efficient multivariate distributions for fitting microbiome data are lacking, which leads to inadequate statistical inference. In this paper, we propose new multivariate distributions for modeling the dependence structures of high-dimensional, compositional microbiome data using inverse gamma distributions and copula techniques. An analysis of data from the American Gut Project shows that the proposed methods perform well.
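
For readers unfamiliar with the CLR mentioned above, its standard textbook definition is sketched below. The symbols $x$, $p$, and $g(x)$ are notation introduced here for illustration and are not taken from the paper; the composition $x = (x_1, \dots, x_p)$ is assumed to have strictly positive parts that sum to one.

```latex
\[
  \operatorname{clr}(x)_j
    = \log\frac{x_j}{g(x)}
    = \log x_j - \frac{1}{p}\sum_{k=1}^{p} \log x_k,
  \qquad
  g(x) = \Bigl(\prod_{k=1}^{p} x_k\Bigr)^{1/p},
  \qquad j = 1, \dots, p .
\]
% The transformed coordinates always sum to zero:
\[
  \sum_{j=1}^{p} \operatorname{clr}(x)_j = 0 .
\]
```

Because the transformed coordinates sum to zero, their covariance matrix is singular. This may be one source of the interpretation difficulty noted above, although the abstract does not state the reason.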