Randomized Quantile Residual for Assessing Generalized Linear Mixed Models with Application to Zero-Inflated Microbiome Data
In microbiome research, it is often of interest to investigate the impact of clinical and environmental factors on microbial abundance, which is commonly quantified as the total number of unique operational taxonomic units (OTUs). The important features of OTU count data are a large number of zeros and skewness in the positive counts. A common strategy for handling excess zeros is to use zero-inflated models or zero-modified (hurdle) models. Moreover, subjects in microbiome data are often clustered, for example humans from the same family or plants from the same plot; as a result, random effects should be included to account for the clustering. Model diagnosis is an essential step to ensure that a fitted model is adequate for the data. However, diagnosing zero-inflated count models is still a challenging research problem. Pearson and deviance residuals are often used in practice for diagnosing count models, despite wide recognition that these residuals are far from normally distributed when applied to count data. The randomized quantile residual (RQR) was proposed in the literature to circumvent these problems with traditional residuals. The key idea of the RQR is to randomize the lower tail probability into a uniform random number within the discontinuity gap of the cumulative distribution function (CDF). It can be shown that RQRs are normally distributed under the true model. To the best of our knowledge, RQRs have not been applied to diagnose zero-inflated or zero-modified mixed effects models. In this thesis project, we have developed generic R functions that compute RQRs for zero-inflated and zero-modified mixed effects models based on the fitting outputs of glmmTMB. We have tested our functions using datasets generated from zero-modified Poisson (ZMP) and zero-modified negative binomial (ZMNB) models. Our simulation studies show that RQRs are normally distributed under the true model.
In goodness-of-fit (GOF) tests, the type I error rates are close to the nominal level of 0.05, and the power to reject the wrong models is high. We have also applied RQRs to assess eight models for a real human microbiome OTU dataset and concluded that ZMNB or zero-inflated negative binomial (ZINB) models provide adequate fits to the dataset.
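The randomization step described in the abstract can be illustrated with a minimal sketch. It is not the thesis's R implementation: it is written in Python with the standard library only, it assumes a zero-modified Poisson model with known parameters and no random effects, and every name in it (`zmp_cdf`, `rqr`, `sample_zmp`, `lam`, `p0`) is hypothetical. For an observed count y with model CDF F, it draws a uniform number between F(y - 1) and F(y) and maps it through the standard normal quantile function.

```python
import math
import random
from statistics import NormalDist, mean, stdev


def poisson_pmf(k, lam):
    """Poisson probability mass at k, computed on the log scale."""
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))


def poisson_cdf(k, lam):
    """Poisson CDF at k (0 for k < 0)."""
    if k < 0:
        return 0.0
    return min(1.0, sum(poisson_pmf(j, lam) for j in range(k + 1)))


def zmp_cdf(k, lam, p0):
    """CDF of a zero-modified (hurdle) Poisson: P(Y = 0) = p0 and the
    positive counts follow a zero-truncated Poisson(lam)."""
    if k < 0:
        return 0.0
    if k == 0:
        return p0
    zero_mass = poisson_pmf(0, lam)
    truncated = (poisson_cdf(k, lam) - zero_mass) / (1.0 - zero_mass)
    return p0 + (1.0 - p0) * truncated


def rqr(y, cdf, rng):
    """Randomized quantile residual: a uniform draw inside the CDF jump
    at y, transformed by the standard normal quantile function."""
    lower, upper = cdf(y - 1), cdf(y)
    u = rng.uniform(lower, upper)
    u = min(max(u, 1e-12), 1.0 - 1e-12)  # keep inv_cdf away from 0 and 1
    return NormalDist().inv_cdf(u)


def sample_poisson(lam, rng):
    """Draw one Poisson(lam) variate with Knuth's multiplication method."""
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        k += 1
        p *= rng.random()
        if p <= limit:
            return k - 1


def sample_zmp(lam, p0, rng):
    """Draw one zero-modified Poisson variate (illustrative parameters)."""
    if rng.random() < p0:
        return 0
    while True:  # rejection sampling for the zero-truncated part
        y = sample_poisson(lam, rng)
        if y > 0:
            return y


rng = random.Random(2018)
lam, p0 = 3.0, 0.4  # assumed values, chosen only for the demonstration
counts = [sample_zmp(lam, p0, rng) for _ in range(2000)]
residuals = [rqr(y, lambda k: zmp_cdf(k, lam, p0), rng) for y in counts]
print("mean:", mean(residuals), "sd:", stdev(residuals))
```

Because the residuals here are computed under the same model that generated the counts, their sample mean and standard deviation should be close to 0 and 1. Replacing `zmp_cdf` with the CDF of a misspecified model is one way to see how departures from normality would arise.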
Degree: Master of Science (M.Sc.)
Department: Mathematics and Statistics
Supervisor: Li, Longhai; Feng, Cindy
Committee: Khan, Shahedul; Wright, Laura
Copyright Date: July 2018
Operational taxonomic units (OTUs)
randomized quantile residual (RQR)
cumulative density function (CDF)
zero-modified Poisson (ZMP)
zero-modified negative binomial (ZMNB)
zero-inflated negative binomial (ZINB)