Reverse Engineering of Biological Systems
Gene regulatory network (GRN) consists of a set of genes and regulatory relationships between the genes. As outputs of the GRN, gene expression data contain important information that can be used to reconstruct the GRN to a certain degree. However, the reverse engineer of GRNs from gene expression data is a challenging problem in systems biology. Conventional methods fail in inferring GRNs from gene expression data because of the relative less number of observations compared with the large number of the genes. The inherent noises in the data make the inference accuracy relatively low and the combinatorial explosion nature of the problem makes the inference task extremely difficult. This study aims at reconstructing the GRNs from time-course gene expression data based on GRN models using system identification and parameter estimation methods. The main content consists of three parts: (1) a review of the methods for reverse engineering of GRNs, (2) reverse engineering of GRNs based on linear models and (3) reverse engineering of GRNs based on a nonlinear model, specifically S-systems. In the first part, after the necessary background and challenges of the problem are introduced, various methods for the inference of GRNs are comprehensively reviewed from two aspects: models and inference algorithms. The advantages and disadvantages of each method are discussed. The second part focus on inferring GRNs from time-course gene expression data based on linear models. First, the statistical properties of two sparse penalties, adaptive LASSO and SCAD, with an autoregressive model are studied. It shows that the proposed methods using these two penalties can asymptotically reconstruct the underlying networks. This provides a solid foundation for these methods and their extensions. Second, the integration of multiple datasets should be able to improve the accuracy of the GRN inference. A novel method, Huber group LASSO, is developed to infer GRNs from multiple time-course data, which is also robust to large noises and outliers that the data may contain. An efficient algorithm is also developed and its convergence analysis is provided. The third part can be further divided into two phases: estimating the parameters of S-systems with system structure known and inferring the S-systems without knowing the system structure. Two methods, alternating weighted least squares (AWLS) and auxiliary function guided coordinate descent (AFGCD), have been developed to estimate the parameters of S-systems from time-course data. AWLS takes advantage of the special structure of S-systems and significantly outperforms one existing method, alternating regression (AR). AFGCD uses the auxiliary function and coordinate descent techniques to get the smart and efficient iteration formula and its convergence is theoretically guaranteed. Without knowing the system structure, taking advantage of the special structure of the S-system model, a novel method, pruning separable parameter estimation algorithm (PSPEA) is developed to locally infer the S-systems. PSPEA is then combined with continuous genetic algorithm (CGA) to form a hybrid algorithm which can globally reconstruct the S-systems.
DegreeDoctor of Philosophy (Ph.D.)
SupervisorWu, FangXiang; Zhang, Chris
CommitteeChen, Daniel; Burton, Richard; Faried, Sherif
Copyright DateJuly 2014
Gene Regulatory Network