Design and data analysis of kinome microarrays
Catalyzed by protein kinases, phosphorylation is the most important post-translational modification in eukaryotes and is involved in the regulation of almost all cellular processes. Investigating phosphorylation events and how they change in response to different biological conditions is integral to understanding cellular signaling processes in general, as well as to defining the role of phosphorylation in health and disease. A recently developed technology for studying phosphorylation events is the kinome microarray, which consists of several hundred "spots" arranged in a grid on a glass slide. Each spot contains many copies of a peptide with a particular amino acid sequence, chemically fixed to the slide, and different spots carry peptides with different sequences. Each peptide is a subsequence of a full protein: it contains an amino acid residue that is known or suspected to be phosphorylated in vivo, together with several flanking residues. When a kinome microarray is exposed to cell lysate, the protein kinases in the lysate catalyze the phosphorylation of the peptides on the array. By measuring the degree to which the peptides in each spot are phosphorylated, researchers can gain insight into the upregulation or downregulation of signaling pathways in response to different biological treatments or conditions.
There are two main computational challenges associated with kinome microarrays. The first is array design, which involves selecting the peptides to be included on a given array. The difficulty of this task depends largely on how many phosphorylation sites have been experimentally identified in the proteome of the organism being studied. For instance, thousands of phosphorylation sites are known for human and mouse, allowing considerable freedom to select peptides that are relevant to the problem being examined. In contrast, few sites are known for organisms such as honeybee and soybean.
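At its simplest, turning a chosen phosphorylation site (known or predicted) into an array peptide means cutting a fixed-length window of the protein sequence centered on the phosphorylatable serine, threonine, or tyrosine. The Python sketch below illustrates this under stated assumptions: the function name `extract_peptide`, the default of 7 flanking residues on each side (a 15-residue peptide), and padding with `-` past the protein termini are illustrative choices, not details taken from this thesis.

```python
def extract_peptide(protein_seq: str, site: int, flank: int = 7, pad: str = "-") -> str:
    """Return the window of `flank` residues on each side of a phosphosite.

    Hypothetical helper for illustration only. `site` is a 1-based residue
    position; positions falling off either end of the protein are filled
    with `pad`, so every peptide has length 2 * flank + 1.
    The default flank of 7 is an assumption, not a value from the thesis.
    """
    if not 1 <= site <= len(protein_seq):
        raise ValueError(f"site {site} lies outside a protein of length {len(protein_seq)}")
    centre = site - 1  # convert the 1-based position to a 0-based index
    residue = protein_seq[centre]
    if residue not in "STY":
        raise ValueError(f"residue {residue}{site} is not S, T or Y")
    left = protein_seq[max(0, centre - flank):centre]
    right = protein_seq[centre + 1:centre + 1 + flank]
    # Pad short flanks so that all peptides have the same length.
    return pad * (flank - len(left)) + left + residue + right + pad * (flank - len(right))
```

For example, with the made-up sequence `MKRSSLPTYEEQL`, `extract_peptide(seq, 9, flank=3)` returns `LPTYEEQ`: the tyrosine at position 9 plus three residues on each side. Fixing the peptide length keeps every spot comparable, at the cost of padding sites near the protein ends.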
For organisms with few known sites, it is useful to expand the set of candidate peptides by computationally predicting probable phosphorylation sites. In this thesis, existing techniques for the computational prediction of phosphorylation sites are reviewed. In addition, two novel methods, each taking a fundamentally different approach, are described for predicting phosphorylation events in organisms with few known sites. The first, called PHOSFER, uses a machine-learning strategy based on random forests, while the second, called DAPPLE, takes advantage of sequence homology between known sites and the proteome of interest. Both methods are shown to produce predictions more quickly or more accurately than comparable existing techniques in organisms with few known sites, and both are suitable for identifying phosphorylation sites in a wide variety of organisms, including cow, honeybee, and soybean. Consequently, the use of kinome microarrays is no longer limited to organisms with many known phosphorylation sites; instead, the technology can potentially be applied to any organism with a sequenced genome.
The second computational challenge is data analysis, which involves the normalization, clustering, statistical analysis, and visualization of data produced by the arrays. Although software designed for the analysis of DNA microarrays has also been used for kinome arrays, differences between the two technologies prompted the development of PIIKA, a software package designed specifically for the analysis of kinome microarray data. A comparison with methods used for DNA microarrays shows that PIIKA improves the ability to identify biological pathways that are differentially regulated in a treatment condition relative to a control condition. Also described is an updated version, PIIKA 2, which contains improvements and new features in the areas of clustering, statistical analysis, and data visualization.
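As a rough illustration of the normalization and statistical-analysis steps, the following Python sketch log2-transforms each array's spot intensities, centers them on the array median, and then compares treatment and control replicates peptide by peptide using a log2 fold change and Welch's t statistic. This is a generic, DNA-microarray-style pipeline offered only for orientation; it is not PIIKA's method, and the function names and the dictionary-of-intensities input format are hypothetical.

```python
import math
import statistics

# Hypothetical input format: one dict per array, mapping peptide ID -> raw spot intensity.
Array = dict[str, float]


def normalise(array: Array) -> Array:
    """Log2-transform one array's intensities and subtract the array median."""
    logged = {pep: math.log2(value) for pep, value in array.items()}
    centre = statistics.median(logged.values())
    return {pep: value - centre for pep, value in logged.items()}


def welch_t(a: list[float], b: list[float]) -> float:
    """Welch's t statistic for mean(a) - mean(b); each group needs >= 2 values."""
    se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    if se == 0:
        return float("nan")  # no within-group variation: the statistic is undefined
    return (statistics.mean(a) - statistics.mean(b)) / se


def differential(treated: list[Array], control: list[Array]) -> dict[str, tuple[float, float]]:
    """Per peptide, return (log2 fold change, Welch t) for treatment vs control."""
    t_norm = [normalise(arr) for arr in treated]
    c_norm = [normalise(arr) for arr in control]
    results = {}
    for pep in t_norm[0]:
        t_vals = [arr[pep] for arr in t_norm]
        c_vals = [arr[pep] for arr in c_norm]
        # Difference of mean log2 values = log2 of the treatment/control ratio.
        fold = statistics.mean(t_vals) - statistics.mean(c_vals)
        results[pep] = (fold, welch_t(t_vals, c_vals))
    return results
```

Median centering assumes that most peptides are unchanged between conditions; if a treatment shifts a large fraction of spots, that assumption fails and a different normalization would be needed.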
Given that no dedicated tools for analyzing kinome microarray data previously existed, and given the wealth of features they provide, PIIKA and PIIKA 2 represent an important step toward maximizing the scientific value of this technology.
In addition to the above techniques, this thesis presents three studies involving biological applications of kinome microarray analysis. The first study demonstrates the existence of "kinotypes" (species- or individual-specific kinome profiles), which has implications for personalized medicine and for the use of model organisms in the study of human disease. The second study uses kinome analysis to characterize how the calf immune system responds to infection by the bacterium Mycobacterium avium subsp. paratuberculosis. Finally, the third study uses kinome arrays to study parasitism of honeybees by the mite Varroa destructor, which is thought to be a major cause of colony collapse disorder.
To make the methods described above readily available, a website called the SAskatchewan PHosphorylation Internet REsource (SAPHIRE) has been developed. Located at http://saphire.usask.ca, SAPHIRE allows researchers to easily use PHOSFER, DAPPLE, and PIIKA 2. These resources facilitate both the design and the data analysis of kinome microarrays, making the arrays an even more effective technique for studying cellular signaling.
Degree: Doctor of Philosophy (Ph.D.)
Committee: Napper, Scott; Bickis, Miķelis; Horsch, Michael; McQuillan, Ian
Copyright Date: May 2014