Computational analysis of gene expression data
Kerr, Gráinne
Gene expression is central to the function of living cells. While advances in sequencing and expression measurement technology over the past decade have greatly facilitated further understanding of the genome and its functions, the characterisation of functional groups of genes remains one of the most important problems in modern biology. Technological advancements have resulted in massive information output, with the priority objective shifting to the development of data analysis methods. As such, a large number of clustering approaches have been proposed for the analysis of gene expression data obtained from microarray experiments and, consequently, there is confusion regarding the best approach to take. Commonly applied techniques are not necessarily the most applicable for the analysis of patterns in microarray data. This confusion is addressed through provision of a framework for the analysis of clustering techniques and an investigation of how well they apply to gene expression data. To this end, the properties of microarray data itself are examined, followed by an examination of the properties of clustering techniques and how well they apply to gene expression. Clearly, each technique will find patterns even if the structures are not meaningful in a biological context, and these structures are not usually the same for different algorithms. Also, these algorithms are inherently biased, as the properties of clusters reflect built-in clustering criteria. From these considerations, it is clear that cluster validation, which is usually based on a manual, lengthy and subjective exploration process, is critical for algorithm development and verification of results. Consequently, it is key to the interpretation of the gene expression data. We carry out a critical analysis of current methods used to evaluate clustering results. Clusters obtained from real and synthetic datasets are compared between algorithms.
To understand the properties of complex gene expression datasets, graphical representations can be used. Intuitively, the data can be represented in terms of a bipartite graph, with weighted edges between gene-sample node couples corresponding to significant expression measurements of interest. In this research, this method of representation is extensively studied and is used, in combination with probabilistic models, to develop new clustering techniques for the analysis of gene expression data. Performance of these techniques can be influenced both by the search algorithm and by the graph weighting scheme, and both merit vigorous investigation. A novel edge-weighting scheme, based on empirical evidence, is presented. The scheme is tested using several benchmark datasets at various levels of granularity, and comparisons are provided with a popular data analysis method currently used in the Bioinformatics community. The analysis shows that the new empirically based scheme out-performs current edge-weighting methods by accounting for the subtleties in the data through a data-dependent threshold analysis, and by selecting ‘interesting’ gene-sample couples based on relative values. The graphical theme of gene expression analysis is further developed by construction of a one-mode gene expression network which specifically focuses on local interactions among genes. Classical network theory is used to identify and examine organisational properties in the resulting graphs. A new algorithm, GraphCreate, is presented which finds functional modules in the one-mode graph, i.e. sets of genes which are coherently expressed over subsets of samples, and a scoring scheme is developed (using bipartite graph properties as a basis) to weight these modules. This representation is used to extensively study published gene expression datasets and to identify functional modules of genes with GraphCreate.
This work is important as it advances research in the area of transcriptome analysis, beyond simply finding groups of coherently expressed genes, by developing a general framework to understand how and when gene sets are interacting.
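The bipartite gene-sample representation described in the abstract can be illustrated with a minimal Python sketch. The abstract does not specify the thesis's edge-weighting scheme or the GraphCreate algorithm, so everything here is an assumption made for illustration: the function names `build_bipartite_edges` and `project_to_genes`, the per-gene z-score as the "relative value", and the fixed cutoff `z_cutoff` standing in for the data-dependent threshold.

```python
from collections import defaultdict
from itertools import combinations
from statistics import mean, pstdev


def build_bipartite_edges(expr, genes, samples, z_cutoff=1.0):
    """Return {(gene, sample): weight} for measurements that stand out.

    Hypothetical rule (not the thesis's scheme): the weight is the
    per-gene z-score, and an edge is kept when |z| >= z_cutoff.
    """
    edges = {}
    for gene, row in zip(genes, expr):
        mu = mean(row)
        sigma = pstdev(row)
        if sigma == 0:
            continue  # flat profile: no sample is relatively high or low
        for sample, value in zip(samples, row):
            z = (value - mu) / sigma
            if abs(z) >= z_cutoff:
                edges[(gene, sample)] = z
    return edges


def project_to_genes(edges):
    """One-mode projection onto genes.

    Two genes are linked when they have same-sign edges to the same
    sample; the link weight counts how many samples they share.
    """
    by_sample = defaultdict(list)
    for (gene, sample), weight in edges.items():
        by_sample[sample].append((gene, weight))

    shared = defaultdict(int)
    for members in by_sample.values():
        for (g1, w1), (g2, w2) in combinations(sorted(members), 2):
            if w1 * w2 > 0:  # both up or both down in this sample
                shared[(g1, g2)] += 1
    return dict(shared)
```

Given an expression matrix with one row per gene and one column per sample, `build_bipartite_edges` yields the weighted gene-sample couples, and `project_to_genes` gives a simple gene-gene network in which candidate modules could then be searched for.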
Keyword(s): Bioinformatics; Computer simulation; microarray data analysis; gene expression data; supervised and unsupervised clustering methods; graph theory
Publication Date:
Type: Doctoral thesis
Peer-Reviewed: No
Language(s): English
Institution: Dublin City University
Citation(s): Kerr, Gráinne (2009) Computational analysis of gene expression data. PhD thesis, Dublin City University.
Publisher(s): Dublin City University. School of Computing
File Format(s): application/pdf
Supervisor(s): Ruskin, Heather J.; Crane, Martin
First Indexed: 2014-04-17 05:43:30
Last Updated: 2018-05-15 06:18:08