Su-in Lee, Serafim Batzoglou
We propose an unsupervised methodology using independent component analysis (ICA) to cluster genes from DNA microarray data. Based on an ICA mixture model of genomic expression patterns, linear and nonlinear ICA finds components that are specific to certain biological processes. Genes that exhibit significant up-regulation or down-regulation within each component are grouped into clusters. We test the statistical significance of enrichment of gene annotations within each cluster. ICA-based clustering outperformed other leading methods in constructing functionally coherent clusters on various datasets. This result supports our model of genomic expression data as composite effect of independent biological processes. Comparison of clustering performance among various including a kernel-based nonlinear ICA algorithm shows that nonlinear ICA performed the best for small datasets and natural-gradient maximization-likelihood worked well for all the datasets.