Time-course data enrichment analysis

Analysis of time-course expression data calls for accurate methods for finding regulated pathways. Gene set enrichment is commonly used to compare two conditions, but applying it to time-course and multiple-condition experiments is not straightforward.

GSEA is a commonly used method for comparing two conditions. It compares a metric computed for a pathway with the same metric computed for random gene sets of the same size.

In this project we propose three scoring metrics for use in GSEA-like sampling-based enrichment of time-course data: a simple pattern correlation score, an absolute mutual correlation score, and a linear regression based score. These methods are described below.

For each pathway, we take the corresponding subset of the expression matrix and normalize each row (gene) to z-scores. A regularity score is then computed for every such normalized set.
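The subsetting and row-wise normalization step can be sketched as follows. This is a minimal illustration rather than the project's actual code: the function names, the use of the sample standard deviation (ddof=1), and the handling of constant genes are all assumptions.

```python
import numpy as np

def zscore_rows(expr):
    """Scale each row (gene) to zero mean and unit standard deviation."""
    expr = np.asarray(expr, dtype=float)
    mean = expr.mean(axis=1, keepdims=True)
    std = expr.std(axis=1, ddof=1, keepdims=True)  # assumption: sample std
    std[std == 0] = 1.0  # assumption: constant genes become all-zero rows
    return (expr - mean) / std

def pathway_submatrix(expr, gene_ids, pathway_genes):
    """Keep the rows of `expr` whose gene id is in the pathway, then z-score them."""
    expr = np.asarray(expr, dtype=float)
    wanted = set(pathway_genes)
    mask = np.array([g in wanted for g in gene_ids])
    return zscore_rows(expr[mask])

# Hypothetical toy data: 3 genes measured in 3 samples.
expr = np.array([[1.0, 2.0, 3.0],
                 [2.0, 4.0, 6.0],
                 [5.0, 5.0, 5.0]])
z = pathway_submatrix(expr, ["a", "b", "c"], ["a", "c"])
```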

Simple pattern correlation score. First, a pattern is constructed as the column-wise (per-sample) mean of the normalized expression values. Every row is then correlated with the pattern, yielding a correlation vector, and the score is the mean of this vector.
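A minimal sketch of this score, assuming Pearson correlation (the text does not name the correlation type) and an input matrix `z` that is already z-scored per row. The function name is hypothetical, and the sketch assumes that neither the pattern nor any row is constant, since the correlation is undefined in that case.

```python
import numpy as np

def pattern_correlation_score(z):
    """Mean Pearson correlation between each row and the column-wise mean pattern."""
    z = np.asarray(z, dtype=float)
    pattern = z.mean(axis=0)                      # one value per sample
    zc = z - z.mean(axis=1, keepdims=True)        # center each row
    pc = pattern - pattern.mean()                 # center the pattern
    denom = np.sqrt((zc ** 2).sum(axis=1) * (pc ** 2).sum())
    corr = (zc @ pc) / denom                      # one correlation per row
    return float(corr.mean())

# Hypothetical toy set: two genes follow the pattern, one is inverted.
z = np.array([[-1.0, 0.0, 1.0],
              [-1.0, 0.0, 1.0],
              [1.0, 0.0, -1.0]])
score = pattern_correlation_score(z)
```

Note that anticorrelated genes lower this score, because the correlations keep their sign.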

Absolute mutual correlation score. Every row is correlated with every other row in the set, and the correlations are taken in absolute value. The score is the mean of the resulting vector of absolute correlations.
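One way to compute this score is sketched below, again assuming Pearson correlation and a set with at least two non-constant rows. The function name is hypothetical; each unordered pair of genes is counted once and self-correlations are excluded, which is an interpretation of "the other ones in the set".

```python
import numpy as np

def absolute_mutual_correlation_score(z):
    """Mean absolute Pearson correlation over all pairs of distinct rows."""
    z = np.asarray(z, dtype=float)
    corr = np.corrcoef(z)                      # gene-by-gene correlation matrix
    upper = np.triu_indices_from(corr, k=1)    # each pair once, no diagonal
    return float(np.abs(corr[upper]).mean())

# Hypothetical toy set: perfectly correlated and anticorrelated genes both count.
z = np.array([[-1.0, 0.0, 1.0],
              [1.0, 0.0, -1.0],
              [-1.0, 0.0, 1.0]])
score = absolute_mutual_correlation_score(z)
```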

Linear regression based score. A linear regression model is fitted for the set, with the condition (sample type and time point) as the explanatory variable and normalized expression as the response. The residual standard error (RSE) of the model is taken as the score. Unlike the previous two scores, lower values are better.
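The sketch below shows one possible reading of this score; the exact model specification is not given in the text, so the details are assumptions. Here the condition is treated as a single categorical factor (one label per sample), all genes in the set are pooled into one response vector, and the RSE is computed as sqrt(RSS / (n - p)), where n is the number of observations and p is the number of conditions. The function name is hypothetical.

```python
import numpy as np

def regression_rse_score(z, conditions):
    """Residual standard error of expression ~ condition, pooled over all genes."""
    z = np.asarray(z, dtype=float)
    levels, codes = np.unique(conditions, return_inverse=True)
    one_gene = np.eye(len(levels))[codes]        # samples x conditions, one-hot
    X = np.tile(one_gene, (z.shape[0], 1))       # repeat the design for every gene
    y = z.reshape(-1)                            # gene 0 samples, gene 1 samples, ...
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    dof = y.size - len(levels)                   # assumption: must be positive
    return float(np.sqrt((resid ** 2).sum() / dof))

# Hypothetical toy design: two conditions with two replicates each.
conditions = ["ctrl_0h", "ctrl_0h", "ctrl_2h", "ctrl_2h"]
coherent = np.array([[-1.0, -1.0, 1.0, 1.0],
                     [-1.0, -1.0, 1.0, 1.0]])
noisy = np.array([[-1.0, 1.0, -1.0, 1.0]])
rse_coherent = regression_rse_score(coherent, conditions)
rse_noisy = regression_rse_score(noisy, conditions)
```

Under this reading, a set whose genes share the same condition-dependent profile gets an RSE near zero, while a set whose variation is unrelated to the condition gets a larger value.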

Sampling-based p-values are then computed for each set and each score from an empirical distribution. For a given set size, we generate a number of random subsets of that size and calculate the percentage of random sets with better scores.
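The sampling procedure can be sketched as follows. The function and parameter names are hypothetical, the number of random subsets and the seed are arbitrary, and the result is returned as a fraction rather than a percentage. "Better" is taken to mean strictly higher, or strictly lower when `lower_is_better` is set, as for the regression-based score.

```python
import numpy as np

def zscore_rows(expr):
    """Scale each row to zero mean and unit (sample) standard deviation."""
    mean = expr.mean(axis=1, keepdims=True)
    std = expr.std(axis=1, ddof=1, keepdims=True)
    std[std == 0] = 1.0  # assumption: constant rows become all zeros
    return (expr - mean) / std

def empirical_p_value(expr, set_rows, score_fn, n_random=1000,
                      lower_is_better=False, seed=0):
    """Fraction of random gene sets of the same size with a better score."""
    expr = np.asarray(expr, dtype=float)
    rng = np.random.default_rng(seed)
    observed = score_fn(zscore_rows(expr[list(set_rows)]))
    better = 0
    for _ in range(n_random):
        # Draw a random set of row indices of the same size, without replacement.
        rows = rng.choice(expr.shape[0], size=len(set_rows), replace=False)
        s = score_fn(zscore_rows(expr[rows]))
        better += (s < observed) if lower_is_better else (s > observed)
    return better / n_random

# Hypothetical toy data: rows 0-1 rise over time, rows 2-9 fall.
expr = np.array([[0.0, 1.0, 2.0]] * 2 + [[2.0, 1.0, 0.0]] * 8)
first_sample_mean = lambda z: float(z[:, 0].mean())  # toy score, for illustration only
p_high = empirical_p_value(expr, [0, 1], first_sample_mean, n_random=200)
p_low = empirical_p_value(expr, [0, 1], first_sample_mean, n_random=200,
                          lower_is_better=True)
```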

We compared our methods with FUNNEL-GSEA, an algorithm that applies a functional PCA (FPCA) transform to gene expression time courses and then decomposes gene sets via elastic-net regression. The significance of differential expression in a set is assessed with a classical Mann-Whitney U (MWU) test extended with the CAMERA approach.

Preliminary results show that both FUNNEL-GSEA and the developed methods are able to find pathways that cannot be found by comparing individual time points. However, the comparison of the developed methods with FUNNEL-GSEA is not yet conclusive.

 

Project duration: Feb 2017 to Jun 2017