maSigPro
https://github.com/mjnueda/maSigPro/
https://rdrr.io/github/mjnueda/maSigPro/
https://pubmed.ncbi.nlm.nih.gov/16481333/
https://bioconductor.org/packages/release/bioc/manuals/maSigPro/man/maSigPro.pdf
New version: https://pubmed.ncbi.nlm.nih.gov/24894503/
Official tutorial:
https://bioconductor.org/packages/release/bioc/vignettes/maSigPro/inst/doc/maSigProUsersGuide.pdf
Tutorial:
0. Overview of the analysis workflow
The analysis approach implemented in maSigPro is executed in 5 major steps which are run by the package core functions make.design.matrix(), p.vector(), T.fit(), get.siggenes() and see.genes(). Additionally, the package provides the wrapping function maSigPro() which executes the entire analysis in one go.
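The sketch below chains the five steps together. It assumes the Bioconductor package maSigPro is installed. It uses the package's example datasets data.abiotic and edesign.abiotic, which are the ones the official user's guide uses, as far as I know. The cutoffs (Q = 0.05, alfa = 0.05, rsq = 0.6) are illustrative, not recommendations. If the package is missing, the whole block is skipped.

```r
# Sketch of the five-step maSigPro workflow on the package's example data.
# Assumption: maSigPro is installed; cutoff values are illustrative only.
pipeline_ran <- FALSE
if (requireNamespace("maSigPro", quietly = TRUE)) {
  library(maSigPro)
  data(data.abiotic, package = "maSigPro")     # expression matrix: genes x arrays
  data(edesign.abiotic, package = "maSigPro")  # design: arrays x descriptors

  # Step 1: polynomial regression design (the example has 3 time points -> degree 2)
  design <- make.design.matrix(edesign.abiotic, degree = 2)
  # Step 2: global regression per gene with BH false discovery rate control
  fit <- p.vector(data.abiotic, design, Q = 0.05, MT.adjust = "BH")
  # Step 3: backward stepwise regression on the significant genes
  tstep <- T.fit(fit, step.method = "backward", alfa = 0.05)
  # Step 4: keep genes with R-squared > 0.6, listed per experimental group
  sigs <- get.siggenes(tstep, rsq = 0.6, vars = "groups")
  # Step 5 (see.genes) draws plots and is omitted here
  print(names(sigs$sig.genes))
  pipeline_ran <- TRUE
}
```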
Two main steps:
- Find genes with significant temporal expression changes
- Find significant differences between experimental groups
Main procedure:
The method defines a general regression model for the data in which the experimental groups are encoded by dummy variables. First, this global model is fitted by least squares to identify differentially expressed genes, and significant genes are selected with false discovery rate (FDR) control. Second, stepwise regression is used as a variable selection strategy to study differences between experimental groups and to find profiles that differ significantly. The coefficients of this second regression model are then used to cluster significant genes with similar expression patterns and to visualize the results.
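dummy variables
: variables used in statistical analysis to represent categorical data. They are typically used in regression analysis to encode the different groups or categories in a dataset.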
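Here is a minimal base-R illustration of dummy coding; it does not need maSigPro. There are three hypothetical groups, and Control is the first factor level. model.matrix() encodes Cold and Heat as 0/1 columns measured against the Control reference. maSigPro builds its own dummy variables internally, so this only shows the idea.

```r
# Three hypothetical samples, one per group; Control is the reference level
group <- factor(c("Control", "Cold", "Heat"),
                levels = c("Control", "Cold", "Heat"))
X <- model.matrix(~ group)   # intercept + one 0/1 column per non-reference group
print(X)
```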
1. Differentially expressed genes in experiments with multiple groups (conditions)
1.1 make.design.matrix()
Defining the regression model.
Defines the form of the regression polynomial and supplies the variable values needed to fit it: make a design matrix for the regression fit of time series gene expression experiments.
design_used <- make.design.matrix(edesign = edesign, degree = 2, time.col = 1, repl.col = 2, group.cols = 3:ncol(edesign))
edesign=
: a matrix describing the experimental design. Rows are the arrays (samples), named by experiment; columns are the experiment descriptors (time, replicate, and one column per experimental group).
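degree=
: the degree of the regression polynomial (1 = linear, 2 = quadratic, ...). It can be at most the number of time points minus 1, and that maximum is the recommended choice here.
time.col=, repl.col=, group.cols=
: the column numbers in edesign of the time, the replicate, and all experimental groups, respectively
Note: put the control group first among the group columns; the function treats the first group as the control (reference).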
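The sketch below shows an edesign matrix for a hypothetical experiment with two groups (Control first, then Cold), three time points and three replicates. The Replicate column carries one integer per (group, time) condition, and all replicate arrays of that condition share it. That is my understanding of maSigPro's convention. The group columns are 0/1 indicators. The final call runs only if maSigPro is installed.

```r
times  <- c(3, 9, 27)
groups <- c("Control", "Cold")   # control first: treated as the reference group
n_rep  <- 3

# One row per array; rep varies fastest, then time, then group
grid <- expand.grid(rep = seq_len(n_rep), time = times, group = groups,
                    stringsAsFactors = FALSE)
edesign <- cbind(
  Time      = grid$time,
  # Replicate: one integer per (group, time) condition, shared by its replicates
  Replicate = rep(seq_len(length(times) * length(groups)), each = n_rep),
  Control   = as.integer(grid$group == "Control"),
  Cold      = as.integer(grid$group == "Cold")
)
rownames(edesign) <- paste(grid$group, grid$time, grid$rep, sep = "_")
print(edesign)

# Optional: build the regression design with maSigPro, if it is installed
if (requireNamespace("maSigPro", quietly = TRUE)) {
  design_used <- maSigPro::make.design.matrix(edesign, degree = 2, time.col = 1,
                                              repl.col = 2, group.cols = 3:4)
}
```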
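Output:
A list of length 3:
Element 1 (dis)
: a matrix, the design matrix holding the regression variables used in the polynomial fit
Element 2 (groups.vector)
: a vector assigning each regression variable (column of dis) to an experimental group
Element 3 (edesign)
: a matrix, identical to the input edesign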
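To see what element 1 contains, the sketch below builds a design matrix of the same kind by hand, for a hypothetical Control/Cold experiment with degree = 2. It has a dummy for the group, time and time squared, and the interactions of each time term with the dummy. The column names and their order imitate maSigPro's style but are assumptions. The hypothetical_groups vector only mimics the role of element 2. Fitting noise-free simulated data recovers the coefficients used to generate it.

```r
# Hypothetical layout: 9 Control arrays, then 9 Cold arrays (3 times x 3 replicates)
time <- rep(c(3, 9, 27), each = 3, times = 2)
cold <- rep(c(0, 1), each = 9)            # dummy variable: 1 for Cold arrays

dis <- cbind(
  ColdvsControl = cold,            # intercept shift of Cold vs Control
  Time          = time,            # linear time trend (Control)
  TimexCold     = time * cold,     # extra linear slope in Cold
  Time2         = time^2,          # quadratic time trend (Control)
  Time2xCold    = time^2 * cold    # extra curvature in Cold
)
# Mimics the role of element 2: the group each column belongs to (assumed labels)
hypothetical_groups <- c("Cold", "Control", "Cold", "Control", "Cold")

# Noise-free simulated gene: least squares recovers the generating coefficients
y <- 1 + 0.5 * cold + 0.2 * time + 0.1 * time * cold - 0.004 * time^2
beta <- lm.fit(cbind(Intercept = 1, dis), y)$coefficients
print(round(beta, 4))
```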
1.2 p.vector()
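Finding significantly differentially expressed genes
Differentially expressed genes fall into three categories here:
- No significant difference between the experimental and control groups, but expression in the control group changes significantly over time
- No significant change over time in the control group, but the experimental group differs significantly from the control
- Expression in the control group changes significantly over time, and the experimental group also differs significantly from the control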
p.vector performs a regression fit for each gene, using all variables of the model given by a regression (design) matrix, and returns a list of FDR-corrected significant genes.
fit_used <- p.vector(data = data, design = design_used, Q = 0.05, MT.adjust = "BH")
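data=
: the expression matrix (rows are genes, columns are samples)
design=
: the output of make.design.matrix()
Q=
: cutoff for the adjusted p-value
MT.adjust=
: multiple-testing correction method for the p-values, e.g. "BH"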
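Output:
A list of length 14; the most useful elements are:
fit_used$p.adjusted
: the corrected p-values from testing each gene's fitted regression; a gene is called significant based on this adjusted p-value
fit_used$i
: the number of significant (differentially expressed) genes
fit_used$SELEC
: a matrix of the significant genes and their expression values in each sample (rows are genes, columns are samples)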
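The core idea of p.vector() can be sketched in base R. Fit the full model gene by gene and take the p-value of the overall F-test. Adjust the p-values with p.adjust(). Keep genes whose adjusted p-value is at most Q. This is a simplified reimplementation for understanding; p_vector_sketch and the simulated data are hypothetical. The real function also handles min.obs filtering and count data, which are ignored here.

```r
p_vector_sketch <- function(expr, dis, Q = 0.05, MT.adjust = "BH") {
  # Overall F-test p-value of the full model, one gene (row) at a time
  pvals <- apply(expr, 1, function(y) {
    f <- summary(lm(y ~ dis))$fstatistic    # named: value, numdf, dendf
    pf(f[["value"]], f[["numdf"]], f[["dendf"]], lower.tail = FALSE)
  })
  p.adjusted <- p.adjust(pvals, method = MT.adjust)   # FDR control
  selected <- p.adjusted <= Q
  list(p.vector = pvals, p.adjusted = p.adjusted,
       i = sum(selected), SELEC = expr[selected, , drop = FALSE])
}

# Simulated data: 50 hypothetical genes x 18 arrays (Control then Cold)
set.seed(1)
time <- rep(c(3, 9, 27), each = 3, times = 2)
cold <- rep(c(0, 1), each = 9)
dis <- cbind(ColdvsControl = cold, Time = time, TimexCold = time * cold)
expr <- matrix(rnorm(50 * 18, sd = 0.3), nrow = 50,
               dimnames = list(paste0("gene", 1:50), NULL))
expr[1:5, ] <- sweep(expr[1:5, ], 2, 0.1 * time, "+")   # genes 1-5: time trend
expr[6:10, ] <- sweep(expr[6:10, ], 2, 2 * cold, "+")   # genes 6-10: Cold shift

fit_sketch <- p_vector_sketch(expr, dis, Q = 0.05)
print(fit_sketch$i)
```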
1.3 T.fit()
Fits a stepwise regression for each gene in a time series gene expression experiment and selects the best regression model for each gene.
tstep <- T.fit(data = fit_used, step.method = "backward")
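data=
: the output of p.vector()
step.method=
: the stepwise regression method, e.g. "backward"
Output:
tstep$sol
: a matrix summarizing the stepwise fit. For each selected gene it gives the p-value of the regression ANOVA, the R-squared of the model, and the p-values of the selected variables' coefficients.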
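T.fit() performs the stepwise selection gene by gene. The sketch below is a simplified base-R version of backward elimination. It assumes a p-value rule: refit, then drop the least significant variable while its p-value exceeds alfa. backward_fit is a hypothetical helper, not maSigPro's code, and the real function also offers other step methods.

```r
backward_fit <- function(y, dis, alfa = 0.05) {
  vars <- colnames(dis)
  while (length(vars) > 0) {
    df <- data.frame(y = y, dis[, vars, drop = FALSE])
    fit <- lm(y ~ ., data = df)
    coefs <- summary(fit)$coefficients
    coefs <- coefs[rownames(coefs) != "(Intercept)", , drop = FALSE]
    worst <- which.max(coefs[, 4])               # least significant variable
    if (coefs[worst, 4] <= alfa) {               # all remaining terms significant
      return(list(vars = vars, r.squared = summary(fit)$r.squared))
    }
    vars <- setdiff(vars, rownames(coefs)[worst])  # drop it and refit
  }
  list(vars = character(0), r.squared = 0)       # intercept-only model
}

set.seed(42)
time <- rep(c(3, 9, 27), each = 3, times = 2)
cold <- rep(c(0, 1), each = 9)
dis <- cbind(ColdvsControl = cold, Time = time, TimexCold = time * cold)
y <- 1 + 0.1 * time + rnorm(18, sd = 0.2)   # hypothetical gene: time trend only
res <- backward_fit(y, dis, alfa = 0.05)
print(res$vars)
```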
1.4 get.siggenes()
Extract significant genes for sets of variables in time series gene expression experiments
sigs_final <- get.siggenes(tstep = tstep, rsq = 0.6, vars = "groups")
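tstep=
: the output of T.fit()
rsq=
: cutoff for the R-squared value of the stepwise regression fit; only genes with R-squared greater than rsq are selected
vars=
: "groups" returns the differentially expressed genes for each group comparison
Output:
sigs_final$summary
: a data frame; each column lists the significant genes for one group comparison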
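The filtering done by get.siggenes(vars = "groups") works roughly as follows. First, drop genes whose stepwise R-squared is not above rsq. Then list a gene under a comparison whenever one of its retained variables belongs to that comparison. The per-gene results and the variable-to-group mapping below are made up for illustration, and siggenes_by_group is a hypothetical helper. maSigPro's exact rules (for example, how intercept terms are treated) may differ.

```r
# Hypothetical per-gene stepwise results: retained variables and R-squared
step_results <- list(
  geneA = list(vars = c("Time", "Time2"),              r.squared = 0.92),
  geneB = list(vars = c("ColdvsControl", "TimexCold"), r.squared = 0.81),
  geneC = list(vars = c("Time", "TimexCold"),          r.squared = 0.75),
  geneD = list(vars = c("TimexCold"),                  r.squared = 0.40)
)
# Illustrative mapping of design variables to comparisons
var_group <- c(Time = "Control", Time2 = "Control",
               ColdvsControl = "ColdvsControl", TimexCold = "ColdvsControl",
               Time2xCold = "ColdvsControl")

siggenes_by_group <- function(results, var_group, rsq = 0.6) {
  keep <- Filter(function(r) r$r.squared > rsq, results)   # R-squared filter
  groups <- unique(var_group)
  setNames(lapply(groups, function(g) {
    # gene is listed if any retained variable belongs to comparison g
    hits <- vapply(keep, function(r) any(var_group[r$vars] == g), logical(1))
    names(keep)[hits]
  }), groups)
}

sig <- siggenes_by_group(step_results, var_group, rsq = 0.6)
print(sig)
```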
1.5 see.genes()
Visualization of differentially expressed genes
see.genes(sigs_final$sig.genes$ColdvsControl, show.fit = TRUE, dis = design_used$dis, cluster.method = "hclust", cluster.data = 1, k = 9)
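see.genes() groups the significant genes into k clusters before plotting. The base-R sketch below applies hierarchical clustering on correlation distance to six hypothetical profiles with k = 2. I assume this resembles the hclust option, but see.genes() may differ in its defaults. Genes rising over time and genes falling over time end up in separate clusters.

```r
# Six hypothetical gene profiles over six ordered samples
profiles <- rbind(
  g1 = 1:6,
  g2 = 2 * (1:6) + c(0, 0.3, 0, 0.2, 0, 0.1),
  g3 = 5 + 1:6,
  g4 = 6:1,
  g5 = 3 * (6:1) + c(0.1, 0, 0.2, 0, 0, 0),
  g6 = 10 - 0.5 * (1:6)
)
d  <- as.dist(1 - cor(t(profiles)))   # correlation distance between genes
hc <- hclust(d, method = "ward.D")
clusters <- cutree(hc, k = 2)          # cut the tree into k = 2 clusters
print(clusters)
```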
2. Genes with significant changes over time in a single experiment
2.1 Data preparation
Copyright notice:
Author: zhangchen
Link: https://www.techfm.club/p/118491.html
Source: TechFM
Copyright belongs to the author; do not reproduce without permission.