rMATS : RNA-seq 可变剪切分析 (可比较两有重复样本)
Reference :
- Github :
-
Xinglab/rmats-turbo Robust and flexible detection of differential alternative splicing from replicate RNA-Seq data
-
Xinglab/rmats2sashimiplot :produces a sashimiplot visualization of rMATS output
-
- Sourceforge :Multivariate Analysis of Transcript Splicing (MATS) (sourceforge.net)
- Paper
Installation
cd
git clone https://github.com/Xinglab/rmats-turbo.git
cd rmats-turbo
./build_rmats --conda
conda activate {my.bio-tools.path}/rmats-turbo/conda_envs/rmats
rmats.py
cd
git clone https://github.com/Xinglab/rmats2sashimiplot.git
cd rmats2sashimiplot/
bash 2to3.sh
pip uninstall rmats2sashimiplot
python ./setup.py install
rmats2sashimiplot
Usage
- Take bam files & gtf file as input.
#-------------------------------------------
# generate bam files from STAR, arguments from script of rmats.
## --s1 --s2 take fastq files as input
wd=
output=$wd/results
index=refdata/cellranger_ref/hg38/star
gtf=refdata/cellranger_ref/hg38/genes/genes.gtf
cores=19
cd $wd
for i in `cat SRR_Acc_List.txt`
do
STAR --readFilesIn $output/01cleandata/${i}_1_paired_clean.fastq.gz $output/01cleandata/${i}_2_paired_clean.fastq.gz /
--chimSegmentMin 2 --outFilterMismatchNmax 3 /
--alignEndsType EndToEnd /
--runThreadN $cores /
--outSAMstrandField intronMotif /
--outSAMtype BAM Unsorted / # Failed with BAM SortedByCoordinate
--alignSJDBoverhangMin 6 /
--alignIntronMax 299999 /
--genomeDir $index /
--sjdbGTFfile $gtf /
--outFileNamePrefix $output/02alignment-hg38/${i}_ /
--readFilesCommand zcat
samtools sort -@ $cores $output/02alignment-hg38/${i}_Aligned.out.bam -o $output/02alignment-hg38/${i}.bam
samtools index -@ $cores $output/02alignment-hg38/${i}.bam
done
#-------------------------------------------
vim rmats.sh
#!/bin/bash
echo "/path/to/bam/ctr_rep1.bam,/path/to/bam/ctr_rep2.bam" > b1.txt
echo "/path/to/bam/treat_rep1.bam,/path/to/bam/treat_rep2.bam" > b2.txt
gtf=/path/to/gtf/xxxx.gtf
out_dir=/path/to/output
tmp_dir=/path/to/tmp
core=16
read_length=150
rmats.py --b1 b1.txt --b2 b2.txt --od $out_dir --tmp $tmp_dir /
--readLength $read_length --variable-read-length --nthread $core
#--------------------------------------------
python rmats2sashimiplot/src/rmats2sashimiplot/rmats2sashimiplot.py /
-o plot-RI-1-2/ /
--l1 ctr / #label the sample1
--l2 treat /
--b1 /path/to/bam/ctr_rep1.bam,/path/to/bam/ctr_rep2.bam /
--b2 /path/to/bam/treat_rep1.bam,/path/to/bam/treat_rep2.bam /
-e /path/to/output/RI.MATS.JC.txt /
--event-type RI
python rmats.py -h
usage: rmats.py [options]
optional arguments:
-h, --help show this help message and exit
--version show program's version number and exit
--gtf GTF An annotation of genes and transcripts in GTF format
--b1 B1 A text file containing a comma separated list of the
BAM files for sample_1. (Only if using BAM)
--b2 B2 A text file containing a comma separated list of the
BAM files for sample_2. (Only if using BAM)
--s1 S1 A text file containing a comma separated list of the
FASTQ files for sample_1. If using paired reads the
format is ":" to separate pairs and "," to separate
replicates. (Only if using fastq)
--s2 S2 A text file containing a comma separated list of the
FASTQ files for sample_2. If using paired reads the
format is ":" to separate pairs and "," to separate
replicates. (Only if using fastq)
--od OD The directory for final output from the post step
--tmp TMP The directory for intermediate output such as ".rmats"
files from the prep step
-t {paired,single} Type of read used in the analysis: either "paired" for
paired-end data or "single" for single-end data.
Default: paired
--libType {fr-unstranded,fr-firststrand,fr-secondstrand}
Library type. Use fr-firststrand or fr-secondstrand
for strand-specific data. Only relevant to the prep
step, not the post step. Default: fr-unstranded
--readLength READLENGTH
The length of each read
--variable-read-length
Allow reads with lengths that differ from --readLength
to be processed. --readLength will still be used to
determine IncFormLen and SkipFormLen
--anchorLength ANCHORLENGTH
The "anchor length" or "overhang length" used when
counting the number of reads spanning splice
junctions. A minimum number of "anchor length"
nucleotides must be mapped to each end of a given
junction. The minimum value is 1 and the default value
is set to 1 to make use of all possible splice
junction reads.
--tophatAnchor TOPHATANCHOR
The "anchor length" or "overhang length" used in the
aligner. At least "anchor length" NT must be mapped to
each end of a given junction. The default is 1. (Only
if using fastq)
--bi BINDEX The directory name of the STAR binary indices (name of
the directory that contains the SA file). (Only if
using fastq)
--nthread NTHREAD The number of threads. The optimal number of threads
should be equal to the number of CPU cores. Default: 1
--tstat TSTAT The number of threads for the statistical model. If
not set then the value of --nthread is used
--cstat CSTAT The cutoff splicing difference. The cutoff used in the
null hypothesis test for differential splicing. The
default is 0.0001 for 0.01% difference. Valid: 0 <=
cutoff < 1. Does not apply to the paired stats model
--task {prep,post,both,inte,stat}
Specify which step(s) of rMATS to run. Default: both.
prep: preprocess BAMs and generate a .rmats file.
post: load .rmats file(s) into memory, detect and
count alternative splicing events, and calculate P
value (if not --statoff). both: prep + post. inte
(integrity): check that the BAM filenames recorded by
the prep task(s) match the BAM filenames for the
current command line. stat: run statistical test on
existing output files
--statoff Skip the statistical analysis
--paired-stats Use the paired stats model
--novelSS Enable detection of novel splice sites (unannotated
splice sites). Default is no detection of novel splice
sites
--mil MIL Minimum Intron Length. Only impacts --novelSS
behavior. Default: 50
--mel MEL Maximum Exon Length. Only impacts --novelSS behavior.
Default: 500
--allow-clipping Allow alignments with soft or hard clipping to be used
--fixed-event-set FIXED_EVENT_SET
A directory containing fromGTF.[AS].txt files to be
used instead of detecting a new set of events```
Output
-
rMATS 能识别5种Altenative Splicing events (AS : SE / RI / A5SS/A3SS/MXE,对于每种AS,都会有给出比对到exon junction的count。
[AS].MATS.JC.txt
or[AS].MATS.JCEC.txt
-
23列,值得关注 :PValue ,IncLevelDifference=
average(IncLevel1) - average(IncLevel2)
。有几列描述这个剪切事件的染色体位置,名称根据不同AS有点不同。 -
类似于gene-level的RNA-seq 分析,需要用长度进行 normalization。在rMATS中,用IncLevel (Inclusion Level)对 AS进行定量描述。
-
用likelihood-ratio test ,将sample1和sample2的某event的IncLevel 的差值进行检验。(--cstat)
共有 0 条评论