R语言分析5:TMB分型

肿瘤突变负荷(Tumor Mutational Burden, TMB),指肿瘤细胞基因组中,基因的外显子编码区每兆碱基中发生置换和插入或缺失突变的总数。从理论上来说,高TMB(High Tumor Mutational Burden, TMB-H)的肿瘤患者,具有获得更多新生抗原的潜力,并且与肿瘤内异质性有关,能增强肿瘤免疫原性以及与ICI的反应。

要计算样本的TMB值,必须先获取样本的突变数据,TCGA 的突变流程有 4 种,分别是:muse, varscan2, somaticsniper, mutect2。其中 muse 和 somaticsniper 只能计算点突变,无法识别 indel。而目前新版的TCGA不能直接下载4种制作好的maf文件,可通过 "TCGAbiolinks" 整理

1. 使用/color{green}{TCGAbiolinks}包整理突变数据

1.1 MAF文件的下载

# 安装并加载所需的R包
library(TCGAbiolinks)

query <- GDCquery(
    project = "TCGA-STAD", 
    data.category = "Simple Nucleotide Variation",
    data.type = "Masked Somatic Mutation",
    access = "open"
)
  
GDCdownload(query)
  
GDCprepare(query, save = T,save.filename = "TCGA-STAD_SNP.Rdata") # 这里的Rdata文件是一个数据框,可直接用maftools读取使用

1.2 maftools读取处理MAF文件

library(maftools)

load(file = "TCGA-COAD_SNP.Rdata")
maf.stad <- data

## 查看数据
class(maf.stad)
# [1] "spec_tbl_df" "tbl_df"      "tbl"         "data.frame" 

dim(maf.stad)
# [1] 183107    140

maf.stad[1:5, 1:10]
# # A tibble: 5 × 10
#   Hugo_Symbol Entrez_Gene_Id Center NCBI_Build Chromosome Start_Position End_Position Strand Variant_Classification Variant_Type
#                                                                               
# 1 NEXN                 91624 BI     GRCh38     chr1             77929416     77929416 +      Missense_Mutation      SNP         
# 2 COL24A1             255631 BI     GRCh38     chr1             86125858     86125858 +      Missense_Mutation      SNP         
# 3 LRRC8B               23507 BI     GRCh38     chr1             89593023     89593023 +      Frame_Shift_Del        DEL         
# 4 BPNT1                10380 BI     GRCh38     chr1            220069429    220069429 +      Missense_Mutation      SNP         
# 5 KCNF1                 3754 BI     GRCh38     chr2             10913244     10913244 +      Missense_Mutation      SNP

maf <- read.maf(maf.stad)
# -Validating
# -Silent variants: 45460 
# -Summarizing
# --Possible FLAGS among top ten genes:
#   TTN
#   MUC16
#   SYNE1
#   FLG
# -Processing clinical data
# --Missing clinical data
# -Finished in 8.084s elapsed (23.5s cpu) 

1.3 可视化

# 绘制MAF文件的整体结果图
plotmafSummary(maf = maf, rmOutlier = TRUE, addStat = 'median', dashboard = TRUE)
TMB-1

★ 错义突变和移码插入突变较多;C>T 的点突变类型是最多的;TTN 和 MUC16 两个基因的突变次数最多

# 绘制oncoplot图(只展示前 15 个基因)
vc_cols <- RColorBrewer::brewer.pal(n = 8, name = 'Paired')
names(vc_cols) <- c(
  'Frame_Shift_Del',
  'Missense_Mutation',
  'Nonsense_Mutation',
  'Multi_Hit',
  'Frame_Shift_Ins',
  'In_Frame_Ins',
  'Splice_Site',
  'In_Frame_Del'
)
oncoplot(maf = maf, colors = vc_cols, top = 15)

# 使用 somaticInteractions 函数检测互斥突变或同时突变的基因
somaticInteractions(maf = maf, top = 25, pvalue = c(0.05, 0.1))
TMB-2

★ 上方的条形图展示的是样本的突变负荷(若有样本的突变负荷特别高,在后续分析时可考虑是否将其作为离群点去除),右侧的直方图展示了基因的每种突变类型的情况
★ 突变率靠前的基因中,同时出现突变的基因对偏多

# 绘制Oncostrip(展示特定基因在样本中的突变情况)
oncostrip(maf = maf, genes = c("PTGS2","NNMT","CA9"))
TMB-3
# 绘制Transitions Transversions汇总图
laml.titv = titv(maf = maf, plot = FALSE, useSyn = TRUE)
plotTiTv(res = laml.titv)
TMB-4.png

★ 转换和颠换图 (Transition and transversion plot, Ti/Tv) 显示了STAD中SNV的分布,分为 6 个转换和颠换事件。堆积条形图(底部)显示了MAF文件中每个样本的突变谱分布

2. 计算 TMB

stad.tmb <- tmb(maf, captureSize = 38, logScale = T)
dim(stad.tmb)
# [1] 431   6

head(stad.tmb)
#            Tumor_Sample_Barcode total total_perMB total_perMB_log   group Tumor_Sample_ID
# 1: TCGA-RD-A8N4-01A-21D-A364-08     1  0.02631579      -1.5797836 TMB_low    TCGA-RD-A8N4
# 2: TCGA-HU-A4GJ-01A-11D-A25D-08     2  0.05263158      -1.2787536 TMB_low    TCGA-HU-A4GJ
# 3: TCGA-RD-A8N0-01A-12D-A364-08     2  0.05263158      -1.2787536 TMB_low    TCGA-RD-A8N0
# 4: TCGA-FP-8210-01A-11D-2340-08     3  0.07894737      -1.1026623 TMB_low    TCGA-FP-8210
# 5: TCGA-VQ-A8PQ-01A-11D-A410-08     5  0.13157895      -0.8808136 TMB_low    TCGA-VQ-A8PQ
# 6: TCGA-BR-A44T-01A-32D-A24D-08     7  0.18421053      -0.7346856 TMB_low    TCGA-BR-A44T

# 根据TMB平均值进行分组
library(dplyr)
stad.tmb <- stad.tmb %>% mutate(group = if_else(total_perMB_log > mean(total_perMB_log), "TMB_high","TMB_low"))

3. 结合临床数据进行生存分析

# 只取前12位患者编号
stad.tmb$patient <- substr(stad.tmb$Tumor_Sample_Barcode, 1, 12)

# 加载自身临床数据
clin_info[1:4,1:4]
#   patient         entity_submitter_id status time
# 1 TCGA-BR-A4J4 TCGA-BR-A4J4-01A-12R-A251-31      0   16
# 2 TCGA-BR-A4IZ TCGA-BR-A4IZ-01A-32R-A251-31      1  273
# 3 TCGA-RD-A7C1 TCGA-RD-A7C1-01A-11R-A32D-31      1  507
# 4 TCGA-BR-6852 TCGA-BR-6852-01A-11R-1884-13      0 1367

TMB <- merge(clin, stad.tmb, by = "patient")

fit <- survfit(Surv(time, status) ~ group, data = TMB)

ggsurvplot(fit, data = TMB,
           pval = T,
           risk.table = T,
           surv.median.line = "hv", #添加中位生存曲线
           palette = c("red", "blue"),  #更改线的颜色
           legend.labs = c("High risk","Low risk"), #标签
           legend.title = "TMB Score",
           title = "Overall survival", #标题
           ylab = "Cumulative survival (percentage)", xlab = " Time (Days)", #更改横纵坐标
           censor.shape = 124, censor.size = 2, conf.int = FALSE, #删失点的形状和大小
           break.x.by = 500 #横坐标间隔
)
TMB-5

版权声明:
作者:Alex
链接:https://www.techfm.club/p/87543.html
来源:TechFM
文章版权归作者所有,未经允许请勿转载。

THE END
分享
二维码
< <上一篇
下一篇>>