LD calculation and plotting
LD (Linkage Disequilibrium)
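Linkage disequilibrium (LD) is the non-random association of alleles at two or more loci: the observed frequency of an allele combination differs significantly from the frequency expected if the alleles combined at random. Put simply, if alleles at two or more loci occur together in a population more or less often than independent random assortment predicts, those loci are in LD.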
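For two biallelic loci, this definition can be written compactly. The symbols below are assumptions introduced for illustration and do not appear in the original text: p_A and p_B are the frequencies of allele A at the first locus and allele B at the second, and p_AB is the frequency of the haplotype carrying both.

```latex
D = p_{AB} - p_A\,p_B, \qquad
r^2 = \frac{D^2}{p_A (1 - p_A)\, p_B (1 - p_B)}
```

Under independent assortment p_AB = p_A p_B, so D = 0 and r^2 = 0. As a made-up worked example, with p_A = p_B = 0.5 and p_AB = 0.4, D = 0.4 - 0.25 = 0.15 and r^2 = 0.0225 / 0.0625 = 0.36.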
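LD usually arises from physical or genetic linkage between loci. Linkage means that loci on the same chromosome are closely associated genetically, so they tend to be passed on to offspring together. When loci are linked, alleles at the different loci combine non-randomly, which produces LD.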
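LD matters for genetics and genomics research. It reveals genetic relationships between loci and aspects of genome structure, and it provides important information for gene association studies, gene mapping, and genome-wide association studies. LD is also related to population genetic evolution and to susceptibility to complex diseases, so it has some value for understanding and predicting phenotypes and disease risk.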
# Download
git clone https://github.com/BGI-shenzhen/PopLDdecay.git
# Install
cd PopLDdecay
chmod 755 ./configure
./configure
make
Calculating LD
~/soft/PopLDdecay/PopLDdecay -InVCF SE.vcf -OutStat ALL_SNP
A simple example
# 1) For a GATK VCF file, run PopLDdecay directly
./bin/PopLDdecay -InVCF SNP.vcf.gz -OutStat Lddecay.stat.gz
# 2) For PLINK files [.ped .map], first convert them to genotype format, then run PopLDdecay
perl bin/mis/plink2genotype.pl -inPED in.ped -inMAP in.map -outGenotype out.genotype
./bin/PopLDdecay -InGenotype out.genotype -OutStat LDdecay.stat.gz
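# 3) To calculate LD decay for the subgroup GroupA in a VCF file
# put the GroupA sample names into GroupA_sample.list
# replace <in.vcf.gz> and <out.stat> with your own input VCF and output file name
./bin/PopLDdecay -InVCF <in.vcf.gz> -OutStat <out.stat> -SubPop GroupA_sample.list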
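One way to build GroupA_sample.list is to read the sample names from the VCF header line, where the sample columns start at column 10. This is a minimal sketch, assuming the GroupA samples share a name prefix; the `A_` prefix, the file `all_samples.list`, and the toy VCF are hypothetical and exist only so the example runs on its own.

```shell
# Toy VCF written only so the sketch is self-contained; use your real file instead.
printf '##fileformat=VCFv4.2\n' > toy.vcf
printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tA_1\tA_2\tB_1\n' >> toy.vcf
printf 'chr1\t100\t.\tA\tG\t.\tPASS\t.\tGT\t0/0\t0/1\t1/1\n' >> toy.vcf

# Sample names are columns 10 onward of the #CHROM header line.
grep -m1 '^#CHROM' toy.vcf | cut -f10- | tr '\t' '\n' > all_samples.list

# Keep the GroupA samples, one name per line (the A_ prefix is hypothetical).
grep '^A_' all_samples.list > GroupA_sample.list
cat GroupA_sample.list
```

For a compressed file, the first step could read from `zcat SNP.vcf.gz` instead of the plain-text VCF. If the groups do not follow a naming pattern, the list has to be written by hand.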
Multiple populations
This is a common situation in LD decay analysis. For example, suppose the VCF file contains 50 samples (wild1, wild2, wild3…wild25, cul1, cul2, cul3…cul25).
To compare the LD decay of the two groups (wild vs. cultivated), first put the sample names of each group into its own list file; the names may be arranged in a column or in a row.
./bin/PopLDdecay -InVCF In.vcf.gz -OutStat wild.stat.gz -SubPop wildName.list
./bin/PopLDdecay -InVCF In.vcf.gz -OutStat cul.stat.gz -SubPop culName.list
# create muti.list manually
perl bin/Plot_MultiPop.pl -inList muti.list -output OutputPrefix
Note:
A. A sample list file such as wildName.list can look as follows (column or row is ok):
wild1
wild2
…
wild25
B. The list file muti.list has two columns, the file path of each population's result and the population flag, such as:
/ifshk7/BC_PS/Lddecay/wild.stat.gz wild
/ifshk7/BC_PS/Lddecay/cul.stat.gz cultivation
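The two-column list described in note B can also be generated rather than typed. This is a minimal sketch, assuming the result files wild.stat.gz and cul.stat.gz sit in the current directory and that the path contains no spaces; the `prefix:flag` pairs are just a convenience of this example.

```shell
# Start from an empty list file.
: > muti.list

# One "<path> <flag>" line per population; $PWD makes the paths absolute.
for entry in wild:wild cul:cultivation; do
    prefix=${entry%%:*}   # result file prefix, e.g. wild
    flag=${entry##*:}     # population flag, e.g. cultivation
    printf '%s/%s.stat.gz %s\n' "$PWD" "$prefix" "$flag" >> muti.list
done

# Sanity check: every line should have exactly two whitespace-separated columns.
awk 'BEGIN { bad = 0 } NF != 2 { bad = 1 } END { exit bad }' muti.list \
    && echo "muti.list looks well-formed"
cat muti.list
```

The check only validates the column count. It does not confirm that the result files exist, so it is worth running `ls` on the listed paths before plotting.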
Plotting
# 2.1 For one Population
perl bin/Plot_OnePop.pl -inFile LDdecay.stat.gz -output Fig
# 2.2 For one population, multiple chromosomes # List format: [chrResultPathWay]
perl bin/Plot_OnePop.pl -inList Chr.ResultPath.List -output Fig
# 2.3 For multiple populations # List format: [Pop.ResultPath PopID]
perl bin/Plot_MultiPop.pl -inList Pop.ResultPath.list -output Fig
perl /public/home/fengting/soft/PopLDdecay/bin/Plot_MultiPop.pl -inList res -output Fig
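Besides the figure, it can be useful to read a single number off the decay curve, for example the distance at which mean r^2 first falls to half of its maximum. This is a rough sketch, not part of PopLDdecay. It assumes a decompressed, tab-delimited stat table with distance in column 1 and mean r^2 in column 2 after a `#` header line; that layout is an assumption to check against your own output (e.g. `zcat LDdecay.stat.gz | head`), and the toy numbers are made up.

```shell
# Toy table standing in for a decompressed stat file (values are made up).
printf '#Dist\tMean_r^2\n' > toy.stat
printf '1\t0.80\n10\t0.60\n100\t0.40\n1000\t0.20\n' >> toy.stat

# The file is read twice. Pass 1 finds the maximum mean r^2;
# pass 2 prints the first distance at which mean r^2 is at or below half of it.
awk -F'\t' '
    /^#/ { next }
    NR == FNR { if ($2 > max) max = $2; next }
    $2 <= max / 2 { print $1; exit }
' toy.stat toy.stat > half_decay.txt
cat half_decay.txt
```

On real data the raw curve is noisy at long distances, so the value is only a rough summary and depends on how the pairs were binned.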