Personal Notes

siwei • 2023-12-27 03:33 • Miscellaneous

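tr A-Z a-z < tmp2 | sed 's/^\w\|\s\w/\U&/g' | tr " " "," > tmp3
Lowercase everything first, then capitalize the first letter of each word (tmp2 is space-delimited); the final tr turns the spaces into commas. The \w, \s, \| and \U escapes require GNU sed.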
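
Because `\U` is a GNU sed extension, the same result can be sketched in plain awk where GNU sed is unavailable. The function name `title_case_csv` and the sample input are made up for illustration. Unlike the tr pipeline, this version collapses runs of spaces, because awk re-splits the fields.

```shell
# Hypothetical helper: lowercase each space-separated word, capitalize its
# first letter, and join the words with commas (reads stdin, writes stdout).
title_case_csv() {
    awk 'BEGIN { OFS = "," }
    {
        for (i = 1; i <= NF; i++)
            $i = toupper(substr($i, 1, 1)) tolower(substr($i, 2))
        print
    }'
}

printf 'homo SAPIENS genome\n' | title_case_csv
```

For the sample line this prints `Homo,Sapiens,Genome`.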

To extract the reads in a specified genomic region from a BAM file, the following commands can be used

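# Region queries need a coordinate-sorted, indexed BAM; the region goes after the input file
samtools view -hb wgs.sort.bam chr:start-end > target.region.bam
# Extract by BED file
samtools view -hb -L target.bed wgs.sort.bam > target.region.bam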

bedtools intersect -a wgs.sort.bam -b target.bed > target.region.bam

sambamba view -h -f bam wgs.sort.bam chr:start-end > target.region.bam
# To extract by BED file, use `sambamba slice`
sambamba slice -L target.bed wgs.sort.bam > target.region.bam

# sambamba slice -L should be the fastest and least resource-hungry of these options

Converting GFF/GTF to GenBank format, ref: https://www.biostars.org/p/72220/
The EMBOSS tool seqret would be a possible option.

seqret -sequence reference.fasta -feature -fformat gff -fopenfile 1.gff -osformat genbank -auto
# The details of the output still need to be adjusted by hand

if and else in awk

awk '{if ($2 < 10) print $1 "\t" ($2 - 10); else print $1 "\t" ($2 + 10)}' input > output
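
The branch logic can be checked on a couple of made-up rows without touching real files. The sketch below wraps it in a hypothetical function `shift_col2` and uses awk's ternary operator, which is an equivalent and more compact form of the if/else.

```shell
# Hypothetical helper: subtract 10 from column 2 when it is below 10,
# otherwise add 10 (reads stdin, writes tab-separated output).
shift_col2() {
    awk 'BEGIN { OFS = "\t" } { print $1, ($2 < 10 ? $2 - 10 : $2 + 10) }'
}

# Two sample rows: the first takes the "if" branch, the second the "else" branch.
printf 'a\t5\nb\t20\n' | shift_col2
```

The sample rows come out as `a -5` and `b 30`, separated by a tab.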

Batch-generating sed command lines

awk '{print "sed -i '\''s#" $1 "#" $2 "#g'\'' input"}' rename.txt > run.sh
sh run.sh
# input is the file to run the batch sed on
# rename.txt has two columns; every occurrence of the first column's text should be replaced with the second column's
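
A possible variant is to generate a single sed script rather than one sed invocation per row, so the file is read only once. This is a sketch under two assumptions: the names in rename.txt contain no `#` or regex metacharacters, and the file names and sample contents below are invented for the demonstration. It writes to a separate output file instead of using `-i`, which keeps it independent of the sed flavour.

```shell
# Work in a throwaway directory with made-up sample data.
workdir=$(mktemp -d)
printf 'geneA\tGENE_1\ngeneB\tGENE_2\n' > "$workdir/rename.txt"
printf 'geneA binds geneB\n' > "$workdir/input"

# One substitution command per row of rename.txt: s#old#new#g
awk '{print "s#" $1 "#" $2 "#g"}' "$workdir/rename.txt" > "$workdir/rename.sed"

# Apply every substitution in a single pass over the input.
sed -f "$workdir/rename.sed" "$workdir/input" > "$workdir/output"
cat "$workdir/output"
```

With the sample data the output file contains `GENE_1 binds GENE_2`.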

Reposted from "Coordinate systems in bioinformatics file formats and how to convert between them"
https://www.biochen.org/cn/blog/2020/%E7%94%9F%E7%89%A9%E4%BF%A1%E6%81%AF%E6%96%87%E4%BB%B6%E6%A0%BC%E5%BC%8F%E4%B8%AD%E7%9A%84%E5%9D%90%E6%A0%87%E7%B3%BB/

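Many bioinformatics file formats are based on genomic coordinates, for example the common BED and GTF formats. The two define their coordinate systems quite differently, however. In BED the first position has index 0 and intervals are half-open: the start is included and the end is excluded. In GTF the first position has index 1 and intervals are closed at both ends. Call the former 0-based and the latter 1-based. The advantage of 0-based coordinates is that length is simple to compute: end minus start gives the sequence length directly. The advantage of 1-based coordinates is that they are more intuitive.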

Besides BED and GTF, the original post includes a table covering other formats.

Length calculation

Length(0-based) = End(0-based) - Start(0-based)
Length(1-based) = End(1-based) - Start(1-based) + 1

Coordinate conversion

0-based to 1-based
Start(1-based) = Start(0-based) + 1
End(1-based) = End(0-based)

1-based to 0-based
Start(0-based) = Start(1-based) - 1
End(0-based) = End(1-based)
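
The two conversions above can be sketched as small awk filters. The function names are hypothetical, and the input is assumed to be tab-separated with the chromosome, start and end in the first three columns.

```shell
# Hypothetical helpers; each reads tab-separated "chrom start end" on stdin.

# 0-based half-open (BED-style) -> 1-based closed (GTF-style):
# start + 1, end unchanged.
zero_to_one_based() {
    awk 'BEGIN { FS = OFS = "\t" } { print $1, $2 + 1, $3 }'
}

# 1-based closed -> 0-based half-open: start - 1, end unchanged.
one_to_zero_based() {
    awk 'BEGIN { FS = OFS = "\t" } { print $1, $2 - 1, $3 }'
}

# The first 100 bases of chr1 in BED coordinates, converted and converted back.
printf 'chr1\t0\t100\n' | zero_to_one_based
printf 'chr1\t0\t100\n' | zero_to_one_based | one_to_zero_based
```

The interval `chr1 0 100` becomes `chr1 1 100`, and the length is 100 either way: 100 - 0 in 0-based coordinates and 100 - 1 + 1 in 1-based coordinates.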

Notes on ChIPseeker annotation

chip.png

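Sometimes the two left columns (geneChr/transcriptid) and the third column (distanceToTSS) appear to disagree.
This is because the two left columns indicate which gene a feature from the input BED file (for example a peak) falls on,
whereas the third column, on the right, refers to the gene whose TSS is nearest to that peak.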

In the absence of additional information, the first base of a gene's first exon is taken as the TSS.

The meaning of the Von Neumann Entropy (VNE) index
This likely reflects a more disordered (the high-entropy status) and relaxed chromatin architecture at early development (E38 and E80) (Fig. 2b).
In agreement with the phenomenon that 3D structure in early mammalian embryos is initially obscure but gradually established throughout development [45–47], the relatively loose chromatin folding highlights a highly plastic state for hepatocyte genomes at the early stages of development and may be essential for the rapid functional transitions in the liver before and after birth.

https://doi.org/10.1038/s41421-022-00416-z; Fig. 2a

We observed a significantly higher VNE in the POF stage (0.86, P < 0.016, Wilcoxon rank-sum test) than in the SWF (0.80) and F1 stages (0.79) (Fig. 2a). This is likely due to a more disordered and relaxed chromatin architecture in the POF stage (Fig. 2b), while the architecture is more stable and ordered in mature GCs at the F1 stages, which aligns with the relaxed genome architecture observed during senescence

https://doi.org/10.1038/s41467-021-27800-9 ; Fig. 2a

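The following sentence comes from the article https://doi.org/10.1080/19491034.2021.1910437, which makes this statement and cites two references.
The article also provides a tool that can compute the VNE parameter.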
Biologically, genomic regions with high entropy likely correlate with high proportions of euchromatin, as euchromatin is more structurally permissive than heterochromatin [1, 2]
1. MacArthur BD, Lemischka IR. Statistical mechanics of pluripotency. Cell. 2013;154(3):484–489.
2. Rajapakse I, Groudine M, Mesbahi M. What can systems theory of networks offer to biology? PLoS Comput Biol. 2012;8(6):e1002543.
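
None of the passages quoted in these notes spell out how the index is computed. As a reminder of the general definition only: Von Neumann entropy is the Shannon entropy of a set of non-negative eigenvalues normalized to sum to one. Which matrix the eigenvalues come from (for Hi-C data, typically a correlation matrix derived from the contact map) and any further rescaling differ between tools, so the symbols below are assumptions and not any one paper's exact method.

```latex
\mathrm{VNE} = -\sum_{i=1}^{n} \lambda_i \ln \lambda_i ,
\qquad \lambda_i \ge 0, \quad \sum_{i=1}^{n} \lambda_i = 1, \quad 0 \ln 0 := 0
```

Under this definition, n equal eigenvalues give the maximum value ln n, and a single dominant eigenvalue gives a value near 0, which fits the reading of high VNE as a more disordered structure.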

The following two passages come from the article: https://doi.org/10.1016/j.neo.2020.12.010
In the context of genome structure, the higher the entropy, the more conformations available to the system [46]. If the distant ends of a genomic region, e.g., a gene, interact to form a loop, there are fewer conformations available to the gene and thus the entropy of that genomic region is reduced.
46. Phillips, Rob, et al. "Physical biology of the cell." American Journal of Physics 78.11 (2010): 1230-1230.
and
We apply one such approach - a derivative of VNE - to measure local chromatin organization of individual gene regions [59]. Higher VNE values indicate that the number of conformations available to the gene and its immediate neighborhood are higher, indicating that chromatin is more accessible.
Judging from this author's work, VNE is positively correlated with gene expression level.

The more disordered (and permissive) chromatin in the pgEpiSCs was also evident based on its high-entropy status.
It then cites the figure below, where the caption of panel d reads: The extent of disorder in chromatin structure (quantified by the Von Neumann Entropy (VNE))

https://doi.org/10.1038/s41422-021-00592-9; Fig. 5d

We found that Di-SG had higher entropy (Fig. 1C), suggestive of less compact chromatin structural organization in Di-SG.

https://doi.org/10.1016/j.jbc.2021.101559; Fig. 1c

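https://github.com/HuiyangYu/PanDepth: a very fast, efficient, low-memory tool for computing depth and coverage over a genome (and gene sets) from SAM/BAM/CRAM files. Even a very large BAM (tens of GB) takes only a minute or two. In addition, its default memory usage is about one tenth that of bamdeal.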

New work from Heng Li | compleasm: a faster and more accurate assessment tool than BUSCO
https://github.com/huangnengCSU/compleasm

Rather than reporting so much detail in the abstract, it might be better to make a more general statement like: "Deletions affecting introns and/or coding regions of numerous genes may have contributed to phenotypic differences between A. baiyi and other Ablax species"

Comparative Recombination Rates in the Rat, Mouse, and Human Genomes
10.1101/gr.1970304

Coefficient for converting to genetic distance, taken from the reference above

awk '{print $1"\t"$4"\t"$4*0.000554779412}' Chr27.map | sort -Vk 1 | awk '{print "Chr"$1"\t"$2"\t"$3}' > Chr27.genetic.map

Genetic position = physical position of the SNP * 0.000554779412
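
One detail worth checking when applying this conversion: awk formats computed numbers with `%.6g` by default when it concatenates them into a string, so large products lose digits. The sketch below uses `printf` with an explicit format to avoid that. The function name is hypothetical, and the column layout (chromosome in column 1, physical position in column 4, as in a PLINK-style .map file) is an assumption based on the command above.

```shell
# Hypothetical helper: reads .map-style rows (chrom, SNP id, genetic distance,
# physical position) on stdin; the conversion coefficient is passed as $1.
add_genetic_position() {
    awk -v coef="$1" '
        { printf "Chr%s\t%s\t%.6f\n", $1, $4, $4 * coef }'
}

# One made-up SNP at position 1,000,000 on chromosome 27.
printf '27\tsnp1\t0\t1000000\n' | add_genetic_position 0.000554779412
```

For the sample row this prints `Chr27`, `1000000` and `554.779412`, tab-separated, where the default formatting would have shortened the last value to `554.779`.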

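Phylogenomics: a detailed tutorial on drawing a DensiTree
A DensiTree is simply an overlay of the topologies of multiple phylogenetic trees, used to visualize topological conflict among the trees (or gene-tree heterogeneity). It can be drawn with the DensiTree software (now bundled with the BEAST2 installation package) or with the R package phangorn. The drawing procedure is documented at the link below.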
https://mp.weixin.qq.com/s/PvxX02Pw_NPiV8aTpxL8TQ

Copyright notice:
Author: siwei
Link: https://www.techfm.club/p/95665.html
Source: TechFM
Copyright belongs to the author. Please do not repost without permission.
