Reference Genomes Explained
Compiled by 探序基因肿瘤研究院
Ensembl website:
https://asia.ensembl.org/index.html
Downloading reference genomes:
Human: http://ftp.ensembl.org/pub/current_fasta/homo_sapiens/dna/ (matching GTF files: http://ftp.ensembl.org/pub/current_gtf/homo_sapiens/)
Mouse (GRCm38): http://ftp.ensembl.org/pub/release-100/fasta/mus_musculus/dna/ (matching GTF files: http://ftp.ensembl.org/pub/release-100/gtf/mus_musculus/)
Sequence file types:
(These differ in how repeats and low-complexity regions are handled.)
* 'dna' - unmasked genomic DNA sequences.
* 'dna_rm' - masked genomic DNA. Interspersed repeats and low complexity regions are detected with the RepeatMasker tool and masked by replacing repeats with 'N's.
* 'dna_sm' - soft-masked genomic DNA. All repeats and low complexity regions have been replaced with lowercased versions of their nucleic base.
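To make the difference between the three file types concrete, the following is a minimal sketch that counts uppercase, lowercase, and 'N' bases in a FASTA file. The toy file and the function name count_masking are made up for illustration; on a real download you would pass the decompressed .fa file instead. Note that 'N' also marks assembly gaps, so an unmasked 'dna' file can still contain some.

```shell
# Toy FASTA standing in for a real download (illustrative data only).
workdir=$(mktemp -d)
cat > "$workdir/toy.fa" <<'EOF'
>chr1
ACGTacgtNNNNACGT
>chr2
ttttGGGGNN
EOF

# Count bases by category, skipping the '>' header lines.
count_masking() {
    grep -v '^>' "$1" | awk '
        {
            line = $0
            soft  += gsub(/[acgt]/, "", line)   # lowercase: soft-masked repeats
            hard  += gsub(/[Nn]/, "", line)     # N: hard-masked repeats or gaps
            upper += gsub(/[ACGT]/, "", line)   # uppercase: unmasked sequence
        }
        END { printf "unmasked=%d soft_masked=%d N=%d\n", upper, soft, hard }'
}

count_masking "$workdir/toy.fa"
# prints: unmasked=12 soft_masked=8 N=6
```

A 'dna_sm' file would typically show a large soft_masked count, a 'dna_rm' file a large N count, and a 'dna' file mostly unmasked bases.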
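Further details are given in the README file in each directory. When downloading, take care to choose the FASTA (.fa) file in the format that suits your analysis.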
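As a rough illustration of picking a file type at download time, the sketch below builds a URL for the mouse release-100 directory listed above. The function name ensembl_fasta_url is hypothetical, and the file-name pattern (Species.Assembly.type.primary_assembly.fa.gz) is an assumption about how that directory is laid out; check the directory listing and README before relying on it. The download command itself is left commented out.

```shell
# Build an Ensembl FTP URL for a genome FASTA file.
# Assumption: files follow <Species>.<assembly>.<seqtype>.primary_assembly.fa.gz
ensembl_fasta_url() {
    release="$1"; species="$2"; assembly="$3"; seqtype="$4"
    # The directory name is the species name in lowercase.
    species_dir=$(printf '%s' "$species" | tr '[:upper:]' '[:lower:]')
    printf 'http://ftp.ensembl.org/pub/release-%s/fasta/%s/dna/%s.%s.%s.primary_assembly.fa.gz\n' \
        "$release" "$species_dir" "$species" "$assembly" "$seqtype"
}

# seqtype is one of: dna, dna_rm, dna_sm
url=$(ensembl_fasta_url 100 Mus_musculus GRCm38 dna_sm)
echo "$url"

# Uncomment to download (resumes a partial download with -c):
# wget -c "$url"
```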
NCBI website:
Human (sequence and annotation files): https://ftp.ncbi.nlm.nih.gov/genomes/refseq/vertebrate_mammalian/Homo_sapiens/
10x Genomics website (for single-cell transcriptome alignment and analysis):
https://support.10xgenomics.com/spatial-gene-expression/software/downloads/latest
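On version names:
Reference genomes come in several versions. GRCh38, hg38, GRCh37 and hg19 are all human reference genome sequences, but the names are not interchangeable. GRCh37 and GRCh38 are the Genome Reference Consortium names for two successive assemblies, while hg19 and hg38 are the UCSC names for those same two assemblies.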
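The naming can be summarized as a small lookup, sketched below. The function name assembly_alias is made up for illustration. It only maps names; it does not imply the files are byte-identical, since UCSC and Ensembl/GRC releases typically differ in details such as the "chr" prefix on sequence names, and coordinates from GRCh37/hg19 do not carry over to GRCh38/hg38.

```shell
# Map a human assembly name to its counterpart in the other naming scheme.
assembly_alias() {
    case "$1" in
        hg19)   echo "GRCh37" ;;   # UCSC name -> GRC name
        GRCh37) echo "hg19" ;;     # GRC name  -> UCSC name
        hg38)   echo "GRCh38" ;;
        GRCh38) echo "hg38" ;;
        *)      echo "unknown"; return 1 ;;
    esac
}

assembly_alias hg19     # prints: GRCh37
assembly_alias GRCh38   # prints: hg38
```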
Given a reference genome FASTA file, print the sequence for a coordinate range:
samtools faidx /home/xxx/hg19.fa chr1:100-200
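Region coordinates in samtools faidx are 1-based and inclusive, so chr1:100-200 returns 101 bases. To show what such an extraction does, here is a minimal awk sketch on a toy file; the file and the function name extract_region are made up for illustration. It is not how samtools works internally: samtools uses a .fai index to seek directly to the region, while this sketch reads the whole sequence into memory and so only suits small files.

```shell
# Toy multi-line FASTA (illustrative data only).
workdir=$(mktemp -d)
cat > "$workdir/toy.fa" <<'EOF'
>chr1
ACGTACGTAC
GGGGCCCCTT
>chr2
TTTTTTTTTT
EOF

# Print bases first..last (1-based, inclusive) of the named sequence.
extract_region() {
    awk -v name="$2" -v first="$3" -v last="$4" '
        /^>/ { keep = (substr($1, 2) == name); next }  # header: is this the requested sequence?
        keep { seq = seq $0 }                          # join wrapped sequence lines
        END  { print substr(seq, first, last - first + 1) }
    ' "$1"
}

extract_region "$workdir/toy.fa" chr1 9 12
# prints: ACGG
```

The example range spans the line break in the toy file, which is why the sequence lines are joined before slicing.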
------------------------------------------------------------
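References:
A roundup of download links for reference genomes and annotation files https://www.bilibili.com/read/cv10447213/
How to tell apart the reference genome names GRCh38, hg38, GRCh37 and hg19 https://zhuanlan.zhihu.com/p/461225847
An introduction to Ensembl and its reference genomes https://www.jianshu.com/p/b2d154e19d64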