Preparing a KEGG Enrichment Database for Non-Model Species (Part 2)

admin • 2023-04-20 21:32 • Miscellany

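I. Downloading the KEGG data

1. Go to the official site: https://www.kegg.jp/

image.png

2. Open the KO (KEGG ORTHOLOGY) Database.

image.png

3. Click the location shown in the screenshot to select an organism.

image.png

4. This example uses zebrafish, so select the organism code dre.

image.png

5. Download the JSON file to your local machine.

image.png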
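
Before processing, it can help to confirm how deeply the downloaded file is nested. The sketch below assumes the file is a tree of objects with "name" and "children" keys, which is what the processing script in the next section relies on. The miniature sample and the leaf entry in it are invented for illustration; the real file is much larger.

```python
import json

# Invented miniature of the downloaded file (assumption: same name/children layout).
sample_text = """
{"name": "dre00001", "children": [
  {"name": "09100 Metabolism", "children": [
    {"name": "09101 Carbohydrate metabolism", "children": [
      {"name": "00010 Glycolysis / Gluconeogenesis [PATH:dre00010]", "children": [
        {"name": "12345 geneA; example protein"}
      ]}
    ]}
  ]}
]}
"""


def count_nodes_by_depth(node, depth=0, counts=None):
    """Count how many nodes sit at each nesting depth of the tree."""
    if counts is None:
        counts = {}
    counts[depth] = counts.get(depth, 0) + 1
    for child in node.get("children", []):
        count_nodes_by_depth(child, depth + 1, counts)
    return counts


tree = json.loads(sample_text)
depth_counts = count_nodes_by_depth(tree)
print(depth_counts)
```

On a real download, replace `json.loads(sample_text)` with `json.load` on the opened file. Pathway entries are expected at depth 3 and gene entries at depth 4.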

II. Processing the JSON file

import json
import re
import sys

# Usage: python <this_script>.py <downloaded.json> <output.tsv>
# Do not name the path variable `json`: that would shadow the json module.
json_path, outfile = sys.argv[1], sys.argv[2]

# Maps "pathway_id\tpathway_name" to a list of "gene_id\tgene_name" strings.
K_ko_dict = {}
with open(json_path, "r") as f:
    K_ko_file_content = json.load(f)

# Walk the hierarchy: top category -> subcategory -> pathway -> gene.
for children_info in K_ko_file_content.get("children", []):
    for next_children_info in children_info.get("children", []):
        for third_children_info in next_children_info.get("children", []):
            name_info = third_children_info.get("name")
            # e.g. "00010 Glycolysis / Gluconeogenesis [PATH:dre00010]"
            pathway_id = re.findall(r'PATH:(.*)\]', name_info)
            pathway_name = re.findall(r'\d+\s(.*)\s\[', name_info)
            if pathway_id and pathway_name:
                key = pathway_id[0] + "\t" + pathway_name[0]
                K_ko_dict[key] = []
                for fourth_children_info in third_children_info.get("children", []):
                    # The first two space-separated fields are the ID and the gene name.
                    fields = fourth_children_info.get("name").split(" ")
                    K_name = fields[0]
                    gene_name = fields[1].replace(";", "")
                    K_ko_dict[key].append(K_name + "\t" + gene_name)

# Write one row per (gene, pathway) pair, sorted by pathway and then by gene.
with open(outfile, "w") as out:
    out.write("pathway_gene_id\tgene_name\tpathway_id\tpathway_name\n")
    for key in sorted(K_ko_dict):
        for i in sorted(K_ko_dict[key]):
            out.write(i + "\t" + key + "\n")
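
The two regular expressions and the field split in the script can be checked on single entries before running it on the whole file. This is a minimal sketch, assuming the patterns are meant to use backslash escapes (`\d`, `\s`, `\[`) and tab separators. The helper names `parse_pathway` and `parse_gene` are hypothetical, and the gene entry is invented.

```python
import re


def parse_pathway(name):
    """Return (pathway_id, pathway_name), or None if the entry has no PATH tag."""
    pathway_id = re.findall(r"PATH:(.*)\]", name)
    pathway_name = re.findall(r"\d+\s(.*)\s\[", name)
    if pathway_id and pathway_name:
        return pathway_id[0], pathway_name[0]
    return None


def parse_gene(name):
    """Return the first two space-separated fields, dropping the semicolon."""
    fields = name.split(" ")
    return fields[0], fields[1].replace(";", "")


pathway = parse_pathway("00010 Glycolysis / Gluconeogenesis [PATH:dre00010]")
no_tag = parse_pathway("09101 Carbohydrate metabolism")
gene = parse_gene("12345 geneA; example protein")  # invented leaf entry
print(pathway, no_tag, gene)
```

Entries without a `[PATH:...]` tag return `None`, which is why the script skips them.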

The processed file:

image.png

If you need gene IDs, you will also need a GTF file; use it to convert gene names to gene IDs.
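
One possible way to build that gene name to gene ID lookup is sketched below. It assumes an Ensembl-style GTF, where the ninth column carries `gene_id "..."` and `gene_name "..."` attributes and there are rows whose feature type is `gene`. GTF files from other sources may differ. The function name `gene_name_to_id` and the sample records are invented for illustration.

```python
import re


def gene_name_to_id(gtf_lines):
    """Build a gene_name -> gene_id map from the 'gene' rows of a GTF."""
    mapping = {}
    for line in gtf_lines:
        if line.startswith("#"):
            continue  # skip header comments
        columns = line.rstrip("\n").split("\t")
        if len(columns) < 9 or columns[2] != "gene":
            continue  # keep only gene-level records
        attributes = columns[8]
        gene_id = re.search(r'gene_id "([^"]+)"', attributes)
        gene_name = re.search(r'gene_name "([^"]+)"', attributes)
        if gene_id and gene_name:
            mapping[gene_name.group(1)] = gene_id.group(1)
    return mapping


# Invented GTF records, for illustration only.
sample_gtf = [
    "#!genome-build example\n",
    '1\tsrc\tgene\t100\t900\t.\t+\t.\tgene_id "GENE0001"; gene_name "geneA";\n',
    '1\tsrc\texon\t100\t300\t.\t+\t.\tgene_id "GENE0001"; gene_name "geneA";\n',
]
name_to_id = gene_name_to_id(sample_gtf)
print(name_to_id)
```

With a real file, pass the opened file object instead of `sample_gtf`. Gene names that do not appear in the GTF are simply absent from the map, so look them up with `name_to_id.get(...)`.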

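Note: if json.load(f) raises AttributeError: 'str' object has no attribute 'load', a variable named json (for example, the variable holding the input file path) has overwritten the name bound by import json. The name json then refers to a string rather than the module, so the call fails. Rename that variable to fix the error.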
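
The error can be reproduced in a few lines. This sketch uses an in-memory file object in place of a real download, and the path strings are placeholders that are never opened.

```python
import io
import json


def load_shadowed(handle):
    json = "input.json"  # placeholder path; this string now hides the module here
    try:
        return json.load(handle)  # looks up .load on a str, which fails
    except AttributeError as error:
        return str(error)


def load_renamed(handle):
    json_path = "input.json"  # distinct name, so `json` is still the module
    return json.load(handle)


shadowed_result = load_shadowed(io.StringIO('{"name": "root"}'))
renamed_result = load_renamed(io.StringIO('{"name": "root"}'))
print(shadowed_result)
print(renamed_result)
```

The first call returns the error message text, and the second returns the parsed dictionary.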

Copyright notice:
Author: admin
Link: https://www.techfm.club/p/41860.html
Source: TechFM
Copyright belongs to the author. Do not reproduce without permission.
