Chromap

Chromap: fast preprocessing and alignment of chromatin profiles

Chromap is an ultrafast method for aligning and preprocessing high-throughput chromatin profiles. Typical use cases include:
(1) trimming sequencing adapters, mapping bulk ATAC-seq or ChIP-seq genomic reads to the human genome and removing duplicates;
(2) trimming sequencing adapters, mapping single cell ATAC-seq genomic reads to the human genome, correcting barcodes, removing duplicates and performing Tn5 shift;
(3) split alignment of Hi-C reads against a reference genome.
In all three cases, Chromap is 10-20 times faster than traditional workflows while remaining accurate.

Install

conda install -c bioconda -c conda-forge chromap

Usage

chromap -h
Fast alignment and preprocessing of chromatin profiles
Usage:
  chromap [OPTION...]

  -v, --version  Print version
  -h, --help     Print help


=========================================================
 Indexing options:
  -i, --build-index          Build index
      --min-frag-length INT  Min fragment length for choosing k and w automatically [30]
  -k, --kmer INT             Kmer length [17]
  -w, --window INT           Window size [7]


=========================================================
 Mapping options:
      --preset STR              Preset parameters for mapping reads (always applied before other options) []
                                atac: mapping ATAC-seq/scATAC-seq reads
                                chip: mapping ChIP-seq reads
                                hic: mapping Hi-C reads
      --split-alignment         Allow split alignments
  -e, --error-threshold INT     Max # errors allowed to map a read [8]
  -s, --min-num-seeds INT       Min # seeds to try to map a read [2]
  -f, --max-seed-frequencies INT[,INT]
                                Max seed frequencies for a seed to be selected [500,1000]
  -l, --max-insert-size INT     Max insert size, only for paired-end read mapping [1000]
  -q, --MAPQ-threshold INT      Min MAPQ in range [0, 60] for mappings to be output [30]
      --min-read-length INT     Min read length [30]
      --trim-adapters           Try to trim adapters on 3'
      --remove-pcr-duplicates   Remove PCR duplicates
      --remove-pcr-duplicates-at-bulk-level
                                Remove PCR duplicates at bulk level for single cell data
      --remove-pcr-duplicates-at-cell-level
                                Remove PCR duplicates at cell level for single cell data
      --Tn5-shift               Perform Tn5 shift
      --low-mem                 Use low memory mode
      --bc-error-threshold INT  Max Hamming distance allowed to correct a barcode [1]
      --bc-probability-threshold FLT
                                Min probability to correct a barcode [0.9]
  -t, --num-threads INT         # threads for mapping [1]


=========================================================
 Input options:
  -r, --ref FILE                Reference file
  -x, --index FILE              Index file
  -1, --read1 FILE              Single-end read files or paired-end read files 1
  -2, --read2 FILE              Paired-end read files 2
  -b, --barcode FILE            Cell barcode files
      --barcode-whitelist FILE  Cell barcode whitelist file
      --read-format STR         Format for read files and barcode files  ["r1:0:-1,bc:0:-1" as 10x Genomics single-end
                                format]


=========================================================
 Output options:
  -o, --output FILE             Output file
      --output-mappings-not-in-whitelist
                                Output mappings with barcode not in the whitelist
      --chr-order FILE          Custom chromosome order file. If not specified, the order of reference sequences will
                                be used
      --BED                     Output mappings in BED/BEDPE format
      --TagAlign                Output mappings in TagAlign/PairedTagAlign format
      --SAM                     Output mappings in SAM format
      --pairs                   Output mappings in pairs format (defined by 4DN for HiC data)
      --pairs-natural-chr-order FILE
                                Custom chromosome order file for pairs flipping. If not specified, the custom
                                chromosome order will be used
      --barcode-translate FILE  Convert barcode to the specified sequences during output
      --summary FILE            Summarize the mapping statistics at bulk or barcode level
  • As with other aligners, build the index first
chromap -i -r ref.fa -o index
  # ChIP-seq reads
chromap --preset chip -x index -r ref.fa -1 read1.fq.gz -2 read2.fq.gz -o aln.bed
  # ATAC-seq reads
chromap --preset atac -x index -r ref.fa -1 read1.fq.gz -2 read2.fq.gz -o aln.bed
  # scATAC-seq reads
chromap --preset atac -x index -r ref.fa -1 read1.fq.gz -2 read2.fq.gz -o aln.bed \
 -b barcode.fq.gz --barcode-whitelist whitelist.txt
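The help text also lists a hic preset and a pairs output format, so a Hi-C run would presumably follow the same pattern as the commands above. The following is only a sketch: the file names (ref.fa, read1.fq.gz, read2.fq.gz, aln.pairs) are placeholders, and the output name assumes the hic preset writes pairs format. The RUN switch is a convenience invented for this example, not a Chromap feature; the command is only printed unless RUN=1 is set and chromap is on the PATH.

```shell
# Hi-C reads (placeholder file names; output assumed to be in pairs format)
hic_cmd="chromap --preset hic -x index -r ref.fa -1 read1.fq.gz -2 read2.fq.gz -o aln.pairs"

# Show the command; execute it only when explicitly requested and chromap exists.
echo "$hic_cmd"
if [ "${RUN:-0}" = 1 ] && command -v chromap >/dev/null 2>&1; then
  $hic_cmd   # intentionally unquoted so the string splits into arguments
fi
```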
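ATAC processing
  • With the preset set to atac, the basic workflow is: trim adapters at the 3' end, align the reads, remove duplicates at the cell level, apply the ATAC Tn5 shift, and then correct barcodes against the supplied barcode whitelist.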
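To make the atac steps more concrete, here is a rough sketch that spells them out with the individual flags from the help text instead of the preset. That this list matches what --preset atac does internally is an assumption; the preset may also change other values, such as the maximum insert size. File names are placeholders, and the RUN switch is a convenience invented for this example.

```shell
# Assumed manual equivalent of the atac preset steps (placeholder file names):
#   --trim-adapters                          trim 3' adapters
#   --remove-pcr-duplicates                  enable duplicate removal
#   --remove-pcr-duplicates-at-cell-level    deduplicate per cell barcode
#   --Tn5-shift                              apply the Tn5 shift
#   --barcode-whitelist                      correct barcodes against the whitelist
atac_cmd="chromap -x index -r ref.fa -1 read1.fq.gz -2 read2.fq.gz \
-b barcode.fq.gz --barcode-whitelist whitelist.txt \
--trim-adapters --remove-pcr-duplicates --remove-pcr-duplicates-at-cell-level \
--Tn5-shift --BED -o aln.bed"

# Show the command; execute it only when explicitly requested and chromap exists.
echo "$atac_cmd"
if [ "${RUN:-0}" = 1 ] && command -v chromap >/dev/null 2>&1; then
  $atac_cmd   # intentionally unquoted so the string splits into arguments
fi
```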
  • --read-format specifies where the barcode is located within the FASTQ reads.
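As an illustration, suppose (hypothetically) that the cell barcode occupies the first 16 bases of read 1 instead of a separate file. Following the name:start:end pattern of the default shown in the help text, and assuming positions are 0-based and inclusive with -1 meaning "to the end of the read", the layout could be described as below. File names are placeholders, and the RUN switch is a convenience invented for this example.

```shell
# Hypothetical layout: barcode = bases 0-15 of read 1, genomic sequence = base 16 onward.
read_format="bc:0:15,r1:16:-1"

# read1.fq.gz is passed both as read 1 and as the barcode file.
sc_cmd="chromap --preset atac -x index -r ref.fa -1 read1.fq.gz -2 read2.fq.gz \
-b read1.fq.gz --barcode-whitelist whitelist.txt \
--read-format $read_format -o aln.bed"

# Show the command; execute it only when explicitly requested and chromap exists.
echo "$sc_cmd"
if [ "${RUN:-0}" = 1 ] && command -v chromap >/dev/null 2>&1; then
  $sc_cmd   # intentionally unquoted so the string splits into arguments
fi
```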

Copyright notice:
Author: siwei
Link: https://www.techfm.club/p/87268.html
Source: TechFM
Copyright belongs to the author; do not reproduce without permission.
