Basic Introduction
Method
Installation
If you have followed the generic instructions for the RGT suite installation, you can start using HINT right away. If you have not installed RGT yet, use the command
pip install --user RGT
to install HINT. Further installation instructions are found here. If you have any questions, comments, installation problems, or bug reports, please contact us.
Note: You need to follow these instructions to download the genomic data for cleavage bias correction.
Basic Usage
We describe here how to detect footprints using HINT for ATAC-seq, DNase-seq, and histone modification data. To perform footprinting, you need at least two files: one with the aligned reads of your chromatin data and another describing the regions in which to detect footprints. You can use a peak caller, such as MACS2, to define these regions of interest.
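The regions file used in the examples is plain BED text, so it can be trimmed with standard tools before footprinting, for instance to keep only the first few peaks of one chromosome. The sketch below is not part of HINT: `select_peaks` is a hypothetical helper, and the coordinates are made up purely for illustration.

```shell
#!/bin/sh
# select_peaks CHROM N: read BED lines on stdin and print the first N
# regions located on chromosome CHROM (hypothetical helper, not part of HINT).
select_peaks() {
    awk -v chrom="$1" -v n="$2" '$1 == chrom && count < n { print; count++ }'
}

# Toy input with made-up coordinates; real input would be a peak file.
printf 'chr1\t100\t200\tpeak1\nchr2\t150\t250\tpeak2\nchr1\t300\t400\tpeak3\nchr1\t500\t600\tpeak4\n' |
    select_peaks chr1 2
# prints the peak1 and peak3 lines
```

With real data this might be run as `select_peaks chr1 1000 < peaks.bed > peaks.chr1.bed`, where both file names are placeholders.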
Footprinting for ATAC-seq data
Download the example ATAC-seq data here; it is based on chromosome 1 of the GM12878 cell line. Execute the following commands to extract the data from the downloaded file:
tar xvfz HINT_ATACTest.tar.gz
cd HINT_ATACTest
and the following command to perform footprinting:
rgt-hint footprinting --atac-seq ATAC.bam ATACPeaks.bed
For simplicity, we use only the first 1000 peaks from chromosome 1. The above command writes a BED file containing the footprints to your current folder, using footprints as the file name prefix. You can also set the following arguments
--output-location=your_directory --output-prefix=your_prefix
to tell HINT your preferred output directory and file name prefix. Each footprint, i.e. each line of the BED file, carries the tag-count score (number of reads) of that footprint. This score can be used to assess footprint quality (higher values indicate better candidates). In addition, a file with details on the reads and footprints is written to the same folder as the BED file.
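Because higher tag-count scores indicate better candidates, one way to inspect the output is to rank the footprints by score. The sketch below assumes the score sits in the fifth column, as in the standard BED layout; check your own output file before relying on this. `top_footprints` is a hypothetical helper and the toy lines are made up for illustration.

```shell
#!/bin/sh
# top_footprints N: read BED lines on stdin and print the N lines with the
# highest value in column 5 (assumed here to hold the tag-count score).
top_footprints() {
    sort -k5,5nr | head -n "$1"
}

# Toy input with made-up coordinates and scores.
printf 'chr1\t100\t120\tfp1\t35\nchr1\t300\t318\tfp2\t120\nchr1\t500\t522\tfp3\t78\n' |
    top_footprints 2
# prints the fp2 line first, then the fp3 line
```

On real output this might look like `top_footprints 100 < footprints.bed`, assuming the default prefix yields a file of that name.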
If your data is paired-end, you may want to try another model which is optimized for paired-end sequencing data:
rgt-hint footprinting --atac-seq --paired-end --output-prefix=fp_paired ATAC.bam ATACPeaks.bed
Note: HINT performs bias correction for ATAC-seq by default, so you must download the genomes following these instructions and specify the correct genome reference with the following option before footprinting:
--organism=genome_version
Currently, the default setting is hg19. See here for more information.
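To show how the options above combine on a single command line, here is a small dry-run sketch. It only prints the command and does not call HINT; `hint_atac_cmd` is a hypothetical helper, and `hg38` and `results` are assumed example values (the genome version must be one you have actually downloaded).

```shell
#!/bin/sh
# hint_atac_cmd ORGANISM OUTDIR PREFIX BAM PEAKS: print (do not run) a
# paired-end ATAC-seq footprinting command assembled from the options
# described in this section. Hypothetical helper, not part of HINT.
hint_atac_cmd() {
    printf 'rgt-hint footprinting --atac-seq --paired-end --organism=%s --output-location=%s --output-prefix=%s %s %s\n' \
        "$1" "$2" "$3" "$4" "$5"
}

# hg38 and results are assumed example values.
hint_atac_cmd hg38 results fp_paired ATAC.bam ATACPeaks.bed
```

Printing the command first makes it easy to review the genome version and output paths before running it yourself.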
Footprinting for DNase-seq
Example DNase-seq data can be found here. Execute the following commands to extract the data from the compressed file:
tar xvfz HINT_DNaseTest.tar.gz
cd HINT_DNaseTest
and the following command to call the footprints:
rgt-hint footprinting --dnase-seq DNase.bam DNasePeaks.bed
We recommend using cleavage bias correction, which is enabled with the following command:
rgt-hint footprinting --dnase-seq --bias-correction DNase.bam DNasePeaks.bed
Do not forget to define the proper genome reference using:
--organism=genome_version
Currently, the default setting is hg19.
Footprinting for histone modification data
Download the example histone modification data here. Execute the following commands to extract the data:
tar xvfz HINT_HistoneTest.tar.gz
cd HINT_HistoneTest
and call the footprints:
rgt-hint footprinting --histone histone.bam histonePeaks.bed
The complete tutorial and more detailed examples are found here.
Citation
If you use HINT with DNase-seq or histone modification data, please cite the following publication:
Gusmao EG, Dieterich C, Zenke M and Costa IG. “Detection of active transcription factor binding sites with the combination of DNase hypersensitivity and histone modifications”. Bioinformatics, 30(22):3143-3151, 2014. [Full Text]
If you use HINT with DNase-seq and bias correction, please cite:
Gusmao EG, Allhoff M, Zenke M and Costa IG. “Analysis of computational footprinting methods for DNase sequencing experiments”. Nature Methods, 13(4):303-309, 2016. [Full Text]
If you use HINT with ATAC-seq, please cite the following publication:
Li Z, Schulz MH, Look T, Begemann M, Zenke M and Costa IG. “Identification of transcription factor binding sites using ATAC-seq”. Genome Biology, 20(1):45, 2019. [Full Text]