Tutorial – Differential Footprints on single cell ATAC-seq

Analysis of single-cell chromatin accessibility data allows us to detect novel cell clusters, while how to interpret these clusters remains a challenge. In this tutorial, we will show how to use scHINT, an extended version of HINT-ATAC, to compare transcription factor binding activity based on differential footprinting. We will use a public scATAC-seq dataset from human hematopoietic cell. We will also provide soon a tutorial for the analysis of kidney fibrosis data with Unilateral Ureter Obstruction (UUO) mouse models.

We have performed clustering and observed that the MEP progenitors from two subpopulations of cells. To characterize the transcription factors that control these tow sub-clusters, we merged the cells from each of the sub-clusters and created a cluster-specific ATAC-seq library, named as MEP1 and MEP2.  The other cells were collected as Others.

1. Download the data here and extract the pre-processed data with the following commands:

tar xvfz HematopoieticCells.tar.gz
cd HematopoieticCells

2. Execute the following commands to call footprints for MEP1, MEP2 and Others:

mkdir Footprints

rgt-hint footprinting --atac-seq --paired-end --organism=hg19 --output-location=./Footprints --output-prefix=MEP1 ./BamFiles/MEP1.bam ./Peaks/MEP1_peaks.narrowPeak

rgt-hint footprinting --atac-seq --paired-end --organism=hg19 --output-location=./Footprints --output-prefix=MEP2 ./BamFiles/MEP2.bam ./Peaks/MEP2_peaks.narrowPeak

rgt-hint footprinting --atac-seq --paired-end --organism=hg19 --output-location=./Footprints --output-prefix=Others ./BamFiles/Others.bam ./Peaks/Others_peaks.narrowPeak

HINT requires a bam file (indexed and sorted) and peaks as input. HINT only considers peak regions for footprinting to speed up the analysis. Each command will generate an output file in the current folder, containing the genomic locations of the footprints. HINT-ATAC (MEP1.bed, MEP2.bed and Others.bed for the example above). It also produces a file with ending “.info”, which has statistics from the libraries, i.e. the total number of reads and so on.

When –paired-end is specified, HINT-ATAC filter signals by only considering cleavage events from paired-ends with a particular size range.  This option is recommended given a significant improvement in footprint prediction (see our paper).   You also need to specify the genome/organism with the flag –organism. See this for supported genomes or perform the command bellow for a complete description of HINT parameters:

rgt-hint footprinting --help

3. One of the main applications of footprinting is to find TFs associated with a particular cellular condition. We can do this by first finding motifs overlapping with predicted footprints. RGT suite also offers a tool for finding motif predicted binding sites (MPBS) with JASPAR, UNIPROBE or HOCOMOCO motifs.

Execute the following commands to do motif matching for footprints using motifs from JASPAR as default:

mkdir MotifMatching

rgt-motifanalysis matching --organism=hg19 --output-location=./MotifMatching --input-files ./Footprints/MEP1.bed ./Footprints/MEP2.bed ./Footprints/Others.bed

whereMEP1.bed, MEP2.bed and Others.bed are the footprint predictions obtained in step 2. The above commands will generate a BED file for each input file, containing the matched motif instances for each footprint region. The 4th column contains the motif name and the 5th column the bit-score of the motif match. Check here for more details about motif matching.

4. Finally, we use HINT to generate average ATAC-seq profiles around binding sites of particular TF. This analysis allows us to inspect the chromatin accessibility for each particular TF. Moreover, by comparing the cleavage profiles from different ATAC-seq libraries, we can get insights on changes in binding in two cells. For this, execute the following commands:

rgt-hint differential --organism=hg19 --bc --nc 30 --mpbs-files=./MotifMatching/MEP1_mpbs.bed,./MotifMatching/MEP1_mpbs.bed,./MotifMatching/Others_mpbs.bed --reads-files=./BamFiles/MEP1.bam,./BamFiles/MEP2.bam,./BamFiles/Others.bam --conditions=MEP1,MEP2,Others --output-location=MEP1_MEP2_Others

The above command will read the motif matching files generated by step 3 and BAM files which contain the sequencing reads to perform the comparison. Note that here we specify –bc to use the bias-corrected signal (currently only  ATAC-seq is supported). The command –nc allow parallel execution of the job.

After the command is done, a new folder ‘MEP1_MEP2_Others’ will be created. Unlike the comparison between two conditions, we here can not directly visualize the factor activity differences using a scatter plot. Therefore, we only create a folder called Lineplots to include the ATAC-seq profile for each of the motifs across different clusters, as shown below.