Tool Usage

HINT can be executed with the following command:

rgt-hint [options] <experiment_matrix>

Where:
<experiment_matrix>: Required input for the program. It describes the input regions and aligned reads.
[options]: Additional input files, paths, parameters or output options.
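The call pattern above can be assembled programmatically. The following is a minimal sketch, assuming the options are spelled with a double hyphen and take their value as a separate argument; the helper name build_hint_command and the file paths are hypothetical, and only a few of the options documented below are covered.

```python
import shlex


def build_hint_command(experiment_matrix, output_location=None,
                       footprint_name=None, default_bias_correction=False):
    """Assemble an rgt-hint argument list: options first, matrix last."""
    argv = ["rgt-hint"]
    if default_bias_correction:
        # Boolean switch, takes no value.
        argv.append("--default-bias-correction")
    if output_location is not None:
        argv += ["--output-location", output_location]
    if footprint_name is not None:
        argv += ["--footprint-name", footprint_name]
    # The experiment matrix is the required positional input.
    argv.append(experiment_matrix)
    return argv


# Hypothetical paths, for illustration only.
command = build_hint_command(
    "./experiment_matrix.txt",
    output_location="./Output",
    footprint_name="footprints_fp1",
    default_bias_correction=True,
)
print(shlex.join(command))
```

The resulting list can be handed to subprocess.run(command) once RGT is installed; the sketch itself only builds and prints the command.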

 

Footprints Detection

Required Input

Option Name | Type | Default | Description
<experiment_matrix> | File | None | The experiment matrix should contain one regions file (BED) in which the footprinting will be performed, one DNase-seq BAM file and one or more histone modification BAM file(s) per group. See the RGT experiment matrix format for more details.

Options

Option Name | Type | Default | Description
--hmm-file | FILE_1_1[[,…,FILE_N_1];…;FILE_1_M[,…,FILE_N_M]] | Default HMM | Comma-separated list of HMM files. If only one file is given, that HMM is applied to all histone signals; otherwise, the list must contain as many files as there are histone files, in the order in which the histones appear in the experiment matrix. If the argument is not given, a default HMM is used. If multiple input groups are used, one list per group can be passed, separated by semicolons; the number of lists should equal the number of input groups.
--bias-table | FILE1_F,FILE1_R[;…;FILEM_F,FILEM_R] | Default bias tables | List of files (one entry per input group, separated by semicolons) with all possible k-mers (for any k) and their bias estimates. Each input group should have two files: one for the forward and one for the reverse strand. Each line should contain a k-mer and its bias estimate, separated by a tab. Leave an empty entry for histone-only groups, e.g. FILE1;;FILE3.
--organism | String | hg19 | Organism in which the analysis is being performed. All default files, such as genomes, are based on the chosen organism and the data.config file. See the information on rgtdata and the data.config file for more details. This option is used only if bigBed output is requested.
--estimate-bias-correction | Boolean | False | Applies DNase-seq cleavage bias correction with k-mer bias estimated from the given DNase-seq data (SLOW HINT-BC).
--default-bias-correction | Boolean | False | Applies DNase-seq cleavage bias correction with default k-mer bias estimates (FAST HINT-BC).
--output-location | Path | <input_path> | Path where the output files will be written.
--footprint-name | String | footprints | Name of the footprint (result) file, without extension.
--print-bb | Boolean | False | If used, the footprints are written to a bigBed (.bb) file.
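The bias table argument combines two conventions: semicolon-separated groups (with an empty entry for histone-only groups) and tab-separated k-mer lines inside each file. A minimal sketch of both follows, assuming plain-text files with exactly one tab per line; the function names, file names and bias values are made up for illustration and are not part of HINT.

```python
import io


def split_bias_table_option(value):
    """Split 'F1,R1;;F3,R3' into one entry per input group.

    An empty entry (histone-only group) becomes None.
    """
    groups = []
    for entry in value.split(";"):
        if entry == "":
            groups.append(None)
        else:
            forward, reverse = entry.split(",")
            groups.append((forward, reverse))
    return groups


def read_bias_table(handle):
    """Read 'k-mer<TAB>bias estimate' lines into a dictionary."""
    table = {}
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        kmer, bias = line.split("\t")
        table[kmer] = float(bias)
    return table


# Hypothetical file names: group 2 is histone-only, so its entry is empty.
groups = split_bias_table_option("g1_f.txt,g1_r.txt;;g3_f.txt,g3_r.txt")

# Made-up k-mers and bias values standing in for a real table file.
table = read_bias_table(io.StringIO("ACGTAC\t1.25\nTTTTTT\t0.5\n"))
print(groups)
print(table)
```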

Special Input File Formats

Experiment Matrix

The experiment matrix should contain one regions file (BED) in which the footprinting will be performed, one DNase-seq BAM file and one or more histone modification BAM file(s) per group. Each group is defined by the last column (“group”) of the experiment matrix. Below is an example of a standard experiment matrix for finding footprints inside hypersensitivity (HS) regions using (analysis 1) DNase, H3K4me1 and H3K4me3 aligned read (BAM) files and (analysis 2) DNase-seq only, in the same set of regions. It is also important to set the “data” column of the experiment matrix to one of the following: “HS” (for the regions in which the footprinting will be performed), “DNASE” (for DNase-seq data) or “HISTONE” (for histone modification data).

name    type    file                       data    group
HS1     regions ./Input/regions.bed        HS      FP1
DNase   reads   ./Input/DNase_chr22.bam    DNASE   FP1
H3K4me1 reads   ./Input/H3K4me1_chr22.bam  HISTONE FP1
H3K4me3 reads   ./Input/H3K4me3_chr22.bam  HISTONE FP1
HS2     regions ./Input/regions.bed        HS      FP2
DNase   reads   ./Input/DNase_chr22.bam    DNASE   FP2

Learn more about Experiment Matrix Format.
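To make the grouping concrete, the sketch below reads the example matrix above and collects the files of each group by their “data” value. It assumes whitespace-separated columns and paths without spaces; the function name group_experiment_matrix is hypothetical and this is not how RGT itself parses the file.

```python
from collections import defaultdict

MATRIX_TEXT = """\
name    type    file                       data    group
HS1     regions ./Input/regions.bed        HS      FP1
DNase   reads   ./Input/DNase_chr22.bam    DNASE   FP1
H3K4me1 reads   ./Input/H3K4me1_chr22.bam  HISTONE FP1
H3K4me3 reads   ./Input/H3K4me3_chr22.bam  HISTONE FP1
HS2     regions ./Input/regions.bed        HS      FP2
DNase   reads   ./Input/DNase_chr22.bam    DNASE   FP2
"""

VALID_DATA = {"HS", "DNASE", "HISTONE"}


def group_experiment_matrix(text):
    """Return {group: {data: [files]}} for a whitespace-separated matrix."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    header = lines[0].split()
    groups = defaultdict(lambda: defaultdict(list))
    for ln in lines[1:]:
        row = dict(zip(header, ln.split()))
        if row["data"] not in VALID_DATA:
            raise ValueError(f"unexpected data value: {row['data']!r}")
        groups[row["group"]][row["data"]].append(row["file"])
    # Convert nested defaultdicts to plain dicts for easier inspection.
    return {name: dict(files) for name, files in groups.items()}


groups = group_experiment_matrix(MATRIX_TEXT)
for name, files in groups.items():
    print(name, files)
```

Here FP1 carries one regions file, one DNase-seq file and two histone files, while FP2 carries only regions and DNase-seq data.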

HMM File

Each HMM file (with extension .hmm) describes a 4-dimensional HMM over the normalized and slope signals of the DNase and histone modification data.

A valid HMM file follows these rules:

  • The first line describes the number of states.
  • The next two lines describe the initial probabilities: the line containing “initial”, followed by a space-separated list of numbers ordered by the HMM states (see order below).
  • The lines after the line containing “transitions” and before the line containing “emissions” contain the transition matrix. Each line is a space-separated list of numbers giving the probabilities of going from the state represented by that line (first matrix line = first state, etc.; see order below) to the state represented by each column.
  • The lines after the line containing “emissions” until the end of the file contain the emission probabilities. Each line represents the emissions of one state, following the order of states (see below). Within a line, the numbers before the # symbol are the means of the signal distributions, in the order of signals given below. The numbers after the # symbol are the covariance matrix over all signals, vectorized by row. Rows and columns of the covariance matrix also follow the order of signals given below.
  • The order of the states in the file above is always:
    • DH-HMM: BACKGROUND – UP(H) – TOP(H) – DOWN(H) – UP(D) – TOP(D) – DOWN(D) – FOOTPRINT
  • The order of the signals in the file above is always:
    • DH-HMM: DNase normalized – DNase slope – Histone normalized – Histone slope.

An example of an HMM file trained using DNase+H3K4me3 in cell type K562 can be seen below:


states 8
initial
1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
transitions
0.9993 0.0007 0.0 0.0 0.0 0.0 0.0 0.0
0.0 0.99 0.01 0.0 0.0 0.0 0.0 0.0
0.0 0.0 0.9873 0.0127 0.0 0.0 0.0 0.0
0.0042 0.0 0.0 0.9839 0.0119 0.0 0.0 0.0
0.0 0.0 0.0 0.0 0.6661 0.3339 0.0 0.0
0.0 0.0 0.0 0.0 0.0 0.8089 0.1911 0.0
0.0 0.0736 0.0 0.0 0.0 0.0 0.6143 0.3121
0.0 0.0 0.0 0.0 0.0758 0.0 0.0 0.9242
emissions
0.0053 -0.0024 0.0378 0.0008 # 0.0004 -0.0002 0.0001 -0.0 -0.0002 0.0079 0.0 0.0001 0.0001 0.0 0.001 0.0001 -0.0 0.0001 0.0001 0.0075
0.028 -0.0113 0.1837 0.3118 # 0.0031 -0.0005 0.0007 0.0017 -0.0005 0.019 -0.0005 0.0003 0.0007 -0.0005 0.0157 0.0153 0.0017 0.0003 0.0153 0.0307
0.0301 -0.0026 0.3409 -0.0389 # 0.0035 0.0 0.0006 0.0003 0.0 0.0259 -0.0004 0.0018 0.0006 -0.0004 0.0268 -0.004 0.0003 0.0018 -0.004 0.0903
0.034 0.007 0.1944 -0.3959 # 0.0029 0.0002 0.0006 -0.0027 0.0002 0.0228 0.0001 0.0001 0.0006 0.0001 0.0097 -0.0176 -0.0027 0.0001 -0.0176 0.0673
0.2143 0.804 0.0789 -0.1053 # 0.065 0.0208 0.0015 -0.0007 0.0208 0.0609 0.0063 -0.009 0.0015 0.0063 0.0046 -0.0033 -0.0007 -0.009 -0.0033 0.0421
0.4402 0.0308 0.0655 -0.0238 # 0.1247 0.0083 0.0045 -0.0108 0.0083 0.2008 0.0007 0.0004 0.0045 0.0007 0.0031 -0.0029 -0.0108 0.0004 -0.0029 0.0479
0.2495 -0.8282 0.0814 -0.051 # 0.0884 -0.0229 0.0014 -0.0022 -0.0229 0.046 -0.0058 0.0089 0.0014 -0.0058 0.005 -0.0003 -0.0022 0.0089 -0.0003 0.0514
0.1422 -0.0415 0.08 -0.0521 # 0.0381 -0.0019 0.0015 -0.0011 -0.0019 0.1571 0.0011 0.003 0.0015 0.0011 0.0055 0.0001 -0.0011 0.003 0.0001 0.0401
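The layout described above can be read with a short parser. The sketch below assumes the first line has the form “states N”, as in the example, and that the keyword lines contain exactly “initial”, “transitions” and “emissions”; the function name parse_hmm and the two-state, two-signal toy model are made up to keep the example short and are not a trained HINT model.

```python
TOY_HMM = """\
states 2
initial
1.0 0.0
transitions
0.9 0.1
0.2 0.8
emissions
0.1 0.0 # 0.01 0.0 0.0 0.02
0.5 -0.3 # 0.04 0.01 0.01 0.03
"""


def parse_hmm(text):
    """Parse the HMM text layout into plain Python lists."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    n_states = int(lines[0].split()[1])
    i_init = lines.index("initial")
    i_trans = lines.index("transitions")
    i_emis = lines.index("emissions")

    initial = [float(x) for x in lines[i_init + 1].split()]
    transitions = [[float(x) for x in ln.split()]
                   for ln in lines[i_trans + 1:i_emis]]

    means, covariances = [], []
    for ln in lines[i_emis + 1:]:
        mean_part, cov_part = ln.split("#")
        mean = [float(x) for x in mean_part.split()]
        flat = [float(x) for x in cov_part.split()]
        dim = len(mean)
        if len(flat) != dim * dim:
            raise ValueError("covariance must have dim * dim entries")
        # Undo the row-wise vectorization of the covariance matrix.
        covariances.append([flat[r * dim:(r + 1) * dim] for r in range(dim)])
        means.append(mean)

    return {"n_states": n_states, "initial": initial,
            "transitions": transitions, "means": means,
            "covariances": covariances}


model = parse_hmm(TOY_HMM)
print(model["n_states"], model["transitions"])
```

Applied to the K562 example above, the same function would yield eight states, an 8 x 8 transition matrix, and one 4-element mean vector with a 4 x 4 covariance matrix per state.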

 

Output

HINT outputs a BED file (or a bigBed file, if requested by the user) containing all footprints found within the queried regions.