Basic Introduction

Description


Regulatory Genomics Toolbox (RGT) is an open source Python library for the analysis of regulatory genomics data. RGT is written in an object-oriented fashion, and its core classes provide functionality for handling regulatory genomics data. The library has been used to implement several tools, such as the ChIP-Seq differential peak callers ODIN and THOR, the DNase-Seq footprinting method HINT and the visualization tool RGT-Viz.

 

 

Installation

The recommended way to get RGT is via pip:

pip install --user cython numpy scipy
pip install --user RGT

This will install the full RGT suite and all dependencies. Alternative installation instructions, for example to only install some of the tools, are found here.

After installing, make sure to follow the instructions for setting up your RGT data folder. This step is required.

RGT Main Functionalities

Figure 1 shows a typical pipeline for the analysis of histone modifications. First, ChIP-Seq reads are mapped to the genome with a general-purpose aligner. Then, a peak caller identifies the regions where the histone modifications are located. Peak callers usually consume or produce two types of representation: genomic signals, which reflect the number of ChIP-Seq reads mapped to particular genomic positions, and genomic regions, which represent the regions where the histone modifications are located.
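To make the difference between a genomic signal and a set of genomic regions concrete, here is a minimal sketch in plain Python. It does not use RGT and is not how ODIN, THOR or any real peak caller works: the function names (binned_signal, call_regions), the bin size and the read threshold are hypothetical choices made only for illustration, and the read positions are made-up toy data for a single chromosome.

```python
from collections import Counter


def binned_signal(read_starts, bin_size):
    """Genomic signal: number of reads whose start falls in each fixed-width bin."""
    return Counter(pos // bin_size for pos in read_starts)


def call_regions(signal, bin_size, min_reads):
    """Genomic regions: merge consecutive bins holding at least min_reads reads.

    Regions are returned as (start, end) pairs, 0-based with an exclusive end.
    """
    regions = []
    enriched_bins = sorted(b for b, count in signal.items() if count >= min_reads)
    for b in enriched_bins:
        start, end = b * bin_size, (b + 1) * bin_size
        if regions and regions[-1][1] == start:
            # This bin directly follows the previous region: extend it.
            regions[-1] = (regions[-1][0], end)
        else:
            regions.append((start, end))
    return regions


# Toy read start positions on one chromosome (illustrative values only).
reads = [105, 110, 130, 160, 170, 180, 420, 900, 905, 910]

signal = binned_signal(reads, bin_size=50)
regions = call_regions(signal, bin_size=50, min_reads=3)

print(dict(signal))
print(regions)
```

The signal keeps a count for every bin that received reads, while the regions keep only the coordinates of the enriched stretches, which is why the two representations are stored in different data structures.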

Figure 1. Typical pipeline for the analysis of histone modifications.

The pipeline in Figure 1 requires two main data types. The first is the set of genomic regions (or genomic ranges), which can represent sets of transcription factor binding sites, ChIP-Seq peaks or single nucleotide polymorphisms (SNPs) (Figure 2 A). RGT also implements genomic region set operations (intersection, union and difference, among others), which are commonly required for integrative analysis of regulatory genomics data.

The second important data structure is the coverage set, which stores genomic signals from ChIP-Seq or any other targeted sequencing technique (Figure 2 B). A coverage set can be seen as a compressed form of the read alignment results and is used by medium-level analysis methods. Coverage set classes provide the signal pre-processing functions adopted by our tools ODIN, THOR and HINT, including fragment extension estimation, smoothing, GC content bias correction and normalization. Detailed class documentation can be found here.
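The region set operations mentioned here can be sketched in a few lines of plain Python. This is an illustration of the concepts only, not RGT's implementation or API: the functions merge, union, intersect and subtract are hypothetical helpers, regions are assumed to be (chromosome, start, end) tuples with 0-based starts and exclusive ends, and the input coordinates are invented.

```python
def merge(regions):
    """Sort regions and merge those that overlap or touch on the same chromosome."""
    merged = []
    for chrom, start, end in sorted(regions):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            last = merged[-1]
            merged[-1] = (chrom, last[1], max(last[2], end))
        else:
            merged.append((chrom, start, end))
    return merged


def union(a, b):
    """All positions covered by a or b."""
    return merge(list(a) + list(b))


def intersect(a, b):
    """Positions covered by both a and b (quadratic scan, kept simple for clarity)."""
    out = []
    for chrom_a, start_a, end_a in a:
        for chrom_b, start_b, end_b in b:
            if chrom_a == chrom_b:
                start, end = max(start_a, start_b), min(end_a, end_b)
                if start < end:
                    out.append((chrom_a, start, end))
    return merge(out)


def subtract(a, b):
    """Positions covered by a but not by b."""
    out = []
    b = merge(b)
    for chrom, start, end in merge(a):
        cursor = start  # left edge of the part of this region not yet removed
        for chrom_b, start_b, end_b in b:
            if chrom_b != chrom or end_b <= cursor or start_b >= end:
                continue
            if start_b > cursor:
                out.append((chrom, cursor, start_b))
            cursor = max(cursor, end_b)
        if cursor < end:
            out.append((chrom, cursor, end))
    return out


# Invented example data: ChIP-Seq peaks and transcription factor binding sites.
peaks = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 50, 80)]
sites = [("chr1", 150, 250), ("chr2", 10, 40)]

shared = intersect(peaks, sites)
combined = union(peaks, sites)
peaks_only = subtract(peaks, sites)

print(shared)
print(combined)
print(peaks_only)
```

A production library would typically sort both sets once and sweep through them in linear time; the nested loops above trade speed for readability.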

Figure 2. (A) Genomic region sets. (B) Coverage sets.

Moreover, RGT provides input and output functions for typical genomic file formats, such as read alignments (BAM files), genomic profiles (wig/bigWig files) and genomic regions (bed and vcf files). Finally, RGT includes classes for handling genome annotations, such as transcripts and genes from standard formats (gtf files), and motif databases (transfac format). These core classes provide a powerful infrastructure for developing regulatory genomics methods. Check out the documentation for more details on the classes and supported methods.
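As a rough idea of what reading a genomic region file involves, the following sketch parses the first columns of a BED file by hand. It is not RGT's reader: parse_bed is a hypothetical helper, the input is assumed to be tab-delimited, and the sample records are invented. The one format detail relied on is that BED coordinates are 0-based with an exclusive end, so a region's length is end minus start.

```python
def parse_bed(lines):
    """Yield (chrom, start, end, name) tuples from tab-delimited BED lines.

    Header lines ("track", "browser"), comments and blank lines are skipped.
    The name is None when the optional fourth column is absent.
    """
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split("\t")
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        name = fields[3] if len(fields) > 3 else None
        yield chrom, start, end, name


# Invented BED content; in practice the lines would come from an open file.
bed_text = "track name=peaks\nchr1\t100\t200\tpeak_1\nchr2\t50\t80\n"

records = list(parse_bed(bed_text.splitlines()))
total_bp = sum(end - start for _, start, end, _ in records)

print(records)
print(total_bp)
```

Binary formats such as BAM and bigWig need dedicated parsers, which is one reason to rely on a library's input and output functions rather than hand-written readers like this one.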

 

Example

RGT currently provides functionality for all steps of basic analysis pipelines for ChIP-Seq and DNase-Seq data. See our tutorial, which describes a complete pipeline for the analysis of TF ChIP-Seq data, including the implementation of your own peak caller.