Description of Experimental Matrix
The experimental matrix is a basic input for many RGT tools. It is a tab-separated plain text file in which you can define:
- Genomic Regions: Particular genomic regions of interest. Usually represented by BED files.
- Aligned Reads: Derived from sequencing methods (such as RNA-seq and ChIP-seq). Usually represented by BAM files.
- Gene Sets: Lists of genes of interest.
The header of this file (its first line) defines each column. It has three mandatory elements (name, type and file), followed by any number of additional fields. These additional fields may be required by the tool being used, or may simply help the user document the experiment. The three mandatory elements must have these fixed names; the additional fields may have any name.
The three mandatory elements represent:
- name: Unique name for each subsequent file.
- type: File type. Can be “regions” for genomic regions, “reads” for aligned reads or “genes” for gene sets.
- file: The path to the file (relative or absolute).
After the header line, each line is interpreted as a file entry. Each column must contain a value matching the header’s description. Most tools use these file formats:
- regions: BED format.
- reads: BAM format.
- genes: Plain text file containing one gene name per line. The gene names must match the gene symbols as can be obtained here.
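The layout described above can be sketched as a small reader. This is only an illustration, not the parser that RGT itself uses: the function name `parse_matrix`, the sample entries, and the strictness of the checks (rejecting rows with the wrong number of columns, unknown types, or repeated names, and skipping blank lines) are assumptions made for this example.

```python
MANDATORY = ["name", "type", "file"]
VALID_TYPES = {"regions", "reads", "genes"}


def parse_matrix(text):
    """Return one dict per file entry, keyed by the header's column names."""
    # Comment lines start with '#'; skipping blank lines is an assumption.
    lines = [ln for ln in text.splitlines()
             if ln.strip() and not ln.startswith("#")]
    if not lines:
        raise ValueError("missing header line")

    # The three mandatory columns come first, followed by any extra fields.
    header = lines[0].split("\t")
    if header[:3] != MANDATORY:
        raise ValueError("header must start with: name, type, file")

    entries, seen = [], set()
    for ln in lines[1:]:
        fields = ln.split("\t")
        if len(fields) != len(header):
            raise ValueError(f"expected {len(header)} columns: {ln!r}")
        entry = dict(zip(header, fields))
        if entry["type"] not in VALID_TYPES:
            raise ValueError(f"unknown type: {entry['type']!r}")
        if entry["name"] in seen:
            raise ValueError(f"duplicate name: {entry['name']!r}")
        seen.add(entry["name"])
        entries.append(entry)
    return entries


# Hypothetical two-entry matrix with one additional field ("cell").
SAMPLE = "\n".join([
    "# comment lines are ignored",
    "\t".join(["name", "type", "file", "cell"]),
    "\t".join(["peaks", "regions", "./data/peaks.bed", "CDP"]),
    "\t".join(["signal", "reads", "./data/signal.bam", "CDP"]),
])

for entry in parse_matrix(SAMPLE):
    print(entry["name"], entry["type"], entry["file"])
```

Keeping each entry as a dictionary keyed by header name means the additional fields, whatever they are called, travel with the entry without special handling.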
Example One – Experimental Matrix from RGT Tutorial
Below is the experimental matrix used in the RGT Tutorial (from the tool RGT-viz):
# Header below with two additional fields
name	type	file	cell	factor
# First section -> CDP data
CDP_PU1	regions	./data/PU1_CDP_500.bed	CDP	PU1
PU1_CDP_WT	reads	./data/CDP_PU1.bw	CDP	PU.1
H3K4me1_CDP	reads	./data/CDP_WT_H3K4me1.bw	CDP	H3K4me1
H3K4me3_CDP_WT	reads	./data/CDP_WT_H3K4me3.bw	CDP	H3K4me3
H3K27me3_CDP_WT	reads	./data/CDP_WT_H3K27me3.bw	CDP	H3K27me3
# Second section -> cDC data
cDC_PU1	regions	./data/PU1_cDC_500.bed	cDC	PU1
PU1_cDC_WT	reads	./data/cDC_PU1.bw	cDC	PU.1
H3K4me1_cDC	reads	./data/cDC_WT_H3K4me1.bw	cDC	H3K4me1
H3K4me3_cDC_WT	reads	./data/cDC_WT_H3K4me3.bw	cDC	H3K4me3
H3K27me3_cDC_WT	reads	./data/cDC_WT_H3K27me3.bw	cDC	H3K27me3
All lines starting with ‘#’ represent comments and are not considered by the experimental matrix parser. The above experimental matrix is divided into two sections:
- First section: Contains the PU.1 regions (BED file) from cell type CDP in the first line below the section comment. The following lines represent ChIP-seq data (signal – BW file) from PU.1 and histone modifications on cell type CDP.
- Second section: Contains the PU.1 regions (BED file) from cell type cDC in the first line below the section comment. The following lines represent ChIP-seq data (signal – BW file) from PU.1 and histone modifications on cell type cDC.
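Because the section comments are ignored by the parser, the two sections exist only for the human reader; the grouping a tool can actually rely on comes from the additional fields. The sketch below, which is not RGT code, groups entry names by the value of the "cell" column using an excerpt of the matrix above. The helper name `group_by` is hypothetical.

```python
from collections import defaultdict

# Excerpt of the tutorial matrix, written with explicit tab separators.
MATRIX = "\n".join([
    "# Header below with two additional fields",
    "\t".join(["name", "type", "file", "cell", "factor"]),
    "# First section -> CDP data",
    "\t".join(["CDP_PU1", "regions", "./data/PU1_CDP_500.bed", "CDP", "PU1"]),
    "\t".join(["PU1_CDP_WT", "reads", "./data/CDP_PU1.bw", "CDP", "PU.1"]),
    "# Second section -> cDC data",
    "\t".join(["cDC_PU1", "regions", "./data/PU1_cDC_500.bed", "cDC", "PU1"]),
    "\t".join(["PU1_cDC_WT", "reads", "./data/cDC_PU1.bw", "cDC", "PU.1"]),
])


def group_by(text, column):
    """Map each value found in `column` to the entry names that carry it."""
    rows = [ln.split("\t") for ln in text.splitlines()
            if ln.strip() and not ln.startswith("#")]
    header, body = rows[0], rows[1:]
    idx = header.index(column)
    groups = defaultdict(list)
    for row in body:
        groups[row[idx]].append(row[0])  # row[0] is the mandatory "name" column
    return dict(groups)


groups = group_by(MATRIX, "cell")
for cell, names in groups.items():
    print(cell, names)
```

The same helper works for any other column, for example `group_by(MATRIX, "type")` to separate regions from reads.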
Below is another example of an experimental matrix:
# Header below with two additional fields
name	type	file	cell	factor
# Regions
K_GABP	regions	K562/gabp_peaks.bed	K562	GABP
K_GATA2	regions	K562/gata2_peaks.bed	K562	GATA2
K_MYC	regions	K562/myc_peaks.bed	K562	MYC
H_GABP	regions	ESC/gabp_peaks.bed	H1-hESC	GABP
H_GATA2	regions	ESC/gata2_peaks.bed	H1-hESC	GATA2
H_MYC	regions	ESC/myc_peaks.bed	H1-hESC	MYC
# Reads
K_DNASE	reads	K562/DNase.bam	K562	DNase
K_H3K4ME1	reads	K562/H3K4me1.bam	K562	H3K4me1
K_H3K4ME3	reads	K562/H3K4me3.bam	K562	H3K4me3
H_DNASE	reads	ESC/DNase.bam	K562	DNase
H_H3K4ME1	reads	ESC/H3K4me1.bam	K562	H3K4me1
H_H3K4ME3	reads	ESC/H3K4me3.bam	K562	H3K4me3
# Genes
K_UP_REG	genes	K562/up_reg.txt	K562	up_reg
K_DW_REG	genes	K562/down_reg.txt	K562	dw_reg
H_UP_REG	genes	ESC/up_reg.txt	K562	up_reg
H_DW_REG	genes	ESC/down_reg.txt	K562	dw_reg
The experimental matrix above describes data for two cell types: K562 and H1-hESC. It lists regions enriched for the transcription factors GABP, GATA2 and MYC, aligned reads from DNase-seq and from ChIP-seq of the histone modifications H3K4me1 and H3K4me3, and lists of genes that are up- and down-regulated in these two cell types.
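When a matrix covers several file types like this one, it can be convenient to generate it from a script rather than by hand. The following is a minimal sketch using Python's standard `csv` module with a tab delimiter; the function name `write_matrix` is hypothetical, and the three entries are taken from the example above.

```python
import csv
import io


def write_matrix(entries, extra_fields, out):
    """Write a header plus one tab-separated line per entry to `out`."""
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    # Mandatory columns first, then the additional fields in the given order.
    writer.writerow(["name", "type", "file", *extra_fields])
    for e in entries:
        writer.writerow([e["name"], e["type"], e["file"],
                         *(e[f] for f in extra_fields)])


entries = [
    {"name": "K_GABP", "type": "regions", "file": "K562/gabp_peaks.bed",
     "cell": "K562", "factor": "GABP"},
    {"name": "K_DNASE", "type": "reads", "file": "K562/DNase.bam",
     "cell": "K562", "factor": "DNase"},
    {"name": "K_UP_REG", "type": "genes", "file": "K562/up_reg.txt",
     "cell": "K562", "factor": "up_reg"},
]

buffer = io.StringIO()  # stands in for an open file in this sketch
write_matrix(entries, ["cell", "factor"], buffer)
print(buffer.getvalue(), end="")
```

To write to disk instead, pass a file opened with `open(path, "w", newline="")` in place of the `StringIO` buffer.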
The organization of the experimental matrix and the accepted input types may differ between RGT tools. Please check each tool’s manual for details.