Canadian Epigenetics, Environment and Health Research Consortium

Description

The tracks in this set were generated by the Centre for Epigenome Mapping Technology (CEMT) at Canada's Michael Smith Genome Sciences Centre (BCGSC) as part of the contribution of the Canadian Epigenetics, Environment and Health Research Consortium (CEEHRC) to the International Human Epigenome Consortium (IHEC).

The data tracks represent raw signal generated from aligned reads. Access to the raw data underlying the tracks is controlled via the European Bioinformatics Institute. The data will be submitted to CEMT Reference Epigenomes (Study: EGAS00001000552) on a quarterly release cycle. More information about the project is available at www.epigenomes.ca.

Sample metadata is available through the reference epigenomes table. To request access, please see the data access agreement page.

Methods

The wet lab protocols used are described at protocols.

Data analysis: RNA-Seq

The protocol for strand-specific mRNA-seq assays was paired end. The sequenced reads were aligned to a genome + transcriptome reference (see JAGuaR: Repositioning of RNA-seq Reads) using BWA version 0.7.6a. The resulting bam files were repositioned to GRCh38_no_alt using JAGuaR (version 2.2.2). The bams were annotated using in-house tools. Duplicates were marked using Picard Tools' MarkDuplicates.jar.

Using in-house tools, the bam was split by the strand of the originally sequenced cDNA fragment, and wig files were generated from reads filtered with SAMtools flags "-F 516 -q 0". An in-house RNA QC and analysis pipeline was used to generate a report containing a normalization constant for computing RPKM values. The constant was inferred from the total number of exonic reads (excluding mitochondrial reads, reads from ribosomal genes, and reads from the highest 0.5% expressed exons). The signal values from the wig files were scaled to obtain RPKM-normalized tracks. The wig files were converted to bigwigs using UCSC tools.
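The normalization constant described above can be illustrated with a minimal Python sketch. It is not the in-house pipeline: the Exon record, the "chrM" chromosome name, ranking exons by read density, and applying the 0.5% cut after the mitochondrial and ribosomal filters are all assumptions made for illustration. The RPKM formula itself is the standard one (reads per kilobase of exon per million counted reads).

```python
from dataclasses import dataclass


@dataclass
class Exon:
    # Hypothetical record; the field names are illustrative only.
    name: str
    chrom: str
    length: int        # exon length in bases
    reads: int         # reads assigned to this exon
    ribosomal: bool = False


def normalization_constant(exons, mito_chrom="chrM", top_fraction=0.005):
    """Total exonic reads after the exclusions described in the text.

    Assumptions: mitochondrial exons are identified by chromosome name,
    "highest expressed" is ranked by reads per base, and the top fraction
    is removed after the mitochondrial and ribosomal filters.
    """
    kept = [e for e in exons if e.chrom != mito_chrom and not e.ribosomal]
    kept.sort(key=lambda e: e.reads / e.length)
    n_drop = int(len(kept) * top_fraction)
    if n_drop:
        kept = kept[:-n_drop]
    return sum(e.reads for e in kept)


def rpkm(reads, length, total_reads):
    """Standard RPKM: reads per kilobase per million counted reads."""
    return reads * 1e9 / (length * total_reads)
```

Excluding the most highly expressed exons keeps a handful of extreme loci from dominating the denominator, so the resulting values stay more comparable across samples.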

The in-house analysis is described in the RNA-Sequencing section of Gascard et al., "Epigenetic and transcriptional determinants of the human breast". The following processed tracks are generated:

Track Type        Description
rpkm_forward      Bigwig track for forward RPKM.
rpkm_reverse      Bigwig track for reverse RPKM.
signal_forward    Bigwig track for forward raw coverage.
signal_reverse    Bigwig track for reverse raw coverage.

Data analysis: miRNA-Seq

The protocol for miRNA-Seq assays was single end. The raw sequence reads were split by index and adaptors were trimmed. The fastq file for each index was then aligned to the GRCh38_no_alt reference using BWA version 0.7.13. Duplicates were marked using Picard Tools' MarkDuplicates.jar.

The resulting bam files were analyzed using a miRNA pipeline that consists of BCGSC's in-house miRNA profiling (version 0.2.6), SAMtools and miRBase (version 21). miRDeep was used for novel miRNA prediction. The isoform bed files from the pipeline were split into mature, precursor and everything not classified as mature or precursor ("miRNA not mature or precursor") and saved into separate bedgraphs, with the reads_per_million_miRNA_mapped column used as the signal value. The bedgraphs were converted to bigwigs using UCSC tools.
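The split into three bedgraphs can be sketched in Python as follows. The record layout is hypothetical: the "category", "chrom", "start" and "end" keys are illustrative, since the actual isoform bed columns are not described here. Only the reads_per_million_miRNA_mapped column name and the three groups come from the text. The output lines follow the four-column bedGraph layout (chrom, start, end, value).

```python
from collections import defaultdict

# Label used in the text for records that are neither mature nor precursor.
OTHER = "miRNA not mature or precursor"


def split_isoforms_to_bedgraph(records):
    """Group isoform records into bedGraph lines for the three classes.

    `records` is an iterable of dicts with hypothetical keys:
    chrom, start, end, category, reads_per_million_miRNA_mapped.
    """
    groups = defaultdict(list)
    # Sort by position, since bedGraph-to-bigWig conversion expects sorted input.
    for rec in sorted(records, key=lambda r: (r["chrom"], r["start"])):
        category = rec["category"]
        if category not in ("mature", "precursor"):
            category = OTHER
        groups[category].append(
            "{}\t{}\t{}\t{}".format(
                rec["chrom"],
                rec["start"],
                rec["end"],
                rec["reads_per_million_miRNA_mapped"],
            )
        )
    return dict(groups)
```

Each returned list could then be written to its own file before conversion to bigwig.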

Track Type                        Description
isoform_details                   Bigbed track with isoform details for the miRNAs discovered.
signal_unstranded                 Bigwig track for raw coverage.
reads_per_million_miRNA_mapped    Normalized bigwig track with "reads_per_million_miRNA_mapped" values.

Data analysis: ChIP-Seq

The protocol for ChIP-Seq assays was paired end. The sequenced reads were aligned to the GRCh38_no_alt reference using Burrows-Wheeler Aligner version 0.7.6a (mem mode) and converted to bam format with SAMtools. Duplicates were marked using Sambamba version 5.5.

Wig tracks were generated from bam through in-house tools using ChIP-Seq PET mode with SAMtools flags "-F 3332 -q 5". The wig files were converted to bigwigs using UCSC tools.
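To make the filter values easier to read, the following Python sketch decodes a SAM FLAG mask into its component bits, using the bit definitions from the SAM specification. It assumes the in-house tools interpret "-F" and "-q" the same way "samtools view" does (drop reads with any masked bit set; drop reads with mapping quality below the threshold). The helper names are illustrative.

```python
# FLAG bits as defined in the SAM specification.
SAM_FLAGS = {
    0x1: "paired",
    0x2: "proper_pair",
    0x4: "unmapped",
    0x8: "mate_unmapped",
    0x10: "reverse",
    0x20: "mate_reverse",
    0x40: "read1",
    0x80: "read2",
    0x100: "secondary",
    0x200: "qc_fail",
    0x400: "duplicate",
    0x800: "supplementary",
}


def decode_flags(value):
    """Return the names of all FLAG bits set in `value`."""
    return [name for bit, name in SAM_FLAGS.items() if value & bit]


def keep_read(flag, mapq, exclude=3332, min_mapq=5):
    """Mimic `-F exclude -q min_mapq` (assumed samtools-view semantics)."""
    return (flag & exclude) == 0 and mapq >= min_mapq


print(decode_flags(3332))  # mask used for the ChIP-Seq tracks
print(decode_flags(516))   # mask used for the RNA-Seq tracks
```

Under these definitions, 3332 excludes unmapped, secondary, duplicate and supplementary alignments, while 516 excludes unmapped reads and reads that failed quality checks.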

Peak calling for each assay was done with FindER version 1.0.1e using the following command line:
java -jar -Xmx30G FindER.1.0.1e.jar -signalBam $treatmentBam -inputBam $controlBam -out $out
Two post-processed tracks are generated:

Track Type    Description
peaks         Enrichment calls formatted as a bigBed file.
peaks_bw      Enrichment calls formatted as a bigWig file in which enriched regions are assigned a score of 1 and all other regions a score of 0. This allows more compact visualization over long genomic stretches than the "peaks" track type.

Data analysis: Bisulfite-Seq

Please note that the following applies only to CEMT samples; for REMC samples, see the section "Methylation data cross-assay standardization and uniform processing for consolidated epigenomes" of Roadmap Epigenomics Consortium, "Integrative analysis of 111 reference human epigenomes".

The protocol for WGBS assays was paired end. The data from each lane of sequencing were aligned and then merged. Alignment was done to the GRCh38_no_alt reference using Novoalign (version 3.04.06). The bams were annotated using in-house tools and duplicates were marked using Picard Tools' MarkDuplicates.jar. Novoalign, Novomethyl, and in-house tools were used to generate fractional calls and coverage analysis. The wig files were converted to bigwigs using UCSC tools.

Two post-processed tracks are generated:

Track Type             Description
signal                 Bigwig track with raw coverage signal.
methylation_profile    Bigwig track with fractional methylation calls on a scale of 1 to 10.
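For readers unfamiliar with fractional methylation calls, the Python sketch below shows the conventional per-cytosine definition: methylated read count divided by total informative coverage. It is an assumption that the Novomethyl and in-house output is summarized this way. The function name and the min_coverage parameter are illustrative, and the mapping of the fraction onto the track's display scale is not shown because it is not specified here.

```python
def fractional_methylation(methylated, unmethylated, min_coverage=1):
    """Fraction of reads supporting methylation at one cytosine.

    Returns None when coverage is below `min_coverage`, so that
    uncovered sites are not reported as unmethylated.
    """
    coverage = methylated + unmethylated
    if coverage < min_coverage:
        return None
    return methylated / coverage
```

Returning None rather than 0 for uncovered positions keeps missing data distinguishable from genuinely unmethylated sites.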

Contacts

Please direct any questions to: edcc@bcgsc.ca