www.epigenomes.ca tracks

The tracks available in this set have been generated by the Centre for Epigenome Mapping Technology (CEMT) at Canada's Michael Smith Genome Sciences Centre (BCGSC) as part of the contribution of the Canadian Epigenetics, Environment and Health Research Consortium (CEEHRC) to the International Human Epigenome Consortium (IHEC).

The data tracks represent raw signal generated from aligned reads. Access to the raw data underlying the tracks is controlled via the European Bioinformatics Institute. The data will be submitted to CEMT Reference Epigenomes (Study: EGAS00001000552) on a quarterly release cycle. More information about the project is available at www.epigenomes.ca.

Sample metadata is available through the reference epigenomes table. To request access, please see the data access agreement page.

The wet lab protocols used are described at protocols.

Data analysis: RNA-Seq

The protocol for strand-specific mRNA-seq assays was paired end. The sequenced reads were aligned to a genome + transcriptome reference (see JAGuaR: Repositioning of RNA-seq Reads) using BWA version 0.7.6a. The resulting bam files were repositioned to GRCh38_no_alt using JAGuaR (version 2.2.2). The bams were annotated using in-house tools. Duplicates were marked using Picard Tools' MarkDuplicates.jar.

Using in-house tools, the bam was split by strand of the originally sequenced cDNA fragment, and wig files were generated from reads filtered with SAMtools flags "-F 516 -q 0". An in-house RNA QC and analysis pipeline was used to generate a report containing a normalization constant for computing RPKM values. The constant was inferred from the total number of exonic reads (excluding mitochondrial reads, reads from ribosomal genes, and reads from the top 0.5% most highly expressed exons). The signal values from the wig files were scaled by this constant to obtain RPKM-normalized tracks. The wig files were converted to bigwigs using UCSC tools.
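
As a rough illustration of the scaling step, the sketch below uses the textbook RPKM definition (reads per kilobase of feature per million reads) and a plain multiplication of signal values by a constant. The function names and the example numbers are assumptions made for illustration only. The in-house pipeline's actual constant, and its exclusion of mitochondrial, ribosomal and top 0.5% exon reads, is not reproduced here.

```python
def rpkm(read_count, feature_length_bp, total_reads):
    """Textbook RPKM: reads per kilobase of feature per million counted reads."""
    if feature_length_bp <= 0 or total_reads <= 0:
        raise ValueError("feature length and total reads must be positive")
    return read_count * 1e9 / (feature_length_bp * total_reads)


def scale_signal(values, constant):
    """Multiply each raw signal value by a normalization constant.

    The constant is taken as given (hypothetical input); how the in-house
    report derives it is not specified in this document.
    """
    return [value * constant for value in values]


# Hypothetical example: 500 reads on a 2,000 bp exon, 10 million exonic reads.
example_rpkm = rpkm(500, 2000, 10_000_000)
scaled = scale_signal([0, 2, 4], 0.5)
```

Keeping the constant as an explicit input means the same scaling function can be reused for both the forward and the reverse strand track.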

The in-house analysis is described in the RNA-Sequencing section of Gascard et al. "Epigenetic and transcriptional determinants of the human breast". The following processed tracks are generated:

Track Type	Description
rpkm_forward	Bigwig track for forward RPKM.
rpkm_reverse	Bigwig track for reverse RPKM.
signal_forward	Bigwig track for forward raw coverage.
signal_reverse	Bigwig track for reverse raw coverage.

Data analysis: miRNA-Seq

The protocol for miRNA-Seq assays was single end. The raw sequence reads were split by index and adaptors were trimmed. The fastq file for each index was then aligned to the GRCh38_no_alt reference using BWA version 0.7.13. Duplicates were marked using Picard Tools' MarkDuplicates.jar.

The resulting bam files were analyzed using a miRNA pipeline that consists of BCGSC's in-house miRNA profiling (version 0.2.6), SAMtools and miRBase (version 21). miRDeep was used for novel miRNA prediction. The isoform bed files from the pipeline were split into mature, precursor and everything not classified as mature or precursor ("miRNA not mature or precursor") and saved into separate bedgraphs, with the reads_per_million_miRNA_mapped column used as the signal value. The bedgraphs were converted to bigwigs using UCSC tools.

Track Type	Description
isoform_details	Bigbed track with isoform details for miRNA discovered.
signal_unstranded	Bigwig track for raw coverage.
reads_per_million_miRNA_mapped	Normalized bigwig track with "reads_per_million_miRNA_mapped" values.
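
The three-way split of isoform records into mature, precursor and "miRNA not mature or precursor" can be sketched as follows. The record layout (chrom, start, end, class label, reads-per-million value) and the class labels "mature" and "precursor" are assumptions for illustration; the real isoform bed columns produced by the in-house pipeline are not documented here. Overlapping intervals are not resolved in this sketch and would need handling before conversion to bigwig.

```python
from collections import defaultdict

OTHER = "miRNA not mature or precursor"


def split_by_class(records):
    """Group hypothetical (chrom, start, end, mirna_class, rpm) records by class.

    Anything not labeled "mature" or "precursor" goes into the catch-all class.
    Returns a dict mapping class name to sorted (chrom, start, end, rpm) rows.
    """
    tracks = defaultdict(list)
    for chrom, start, end, mirna_class, rpm in records:
        key = mirna_class if mirna_class in ("mature", "precursor") else OTHER
        tracks[key].append((chrom, start, end, rpm))
    for rows in tracks.values():
        rows.sort()
    return dict(tracks)


def to_bedgraph_lines(rows):
    """Format rows as tab-separated bedGraph lines: chrom, start, end, value."""
    return [f"{chrom}\t{start}\t{end}\t{value}" for chrom, start, end, value in rows]


# Hypothetical input records.
records = [
    ("chr1", 100, 122, "mature", 5.0),
    ("chr1", 90, 170, "precursor", 1.5),
    ("chr1", 10, 30, "star", 0.2),
]
tracks = split_by_class(records)
```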

Data analysis: ChIP-Seq

The protocol for ChIP-Seq assays was paired end. The sequenced reads were aligned to GRCh38_no_alt reference using Burrows-Wheeler Aligner version 0.7.6a (mem mode) and converted to bam format with SAMtools. Duplicates were marked using Sambamba version 5.5.

Wig tracks were generated from bam through in-house tools using ChIP-Seq PET mode with SAMtools flags "-F 3332 -q 5". The wig files were converted to bigwigs using UCSC tools.
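
The numeric SAMtools flag values used for filtering can be hard to read at a glance. The sketch below decodes them using the flag bits defined in the SAM format specification and mimics the keep/discard decision of "-F" (exclude reads with any of these bits set) and "-q" (minimum mapping quality). The function names are illustrative and not part of SAMtools or the in-house tools.

```python
# Flag bits as defined in the SAM format specification.
SAM_FLAGS = {
    0x1: "paired",
    0x2: "proper_pair",
    0x4: "unmapped",
    0x8: "mate_unmapped",
    0x10: "reverse",
    0x20: "mate_reverse",
    0x40: "read1",
    0x80: "read2",
    0x100: "secondary",
    0x200: "qc_fail",
    0x400: "duplicate",
    0x800: "supplementary",
}


def decode_flags(value):
    """Return the names of all SAM flag bits set in an integer flag value."""
    return [name for bit, name in sorted(SAM_FLAGS.items()) if value & bit]


def is_kept(flag, mapq, exclude, min_mapq):
    """Mimic `samtools view -F exclude -q min_mapq` for a single alignment."""
    return (flag & exclude) == 0 and mapq >= min_mapq


print(decode_flags(3332))  # ChIP-Seq filter
print(decode_flags(516))   # RNA-Seq filter
```

Decoded this way, "-F 3332" excludes unmapped, secondary, duplicate and supplementary alignments, while the "-F 516" used for RNA-Seq excludes unmapped reads and reads failing quality checks, so marked duplicates are retained in the RNA-Seq signal.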

Peak calling for each assay is done with FindER version 1.0.1e using the following command line:

java -jar -Xmx30G FindER.1.0.1e.jar -signalBam $treatmentBam -inputBam $controlBam -out $out

Two post-processed tracks are generated:

Track Type	Description
peaks	Enrichment calls are formatted as a bigBed file.
peaks_bw	Enrichment calls are formatted as a bigWig file where enriched regions are assigned a score of 1 and 0 otherwise. This is done for compact visualization over long genomic stretches when compared to "peaks" track type.
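
The relationship between the two track types can be illustrated with a small sketch that turns a list of peak intervals into a binary signal: 1 inside enriched regions and 0 elsewhere. This is an assumption-laden illustration, not the in-house conversion. The function names, the half-open 0-based coordinates and the merging of overlapping peaks are all choices made here for the example.

```python
def merge_intervals(intervals):
    """Merge overlapping or touching (start, end) intervals; returns sorted tuples."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(interval) for interval in merged]


def peaks_to_binary_bedgraph(chrom, chrom_length, peaks):
    """Return bedGraph-style rows (chrom, start, end, score) covering the chromosome.

    Score is 1 inside peaks and 0 in the gaps between them.
    """
    rows = []
    cursor = 0
    for start, end in merge_intervals(peaks):
        if start > cursor:
            rows.append((chrom, cursor, start, 0))
        rows.append((chrom, start, end, 1))
        cursor = end
    if cursor < chrom_length:
        rows.append((chrom, cursor, chrom_length, 0))
    return rows


# Hypothetical peaks on a 100 bp toy chromosome.
rows = peaks_to_binary_bedgraph("chr1", 100, [(10, 20), (15, 30), (50, 60)])
```

Rows in this form are non-overlapping and sorted, which is the shape a bedGraph file needs before conversion to bigWig with UCSC tools.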

Data analysis: Bisulfite-Seq

Please note that the following applies only to CEMT samples; for REMC samples see the section "Methylation data cross-assay standardization and uniform processing for consolidated epigenomes" of Roadmap Epigenomics Consortium - Integrative analysis of 111 reference human epigenomes.

The protocol for WGBS assays was paired end. The data from each lane of sequencing were aligned and then merged. Alignment was done to the GRCh38_no_alt reference using Novoalign (version 3.04.06). The bams were annotated using in-house tools and duplicates were marked using Picard Tools' MarkDuplicates.jar. Novoalign, Novomethyl, and other in-house tools were used to generate fractional methylation calls and coverage analysis. The wig files were converted to bigwigs using UCSC tools.

Two post-processed tracks are generated:

Track Type	Description
signal	The bigwig track with raw coverage signal.
methylation_profile	The bigwig track with fractional methylation calls on a scale of 1 to 10.

Please direct any questions to: edcc@bcgsc.ca