This workshop will guide you through the basics of ChIP-seq analysis with hands-on exercises. You will learn how to process ChIP-seq data: perform read alignment, peak calling, quality control, visualization through the genome browser, motif finding and gene set enrichment analysis.

Note: All the code in this tutorial will be performed in the terminal.

01 - UCSC genome browser

Upload of peak files

One of the most common useful ways to visualize our data is through the UCSC genome browser. We can visualize the peaks that we’ve generated with macs and see directly the regions of the genome where our protein of interest is binding.

Let’s move to the scripts directory (or stay here if already there):

cd chip_seq/scripts

To do this, we need to upload the .bed files that we got from macs2

head ../results/macs2/HIF1a_rep1_peaks.bed

chr6    41734187    41734560    HIF1a_rep1_peak_1   181 .
chr6    41787043    41787581    HIF1a_rep1_peak_2   160 .
chr6    42452936    42453274    HIF1a_rep1_peak_3   194 .
chr6    42929347    42929669    HIF1a_rep1_peak_4   135 .
chr6    43013743    43014397    HIF1a_rep1_peak_5   100 .
chr6    43181975    43182279    HIF1a_rep1_peak_6   109 .
chr6    43667941    43668609    HIF1a_rep1_peak_7   4365    .
chr6    43725079    43725412    HIF1a_rep1_peak_8   102 .
chr6    43768993    43770080    HIF1a_rep1_peak_9   968 .
chr6    43798896    43799239    HIF1a_rep1_peak_10  322 .

In the genome browser, visualize the coordinates of a given peak in your .bed file.

Remember that in the UCSC genome browser you have to specify the positions in this format: chr:start-end.

Generation of bigwig

One of the most common/useful formats to visualize the data is the bigwig format. To generate this, we are going to use deeptools.deepTools is a suite of python tools particularly developed for the efficient analysis of high-throughput sequencing data, such as ChIP-seq, RNA-seq or MNase-seq. It offers a lot of tools for processing BAM and bigwig files.

We are going to convert our .BAM file ( from our alignment step) into a .bigwig file

ls ../results/HIF1a_Rep1.sorted.rmdup.bam

../results/HIF1a_Rep1.sorted.rmdup.bam

To convert, we are going to use the bamCoverage command

bamCoverage --help

usage: bamCoverage -b reads.bam -o coverage.bw
help: bamCoverage -h / bamCoverage --help

This tool takes an alignment of reads or fragments as input (BAM file) and
generates a coverage track (bigWig or bedGraph) as output. The coverage is
calculated as the number of reads per bin, where bins are short consecutive
counting windows of a defined size. It is possible to extended the length of
the reads to better reflect the actual fragment length. *bamCoverage* offers
normalization by scaling factor, Reads Per Kilobase per Million mapped reads
(RPKM), counts per million (CPM), bins per million mapped reads (BPM) and 1x
depth (reads per genome coverage, RPGC).

Required arguments:
  --bam BAM file, -b BAM file
                        BAM file to process (default: None)

Output:
  --outFileName FILENAME, -o FILENAME
                        Output file name. (default: None)
  --outFileFormat {bigwig,bedgraph}, -of {bigwig,bedgraph}
                        Output file type. Either "bigwig" or "bedgraph".
                        (default: bigwig)

Optional arguments:
  --help, -h            show this help message and exit
  --scaleFactor SCALEFACTOR
                        The computed scaling factor (or 1, if not applicable)
                        will be multiplied by this. (Default: 1.0)
  --MNase               Determine nucleosome positions from MNase-seq data.
                        Only 3 nucleotides at the center of each fragment are
                        counted. The fragment ends are defined by the two mate
                        reads. Only fragment lengthsbetween 130 - 200 bp are
                        considered to avoid dinucleosomes or other artifacts.
                        By default, any fragments smaller or larger than this
                        are ignored. To over-ride this, use the
                        --minFragmentLength and --maxFragmentLength options,
                        which will default to 130 and 200 if not otherwise
                        specified in the presence of --MNase. *NOTE*: Requires
                        paired-end data. A bin size of 1 is recommended.
                        (default: False)
  --Offset INT [INT ...]
                        Uses this offset inside of each read as the signal.
                        This is useful in cases like RiboSeq or GROseq, where
                        the signal is 12, 15 or 0 bases past the start of the
                        read. This can be paired with the --filterRNAstrand
                        option. Note that negative values indicate offsets
                        from the end of each read. A value of 1 indicates the
                        first base of the alignment (taking alignment
                        orientation into account). Likewise, a value of -1 is
                        the last base of the alignment. An offset of 0 is not
                        permitted. If two values are specified, then they will
                        be used to specify a range of positions. Note that
                        specifying something like --Offset 5 -1 will result in
                        the 5th through last position being used, which is
                        equivalent to trimming 4 bases from the 5-prime end of
                        alignments. Note that if you specify --centerReads,
                        the centering will be performed before the offset.
                        (default: None)
  --filterRNAstrand {forward,reverse}
                        Selects RNA-seq reads (single-end or paired-end)
                        originating from genes on the given strand. This
                        option assumes a standard dUTP-based library
                        preparation (that is, --filterRNAstrand=forward keeps
                        minus-strand reads, which originally came from genes
                        on the forward strand using a dUTP-based method).
                        Consider using --samExcludeFlag instead for filtering
                        by strand in other contexts. (default: None)
  --version             show program's version number and exit
  --binSize INT bp, -bs INT bp
                        Size of the bins, in bases, for the output of the
                        bigwig/bedgraph file. (Default: 50)
  --region CHR:START:END, -r CHR:START:END
                        Region of the genome to limit the operation to - this
                        is useful when testing parameters to reduce the
                        computing time. The format is chr:start:end, for
                        example --region chr10 or --region
                        chr10:456700:891000. (default: None)
  --blackListFileName BED file [BED file ...], -bl BED file [BED file ...]
                        A BED or GTF file containing regions that should be
                        excluded from all analyses. Currently this works by
                        rejecting genomic chunks that happen to overlap an
                        entry. Consequently, for BAM files, if a read
                        partially overlaps a blacklisted region or a fragment
                        spans over it, then the read/fragment might still be
                        considered. Please note that you should adjust the
                        effective genome size, if relevant. (default: None)
  --numberOfProcessors INT, -p INT
                        Number of processors to use. Type "max/2" to use half
                        the maximum number of processors or "max" to use all
                        available processors. (Default: 1)
  --verbose, -v         Set to see processing messages. (default: False)

Read coverage normalization options:
  --effectiveGenomeSize EFFECTIVEGENOMESIZE
                        The effective genome size is the portion of the genome
                        that is mappable. Large fractions of the genome are
                        stretches of NNNN that should be discarded. Also, if
                        repetitive regions were not included in the mapping of
                        reads, the effective genome size needs to be adjusted
                        accordingly. A table of values is available here: http
                        ://deeptools.readthedocs.io/en/latest/content/feature/
                        effectiveGenomeSize.html . (default: None)
  --normalizeUsing {RPKM,CPM,BPM,RPGC,None}
                        Use one of the entered methods to normalize the number
                        of reads per bin. By default, no normalization is
                        performed. RPKM = Reads Per Kilobase per Million
                        mapped reads; CPM = Counts Per Million mapped reads,
                        same as CPM in RNA-seq; BPM = Bins Per Million mapped
                        reads, same as TPM in RNA-seq; RPGC = reads per
                        genomic content (1x normalization); Mapped reads are
                        considered after blacklist filtering (if applied).
                        RPKM (per bin) = number of reads per bin / (number of
                        mapped reads (in millions) * bin length (kb)). CPM
                        (per bin) = number of reads per bin / number of mapped
                        reads (in millions). BPM (per bin) = number of reads
                        per bin / sum of all reads per bin (in millions). RPGC
                        (per bin) = number of reads per bin / scaling factor
                        for 1x average coverage. None = the default and
                        equivalent to not setting this option at all. This
                        scaling factor, in turn, is determined from the
                        sequencing depth: (total number of mapped reads *
                        fragment length) / effective genome size. The scaling
                        factor used is the inverse of the sequencing depth
                        computed for the sample to match the 1x coverage. This
                        option requires --effectiveGenomeSize. Each read is
                        considered independently, if you want to only count
                        one mate from a pair in paired-end data, then use the
                        --samFlagInclude/--samFlagExclude options. (Default:
                        None)
  --exactScaling        Instead of computing scaling factors based on a
                        sampling of the reads, process all of the reads to
                        determine the exact number that will be used in the
                        output. This requires significantly more time to
                        compute, but will produce more accurate scaling
                        factors in cases where alignments that are being
                        filtered are rare and lumped together. In other words,
                        this is only needed when region-based sampling is
                        expected to produce incorrect results. (default:
                        False)
  --ignoreForNormalization IGNOREFORNORMALIZATION [IGNOREFORNORMALIZATION ...], -ignore IGNOREFORNORMALIZATION [IGNOREFORNORMALIZATION ...]
                        A list of space-delimited chromosome names containing
                        those chromosomes that should be excluded for
                        computing the normalization. This is useful when
                        considering samples with unequal coverage across
                        chromosomes, like male samples. An usage examples is
                        --ignoreForNormalization chrX chrM. (default: None)
  --skipNonCoveredRegions, --skipNAs
                        This parameter determines if non-covered regions
                        (regions without overlapping reads) in a BAM file
                        should be skipped. The default is to treat those
                        regions as having a value of zero. The decision to
                        skip non-covered regions depends on the interpretation
                        of the data. Non-covered regions may represent, for
                        example, repetitive regions that should be skipped.
                        (default: False)
  --smoothLength INT bp
                        The smooth length defines a window, larger than the
                        binSize, to average the number of reads. For example,
                        if the --binSize is set to 20 and the --smoothLength
                        is set to 60, then, for each bin, the average of the
                        bin and its left and right neighbors is considered.
                        Any value smaller than --binSize will be ignored and
                        no smoothing will be applied. (default: None)

Read processing options:
  --extendReads [INT bp], -e [INT bp]
                        This parameter allows the extension of reads to
                        fragment size. If set, each read is extended, without
                        exception. *NOTE*: This feature is generally NOT
                        recommended for spliced-read data, such as RNA-seq, as
                        it would extend reads over skipped regions. *Single-
                        end*: Requires a user specified value for the final
                        fragment length. Reads that already exceed this
                        fragment length will not be extended. *Paired-end*:
                        Reads with mates are always extended to match the
                        fragment size defined by the two read mates. Unmated
                        reads, mate reads that map too far apart (>4x fragment
                        length) or even map to different chromosomes are
                        treated like single-end reads. The input of a fragment
                        length value is optional. If no value is specified, it
                        is estimated from the data (mean of the fragment size
                        of all mate reads). (default: False)
  --ignoreDuplicates    If set, reads that have the same orientation and start
                        position will be considered only once. If reads are
                        paired, the mate's position also has to coincide to
                        ignore a read. (default: False)
  --minMappingQuality INT
                        If set, only reads that have a mapping quality score
                        of at least this are considered. (default: None)
  --centerReads         By adding this option, reads are centered with respect
                        to the fragment length. For paired-end data, the read
                        is centered at the fragment length defined by the two
                        ends of the fragment. For single-end data, the given
                        fragment length is used. This option is useful to get
                        a sharper signal around enriched regions. (default:
                        False)
  --samFlagInclude INT  Include reads based on the SAM flag. For example, to
                        get only reads that are the first mate, use a flag of
                        64. This is useful to count properly paired reads only
                        once, as otherwise the second mate will be also
                        considered for the coverage. (Default: None)
  --samFlagExclude INT  Exclude reads based on the SAM flag. For example, to
                        get only reads that map to the forward strand, use
                        --samFlagExclude 16, where 16 is the SAM flag for
                        reads that map to the reverse strand. (Default: None)
  --minFragmentLength INT
                        The minimum fragment length needed for read/pair
                        inclusion. This option is primarily useful in ATACseq
                        experiments, for filtering mono- or di-nucleosome
                        fragments. (Default: 0)
  --maxFragmentLength INT
                        The maximum fragment length needed for read/pair
                        inclusion. (Default: 0)

bamCoverage --bam ../results/HIF1a_Rep1.sorted.rmdup.bam -o ../results/HIF1a_Rep1.bigwig --normalizeUsing BPM --extendReads

normalization: BPM
bamFilesList: ['../results/HIF1a_Rep1.sorted.rmdup.bam']
binLength: 50
numberOfSamples: None
blackListFileName: None
skipZeroOverZero: False
bed_and_bin: False
genomeChunkSize: None
defaultFragmentLength: 223
numberOfProcessors: 1
verbose: False
region: None
bedFile: None
minMappingQuality: None
ignoreDuplicates: False
chrsToSkip: []
stepSize: 50
center_read: False
samFlag_include: None
samFlag_exclude: None
minFragmentLength: 0
maxFragmentLength: 0
zerosToNans: False
smoothLength: None
save_data: False
out_file_for_raw_data: None
maxPairedFragmentLength: 892

Let’s see the parameters of the call:

–bam: path of the input .bam file.
-o: path of the output bigwig file.
–normalizeUsing: Use one of the entered methods to normalize the number of reads per bin. By default, no normalization is performed. BPM = Bins Per Million mapped reads, same as TPM in RNA-seq.
–extendReads: In the paired-end mode ( as our case), it will extend the reads to the mean of the fragment size of all mate reads.

Now, we should have our .bigwig file:

ls ../results/HIF1a_Rep1.bigwig

../results/HIF1a_Rep1.bigwig

UCSC doesn’t allow to ‘upload’ directly your .bigwig file. Instead, you have to store them in a web-accessible http, https, or ftp location. One of the options to store .bigwig files is Cyverse.

Because data upload to servers is time-consuming, for this step, we are going to pretend we’ve already uploaded to data to Cyverse.

Now, open your UCSC genome browser and click in the Custom Tracks button. Then, you will add the following custom track:

track type=bigWig name="HIF1a_rep1" description="HIF1a bigwig" bigDataUrl=https://data.cyverse.org/dav-anon/iplant/home/arielmadr23/27_HIF_Smythies_2019/myHub/hg38/RCC4_Normoxia_HIF1a_PM14_Rep1.bw

You can read more about how to upload data to the UCSC genome browser in here:

02 - Deeptools

Deeptools also offers different tools for visualizing ChIP-seq data, including the generation of Heatmaps and different summary plots. For now, let’s try the plotHeatmap function.

mkdir -p ../results/deeptools

The first step is to compute the signal using computeMatrix. computeMatrix has two modes: * scale regions: In the scale-regions mode, all regions in the BED file are stretched or shrunken to the length (in bases) indicated by the user. * reference-point: Reference-point refers to a position within a BED region (e.g., the starting point). In this mode, only those genomicpositions before (upstream) and/or after (downstream) of the reference point will be plotted. You can read more about them here.

computeMatrix reference-point -S ../results/HIF1a_Rep1.bigwig \
              --referencePoint center \
              -R ../results/macs2/HIF1a_rep1_summits.bed \
              --beforeRegionStartLength 3000 \
              --afterRegionStartLength 3000 \
              --skipZeros \
              -o ../results/deeptools/matrix.mat.gz

-S: bigWig file(s) containing the scores to be plotted. Multiple files should be separated by spaced.
–referencePointThe: reference point for the plotting could be either the region start (TSS), the region end (TES) or the center of the region.
-R: File name or names, in BED or GTF format, containing the regions to plot. If multiple bed files are given, each one is considered a group that can be plotted separately.
–beforeRegionStartLength: Distance upstream of the reference-point selected. (Default: 500)
–afterRegionStartLength: Distance downstream of the reference-point selected. (Default: 1500)
skipZeros: Whether regions with only scores of zero should be included or not.
-o: File name to save.

plotHeatmap -m ../results/deeptools/matrix.mat.gz \
            --heatmapHeight 5  \
            --refPointLabel 'Peak summit' \
            --regionsLabel 'HIF1a peaks' \
            --plotTitle 'Peak summits (HIF1a)' \
            -out ../results/deeptools/HIF1a_heatmap.png

-m: Matrix file from the computeMatrix tool.
-out: File name to save the image to.
–heatmapHeight: Plot height in cm.
–refPointLabel: Label shown in the plot for the reference-point.
–regionsLabel: Labels for the regions plotted in the heatmap. If more than one region is being plotted, a list of labels separated by spaces is required.
–plotTitle: Title of the plot, to be printed on top of the generated image

03 - Exercises

1. Include the HIF2a data in the Heatmap from Deeptools. This heatmap should show the coverage for both HIF1a and HIF2a peaks.

sessionInfo()

R version 4.3.1 (2023-06-16)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.6 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0 
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0

locale:
 [1] LC_CTYPE=en_CA.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_CA.UTF-8        LC_COLLATE=en_CA.UTF-8    
 [5] LC_MONETARY=en_CA.UTF-8    LC_MESSAGES=en_CA.UTF-8   
 [7] LC_PAPER=en_CA.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_CA.UTF-8 LC_IDENTIFICATION=C       

time zone: America/New_York
tzcode source: system (glibc)

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

loaded via a namespace (and not attached):
 [1] digest_0.6.33   R6_2.5.1        fastmap_1.1.1   xfun_0.40      
 [5] cachem_1.0.8    knitr_1.43      htmltools_0.5.6 rmarkdown_2.24 
 [9] cli_3.6.1       sass_0.4.7      jquerylib_0.1.4 compiler_4.3.1 
[13] tools_4.3.1     mime_0.12       evaluate_0.21   bslib_0.5.1    
[17] yaml_2.3.7      rlang_1.1.1     jsonlite_1.8.7

LS0tCnRpdGxlOiAiTWlDTTogMDQgLSB2aXN1YWxpemF0aW9uIgphdXRob3I6ICJBcmllbCBNYWRyaWdhbCBBZ3VpcnJlIgpkYXRlOiAiMjAyMy0xMC0yNSIKb3V0cHV0OgogIGh0bWxfbm90ZWJvb2s6CiAgICBkZl9wcmludDogcGFnZWQKICAgIGNvZGVfZm9sZGluZzogc2hvdwogICAgdG9jOiB5ZXMKICAgIHRvY19mbG9hdDogCiAgICAgIGNvbGxhcHNlZDogZmFsc2UKICAgICAgc21vb3RoX3Njcm9sbDogZmFsc2UKLS0tCgpgYGB7cixlY2hvPUZBTFNFfQpodG1sdG9vbHM6OmltZyhzcmMgPSBrbml0cjo6aW1hZ2VfdXJpKCJpbWFnZXMvbWljbV9jb2xvcl9sb2dvLnBuZyIpLCAKICAgICAgICAgICAgICAgYWx0ID0gJ2xvZ28nLCAKICAgICAgICAgICAgICAgc3R5bGUgPSAncG9zaXRpb246YWJzb2x1dGU7IHRvcDozMHB4OyByaWdodDowOyBwYWRkaW5nOjEwcHg7IG1heC13aWR0aDo1MCU7JykKYGBgCgpUaGlzIHdvcmtzaG9wIHdpbGwgZ3VpZGUgeW91IHRocm91Z2ggdGhlIGJhc2ljcyBvZiBDaElQLXNlcSBhbmFseXNpcyB3aXRoIGhhbmRzLW9uIGV4ZXJjaXNlcy4gWW91IHdpbGwgbGVhcm4gaG93IHRvIHByb2Nlc3MgQ2hJUC1zZXEgZGF0YTogcGVyZm9ybSByZWFkIGFsaWdubWVudCwgcGVhayBjYWxsaW5nLCBxdWFsaXR5IGNvbnRyb2wsIHZpc3VhbGl6YXRpb24gdGhyb3VnaCB0aGUgZ2Vub21lIGJyb3dzZXIsIG1vdGlmIGZpbmRpbmcgYW5kIGdlbmUgc2V0IGVucmljaG1lbnQgYW5hbHlzaXMuCgoqKk5vdGUqKjogQWxsIHRoZSBjb2RlIGluIHRoaXMgdHV0b3JpYWwgd2lsbCBiZSBwZXJmb3JtZWQgaW4gdGhlIHRlcm1pbmFsLiAKCiMgMDEgLSBVQ1NDIGdlbm9tZSBicm93c2VyCgojIyBVcGxvYWQgb2YgcGVhayBmaWxlcwoKT25lIG9mIHRoZSBtb3N0IGNvbW1vbiB1c2VmdWwgd2F5cyB0byB2aXN1YWxpemUgb3VyIGRhdGEgaXMgdGhyb3VnaCB0aGUgW1VDU0MgZ2Vub21lIGJyb3dzZXJdKGh0dHBzOi8vZ2Vub21lLnVjc2MuZWR1L2luZGV4Lmh0bWwpLiBXZSBjYW4gdmlzdWFsaXplIHRoZSBwZWFrcyB0aGF0IHdlJ3ZlIGdlbmVyYXRlZCB3aXRoIG1hY3MgYW5kIHNlZSBkaXJlY3RseSB0aGUgcmVnaW9ucyBvZiB0aGUgZ2Vub21lIHdoZXJlIG91ciBwcm90ZWluIG9mIGludGVyZXN0IGlzIGJpbmRpbmcuIAoKTGV0J3MgbW92ZSB0byB0aGUgc2NyaXB0cyBkaXJlY3RvcnkgKG9yIHN0YXkgaGVyZSBpZiBhbHJlYWR5IHRoZXJlKToKYGBge2Jhc2gsIGV2YWw9Rn0KY2QgY2hpcF9zZXEvc2NyaXB0cwpgYGAKClRvIGRvIHRoaXMsIHdlIG5lZWQgdG8gdXBsb2FkIHRoZSAuYmVkIGZpbGVzIHRoYXQgd2UgZ290IGZyb20gbWFjczIKYGBge2Jhc2h9CmhlYWQgLi4vcmVzdWx0cy9tYWNzMi9ISUYxYV9yZXAxX3BlYWtzLmJlZApgYGAKSW4gdGhlIGdlbm9tZSBicm93c2VyLCB2aXN1YWxpemUgdGhlIGNvb3JkaW5hdGVzIG9mIGEgZ2l2ZW4gcGVhayBpbiB5b3VyIC5iZWQgZmlsZS4gCgpSZW1lbWJlciB0aGF0IGluIHRoZSBVQ1NDIGdlbm9tZSBicm93c2VyIHlvdSBoYXZlIHRvIHNwZWNpZnkgdGhlIHBvc2l0aW9ucyBpbiB0aGlzIGZvcm1hdDogY2hyOnN0YXJ0LWVuZC4gCgojIyBHZW5lcmF0aW9uIG9mIGJpZ3dpZyAKCk9uZSBvZiB0aGUgbW9zdCBjb21tb24vdXNlZnVsIGZvcm1hdHMgdG8gdmlzdWFsaXplIHRoZSBkYXRhIGlzIHRoZSBiaWd3aWcgZm9ybWF0LiBUbyBnZW5lcmF0ZSB0aGlzLCB3ZSBhcmUgZ29pbmcgdG8gdXNlIFtkZWVwdG9vbHNdKGh0dHBzOi8vZGVlcHRvb2xzLnJlYWR0aGVkb2NzLmlvL2VuL2RldmVsb3AvKS5kZWVwVG9vbHMgaXMgYSBzdWl0ZSBvZiBweXRob24gdG9vbHMgcGFydGljdWxhcmx5IGRldmVsb3BlZCBmb3IgdGhlIGVmZmljaWVudCBhbmFseXNpcyBvZiBoaWdoLXRocm91Z2hwdXQgc2VxdWVuY2luZyBkYXRhLCBzdWNoIGFzIENoSVAtc2VxLCBSTkEtc2VxIG9yIE1OYXNlLXNlcS4gSXQgb2ZmZXJzIGEgbG90IG9mIHRvb2xzIGZvciBwcm9jZXNzaW5nIEJBTSBhbmQgYmlnd2lnIGZpbGVzLgoKCldlIGFyZSBnb2luZyB0byBjb252ZXJ0IG91ciAuQkFNIGZpbGUgKCBmcm9tIG91ciBhbGlnbm1lbnQgc3RlcCkgaW50byBhIC5iaWd3aWcgZmlsZQoKYGBge2Jhc2h9CmxzIC4uL3Jlc3VsdHMvSElGMWFfUmVwMS5zb3J0ZWQucm1kdXAuYmFtCmBgYApUbyBjb252ZXJ0LCB3ZSBhcmUgZ29pbmcgdG8gdXNlIHRoZSBiYW1Db3ZlcmFnZSBjb21tYW5kCmBgYHtiYXNofQpiYW1Db3ZlcmFnZSAtLWhlbHAKYGBgCmBgYHtiYXNofQpiYW1Db3ZlcmFnZSAtLWJhbSAuLi9yZXN1bHRzL0hJRjFhX1JlcDEuc29ydGVkLnJtZHVwLmJhbSAtbyAuLi9yZXN1bHRzL0hJRjFhX1JlcDEuYmlnd2lnIC0tbm9ybWFsaXplVXNpbmcgQlBNIC0tZXh0ZW5kUmVhZHMKYGBgCkxldCdzIHNlZSB0aGUgcGFyYW1ldGVycyBvZiB0aGUgY2FsbDoKCiogLS1iYW06IHBhdGggb2YgdGhlIGlucHV0IC5iYW0gZmlsZS4KKiAtbzogcGF0aCBvZiB0aGUgb3V0cHV0IGJpZ3dpZyBmaWxlLiAKKiAtLW5vcm1hbGl6ZVVzaW5nOiBVc2Ugb25lIG9mIHRoZSBlbnRlcmVkIG1ldGhvZHMgdG8gbm9ybWFsaXplIHRoZSBudW1iZXIgb2YgcmVhZHMgcGVyIGJpbi4gQnkgZGVmYXVsdCwgbm8gbm9ybWFsaXphdGlvbiBpcyBwZXJmb3JtZWQuIEJQTSA9IEJpbnMgUGVyIE1pbGxpb24gbWFwcGVkIHJlYWRzLCBzYW1lIGFzIFRQTSBpbiBSTkEtc2VxLiAKKiAtLWV4dGVuZFJlYWRzOiBJbiB0aGUgcGFpcmVkLWVuZCBtb2RlICggYXMgb3VyIGNhc2UpLCBpdCB3aWxsIGV4dGVuZCB0aGUgcmVhZHMgdG8gdGhlIG1lYW4gb2YgdGhlIGZyYWdtZW50IHNpemUgb2YgYWxsIG1hdGUgcmVhZHMuIAoKTm93LCB3ZSBzaG91bGQgaGF2ZSBvdXIgLmJpZ3dpZyBmaWxlOgpgYGB7YmFzaH0KbHMgLi4vcmVzdWx0cy9ISUYxYV9SZXAxLmJpZ3dpZwpgYGAKVUNTQyBkb2Vzbid0IGFsbG93IHRvICd1cGxvYWQnIGRpcmVjdGx5IHlvdXIgLmJpZ3dpZyBmaWxlLiBJbnN0ZWFkLCB5b3UgaGF2ZSB0byBzdG9yZSB0aGVtIGluIGEgd2ViLWFjY2Vzc2libGUgaHR0cCwgaHR0cHMsIG9yIGZ0cCBsb2NhdGlvbi4gT25lIG9mIHRoZSBvcHRpb25zIHRvIHN0b3JlIC5iaWd3aWcgZmlsZXMgaXMgW0N5dmVyc2VdKGh0dHBzOi8vY3l2ZXJzZS5vcmcvZGlzY292ZXJ5LWVudmlyb25tZW50KS4gCgpCZWNhdXNlIGRhdGEgdXBsb2FkIHRvIHNlcnZlcnMgaXMgdGltZS1jb25zdW1pbmcsIGZvciB0aGlzIHN0ZXAsIHdlIGFyZSBnb2luZyB0byBwcmV0ZW5kIHdlJ3ZlIGFscmVhZHkgdXBsb2FkZWQgdG8gZGF0YSB0byBDeXZlcnNlLgoKTm93LCBvcGVuIHlvdXIgVUNTQyBnZW5vbWUgYnJvd3NlciBhbmQgY2xpY2sgaW4gdGhlIEN1c3RvbSBUcmFja3MgYnV0dG9uLiBUaGVuLCB5b3Ugd2lsbCBhZGQgdGhlIGZvbGxvd2luZyBjdXN0b20gdHJhY2s6CmBgYHt9CnRyYWNrIHR5cGU9YmlnV2lnIG5hbWU9IkhJRjFhX3JlcDEiIGRlc2NyaXB0aW9uPSJISUYxYSBiaWd3aWciIGJpZ0RhdGFVcmw9aHR0cHM6Ly9kYXRhLmN5dmVyc2Uub3JnL2Rhdi1hbm9uL2lwbGFudC9ob21lL2FyaWVsbWFkcjIzLzI3X0hJRl9TbXl0aGllc18yMDE5L215SHViL2hnMzgvUkNDNF9Ob3Jtb3hpYV9ISUYxYV9QTTE0X1JlcDEuYncKYGBgCllvdSBjYW4gcmVhZCBtb3JlIGFib3V0IGhvdyB0byB1cGxvYWQgZGF0YSB0byB0aGUgVUNTQyBnZW5vbWUgYnJvd3NlciBpbiBoZXJlOgoKKiBbQ3JlYXRlIGN1c3RvbSB0cmFja3NdKGh0dHBzOi8vZ2Vub21lLnVjc2MuZWR1L2dvbGRlblBhdGgvaGVscC9jdXN0b21UcmFjay5odG1sKQoqIFtDcmVhdGUgVHJhY2sgSHVic10oaHR0cHM6Ly9nZW5vbWUudWNzYy5lZHUvZ29sZGVucGF0aC9oZWxwL2hnVHJhY2tIdWJIZWxwLmh0bWwpIAoKIyAwMiAtIERlZXB0b29scwoKRGVlcHRvb2xzIGFsc28gb2ZmZXJzIGRpZmZlcmVudCB0b29scyBmb3IgdmlzdWFsaXppbmcgQ2hJUC1zZXEgZGF0YSwgaW5jbHVkaW5nIHRoZSBnZW5lcmF0aW9uIG9mIEhlYXRtYXBzIGFuZCBkaWZmZXJlbnQgc3VtbWFyeSBwbG90cy4gRm9yIG5vdywgbGV0J3MgdHJ5IHRoZSBbcGxvdEhlYXRtYXAgZnVuY3Rpb25dKGh0dHBzOi8vZGVlcHRvb2xzLnJlYWR0aGVkb2NzLmlvL2VuL2RldmVsb3AvY29udGVudC90b29scy9wbG90SGVhdG1hcC5odG1sKS4gCgpgYGB7YmFzaH0KbWtkaXIgLXAgLi4vcmVzdWx0cy9kZWVwdG9vbHMKYGBgClRoZSBmaXJzdCBzdGVwIGlzIHRvIGNvbXB1dGUgdGhlIHNpZ25hbCB1c2luZyBjb21wdXRlTWF0cml4LiBjb21wdXRlTWF0cml4IGhhcyB0d28gbW9kZXM6IAoqIHNjYWxlIHJlZ2lvbnM6IEluIHRoZSBzY2FsZS1yZWdpb25zIG1vZGUsIGFsbCByZWdpb25zIGluIHRoZSBCRUQgZmlsZSBhcmUgc3RyZXRjaGVkIG9yIHNocnVua2VuIHRvIHRoZSBsZW5ndGggKGluIGJhc2VzKSBpbmRpY2F0ZWQgYnkgdGhlIHVzZXIuIAoqIHJlZmVyZW5jZS1wb2ludDogUmVmZXJlbmNlLXBvaW50IHJlZmVycyB0byBhIHBvc2l0aW9uIHdpdGhpbiBhIEJFRCByZWdpb24gKGUuZy4sIHRoZSBzdGFydGluZyBwb2ludCkuIEluIHRoaXMgbW9kZSwgb25seSB0aG9zZSBnZW5vbWljcG9zaXRpb25zIGJlZm9yZSAodXBzdHJlYW0pIGFuZC9vciBhZnRlciAoZG93bnN0cmVhbSkgb2YgdGhlIHJlZmVyZW5jZSBwb2ludCB3aWxsIGJlIHBsb3R0ZWQuIFlvdSBjYW4gcmVhZCBtb3JlIGFib3V0IHRoZW0gW2hlcmVdKGh0dHBzOi8vZGVlcHRvb2xzLnJlYWR0aGVkb2NzLmlvL2VuL2RldmVsb3AvY29udGVudC90b29scy9jb21wdXRlTWF0cml4Lmh0bWwpLiAKCmBgYHtiYXNofQpjb21wdXRlTWF0cml4IHJlZmVyZW5jZS1wb2ludCAtUyAuLi9yZXN1bHRzL0hJRjFhX1JlcDEuYmlnd2lnIFwKCQkJICAtLXJlZmVyZW5jZVBvaW50IGNlbnRlciBcCgkJCSAgLVIgLi4vcmVzdWx0cy9tYWNzMi9ISUYxYV9yZXAxX3N1bW1pdHMuYmVkIFwKCQkJICAtLWJlZm9yZVJlZ2lvblN0YXJ0TGVuZ3RoIDMwMDAgXAoJCQkgIC0tYWZ0ZXJSZWdpb25TdGFydExlbmd0aCAzMDAwIFwKCQkJICAtLXNraXBaZXJvcyBcCgkJCSAgLW8gLi4vcmVzdWx0cy9kZWVwdG9vbHMvbWF0cml4Lm1hdC5negpgYGAKKiAtUzogYmlnV2lnIGZpbGUocykgY29udGFpbmluZyB0aGUgc2NvcmVzIHRvIGJlIHBsb3R0ZWQuIE11bHRpcGxlIGZpbGVzIHNob3VsZCBiZSBzZXBhcmF0ZWQgYnkgc3BhY2VkLgoqIC0tcmVmZXJlbmNlUG9pbnRUaGU6IHJlZmVyZW5jZSBwb2ludCBmb3IgdGhlIHBsb3R0aW5nIGNvdWxkIGJlIGVpdGhlciB0aGUgcmVnaW9uIHN0YXJ0IChUU1MpLCB0aGUgcmVnaW9uIGVuZCAoVEVTKSBvciB0aGUgY2VudGVyIG9mIHRoZSByZWdpb24uCiogLVI6IEZpbGUgbmFtZSBvciBuYW1lcywgaW4gQkVEIG9yIEdURiBmb3JtYXQsIGNvbnRhaW5pbmcgdGhlIHJlZ2lvbnMgdG8gcGxvdC4gSWYgbXVsdGlwbGUgYmVkIGZpbGVzIGFyZSBnaXZlbiwgZWFjaCBvbmUgaXMgY29uc2lkZXJlZCBhIGdyb3VwIHRoYXQgY2FuIGJlIHBsb3R0ZWQgc2VwYXJhdGVseS4KKiAtLWJlZm9yZVJlZ2lvblN0YXJ0TGVuZ3RoOiBEaXN0YW5jZSB1cHN0cmVhbSBvZiB0aGUgcmVmZXJlbmNlLXBvaW50IHNlbGVjdGVkLiAoRGVmYXVsdDogNTAwKQoqIC0tYWZ0ZXJSZWdpb25TdGFydExlbmd0aDogRGlzdGFuY2UgZG93bnN0cmVhbSBvZiB0aGUgcmVmZXJlbmNlLXBvaW50IHNlbGVjdGVkLiAoRGVmYXVsdDogMTUwMCkKKiBza2lwWmVyb3M6IFdoZXRoZXIgcmVnaW9ucyB3aXRoIG9ubHkgc2NvcmVzIG9mIHplcm8gc2hvdWxkIGJlIGluY2x1ZGVkIG9yIG5vdC4KKiAtbzogRmlsZSBuYW1lIHRvIHNhdmUuIAoKYGBge2Jhc2h9CnBsb3RIZWF0bWFwIC1tIC4uL3Jlc3VsdHMvZGVlcHRvb2xzL21hdHJpeC5tYXQuZ3ogXAoJCQktLWhlYXRtYXBIZWlnaHQgNSAgXAoJCQktLXJlZlBvaW50TGFiZWwgJ1BlYWsgc3VtbWl0JyBcCgkJCS0tcmVnaW9uc0xhYmVsICdISUYxYSBwZWFrcycgXAoJCQktLXBsb3RUaXRsZSAnUGVhayBzdW1taXRzIChISUYxYSknIFwKCQkJLW91dCAuLi9yZXN1bHRzL2RlZXB0b29scy9ISUYxYV9oZWF0bWFwLnBuZwpgYGAKCiFbXSguLi9yZXN1bHRzL2RlZXB0b29scy9ISUYxYV9oZWF0bWFwLnBuZykKCiogLW06IE1hdHJpeCBmaWxlIGZyb20gdGhlIGNvbXB1dGVNYXRyaXggdG9vbC4gCiogLW91dDogRmlsZSBuYW1lIHRvIHNhdmUgdGhlIGltYWdlIHRvLiAKKiAtLWhlYXRtYXBIZWlnaHQ6IFBsb3QgaGVpZ2h0IGluIGNtLgoqIC0tcmVmUG9pbnRMYWJlbDogTGFiZWwgc2hvd24gaW4gdGhlIHBsb3QgZm9yIHRoZSByZWZlcmVuY2UtcG9pbnQuIAoqIC0tcmVnaW9uc0xhYmVsOiBMYWJlbHMgZm9yIHRoZSByZWdpb25zIHBsb3R0ZWQgaW4gdGhlIGhlYXRtYXAuIElmIG1vcmUgdGhhbiBvbmUgcmVnaW9uIGlzIGJlaW5nIHBsb3R0ZWQsIGEgbGlzdCBvZiBsYWJlbHMgc2VwYXJhdGVkIGJ5IHNwYWNlcyBpcyByZXF1aXJlZC4KKiAtLXBsb3RUaXRsZTogVGl0bGUgb2YgdGhlIHBsb3QsIHRvIGJlIHByaW50ZWQgb24gdG9wIG9mIHRoZSBnZW5lcmF0ZWQgaW1hZ2UKCiMgMDMgLSBFeGVyY2lzZXMKCiogMS4gSW5jbHVkZSB0aGUgSElGMmEgZGF0YSBpbiB0aGUgSGVhdG1hcCBmcm9tIERlZXB0b29scy4gVGhpcyBoZWF0bWFwIHNob3VsZCBzaG93IHRoZSAgY292ZXJhZ2UgZm9yIGJvdGggSElGMWEgYW5kIEhJRjJhIHBlYWtzLiAKCgpgYGB7cn0Kc2Vzc2lvbkluZm8oKQpgYGAK

MiCM: 04 - visualization

Ariel Madrigal Aguirre

2023-10-25

01 - UCSC genome browser

Upload of peak files

Generation of bigwig

02 - Deeptools

03 - Exercises