This workshop will guide you through the basics of ChIP-seq analysis with hands-on exercises. You will learn how to process ChIP-seq data: perform read alignment, peak calling, quality control, visualization through the genome browser, motif finding and gene set enrichment analysis.
In this tutorial, we are going to focus on peak calling.
Note: All the code in this tutorial will be performed in the terminal.
Let’s move to the scripts directory (or stay here if already there):
cd chip_seq/scripts
The main command that we are going to use with macs2 is the callpeak function. The manual can be found here.
macs2 callpeak --help
usage: macs2 callpeak [-h] -t TFILE [TFILE ...] [-c [CFILE [CFILE ...]]]
[-f {AUTO,BAM,SAM,BED,ELAND,ELANDMULTI,ELANDEXPORT,BOWTIE,BAMPE,BEDPE}]
[-g GSIZE] [-s TSIZE] [--keep-dup KEEPDUPLICATES]
[--outdir OUTDIR] [-n NAME] [-B] [--verbose VERBOSE]
[--trackline] [--SPMR] [--nomodel] [--shift SHIFT]
[--extsize EXTSIZE] [--bw BW] [--d-min D_MIN]
[-m MFOLD MFOLD] [--fix-bimodal] [-q QVALUE | -p PVALUE]
[--scale-to {large,small}] [--down-sample] [--seed SEED]
[--tempdir TEMPDIR] [--nolambda] [--slocal SMALLLOCAL]
[--llocal LARGELOCAL] [--max-gap MAXGAP]
[--min-length MINLEN] [--broad]
[--broad-cutoff BROADCUTOFF] [--cutoff-analysis]
[--call-summits] [--fe-cutoff FECUTOFF]
[--buffer-size BUFFER_SIZE] [--to-large] [--ratio RATIO]
optional arguments:
-h, --help show this help message and exit
Input files arguments:
-t TFILE [TFILE ...], --treatment TFILE [TFILE ...]
ChIP-seq treatment file. If multiple files are given
as '-t A B C', then they will all be read and pooled
together. REQUIRED.
-c [CFILE [CFILE ...]], --control [CFILE [CFILE ...]]
Control file. If multiple files are given as '-c A B
C', they will be pooled to estimate ChIP-seq
background noise.
-f {AUTO,BAM,SAM,BED,ELAND,ELANDMULTI,ELANDEXPORT,BOWTIE,BAMPE,BEDPE}, --format {AUTO,BAM,SAM,BED,ELAND,ELANDMULTI,ELANDEXPORT,BOWTIE,BAMPE,BEDPE}
Format of tag file, "AUTO", "BED" or "ELAND" or
"ELANDMULTI" or "ELANDEXPORT" or "SAM" or "BAM" or
"BOWTIE" or "BAMPE" or "BEDPE". The default AUTO
option will let MACS decide which format (except for
BAMPE and BEDPE which should be implicitly set) the
file is. Please check the definition in README. Please
note that if the format is set as BAMPE or BEDPE,
MACS2 will call its special Paired-end mode to call
peaks by piling up the actual ChIPed fragments defined
by both aligned ends, instead of predicting the
fragment size first and extending reads. Also please
note that the BEDPE only contains three columns, and
is NOT the same BEDPE format used by BEDTOOLS.
DEFAULT: "AUTO"
-g GSIZE, --gsize GSIZE
Effective genome size. It can be 1.0e+9 or 1000000000,
or shortcuts:'hs' for human (2.7e9), 'mm' for mouse
(1.87e9), 'ce' for C. elegans (9e7) and 'dm' for
fruitfly (1.2e8), Default:hs
-s TSIZE, --tsize TSIZE
Tag size/read length. This will override the auto
detected tag size. DEFAULT: Not set
--keep-dup KEEPDUPLICATES
It controls the behavior towards duplicate tags at the
exact same location -- the same coordination and the
same strand. The 'auto' option makes MACS calculate
the maximum tags at the exact same location based on
binomal distribution using 1e-5 as pvalue cutoff; and
the 'all' option keeps every tags. If an integer is
given, at most this number of tags will be kept at the
same location. Note, if you've used samtools or picard
to flag reads as 'PCR/Optical duplicate' in bit 1024,
MACS2 will still read them although the reads may be
decided by MACS2 as duplicate later. If you plan to
rely on samtools/picard/any other tool to filter
duplicates, please remove those duplicate reads and
save a new alignment file then ask MACS2 to keep all
by '--keep-dup all'. The default is to keep one tag at
the same location. Default: 1
Output arguments:
--outdir OUTDIR If specified all output files will be written to that
directory. Default: the current working directory
-n NAME, --name NAME Experiment name, which will be used to generate output
file names. DEFAULT: "NA"
-B, --bdg Whether or not to save extended fragment pileup, and
local lambda tracks (two files) at every bp into a
bedGraph file. DEFAULT: False
--verbose VERBOSE Set verbose level of runtime message. 0: only show
critical message, 1: show additional warning message,
2: show process information, 3: show debug messages.
DEFAULT:2
--trackline Tells MACS to include trackline with bedGraph files.
To include this trackline while displaying bedGraph at
UCSC genome browser, can show name and description of
the file as well. However my suggestion is to convert
bedGraph to bigWig, then show the smaller and faster
binary bigWig file at UCSC genome browser, as well as
downstream analysis. Require -B to be set. Default:
Not include trackline.
--SPMR If True, MACS will SAVE signal per million reads for
fragment pileup profiles. It won't interfere with
computing pvalue/qvalue during peak calling, since
internally MACS2 keeps using the raw pileup and
scaling factors between larger and smaller dataset to
calculate statistics measurements. If you plan to use
the signal output in bedGraph to call peaks using
bdgcmp and bdgpeakcall, you shouldn't use this option
because you will end up with different results.
However, this option is recommended for displaying
normalized pileup tracks across many datasets. Require
-B to be set. Default: False
Shifting model arguments:
--nomodel Whether or not to build the shifting model. If True,
MACS will not build model. by default it means
shifting size = 100, try to set extsize to change it.
It's highly recommended that while you have many
datasets to process and you plan to compare different
conditions, aka differential calling, use both
'nomodel' and 'extsize' to make signal files from
different datasets comparable. DEFAULT: False
--shift SHIFT (NOT the legacy --shiftsize option!) The arbitrary
shift in bp. Use discretion while setting it other
than default value. When NOMODEL is set, MACS will use
this value to move cutting ends (5') towards 5'->3'
direction then apply EXTSIZE to extend them to
fragments. When this value is negative, ends will be
moved toward 3'->5' direction. Recommended to keep it
as default 0 for ChIP-Seq datasets, or -1 * half of
EXTSIZE together with EXTSIZE option for detecting
enriched cutting loci such as certain DNAseI-Seq
datasets. Note, you can't set values other than 0 if
format is BAMPE or BEDPE for paired-end data. DEFAULT:
0.
--extsize EXTSIZE The arbitrary extension size in bp. When nomodel is
true, MACS will use this value as fragment size to
extend each read towards 3' end, then pile them up.
It's exactly twice the number of obsolete SHIFTSIZE.
In previous language, each read is moved 5'->3'
direction to middle of fragment by 1/2 d, then
extended to both direction with 1/2 d. This is
equivalent to say each read is extended towards 5'->3'
into a d size fragment. DEFAULT: 200. EXTSIZE and
SHIFT can be combined when necessary. Check SHIFT
option.
--bw BW Band width for picking regions to compute fragment
size. This value is only used while building the
shifting model. Tweaking this is not recommended.
DEFAULT: 300
--d-min D_MIN Minimum fragment size in basepair. Any predicted
fragment size less than this will be excluded.
DEFAULT: 20
-m MFOLD MFOLD, --mfold MFOLD MFOLD
Select the regions within MFOLD range of high-
confidence enrichment ratio against background to
build model. Fold-enrichment in regions must be lower
than upper limit, and higher than the lower limit. Use
as "-m 10 30". This setting is only used while
building the shifting model. Tweaking it is not
recommended. DEFAULT:5 50
--fix-bimodal Whether turn on the auto pair model process. If set,
when MACS failed to build paired model, it will use
the nomodel settings, the --exsize parameter to extend
each tags towards 3' direction. Not to use this
automate fixation is a default behavior now. DEFAULT:
False
Peak calling arguments:
-q QVALUE, --qvalue QVALUE
Minimum FDR (q-value) cutoff for peak detection.
DEFAULT: 0.05. -q, and -p are mutually exclusive.
-p PVALUE, --pvalue PVALUE
Pvalue cutoff for peak detection. DEFAULT: not set.
-q, and -p are mutually exclusive. If pvalue cutoff is
set, qvalue will not be calculated and reported as -1
in the final .xls file.
--scale-to {large,small}
When set to 'small', scale the larger sample up to the
smaller sample. When set to 'larger', scale the
smaller sample up to the bigger sample. By default,
scale to 'small'. This option replaces the obsolete '
--to-large' option. The default behavior is
recommended since it will lead to less significant
p/q-values in general but more specific results. Keep
in mind that scaling down will influence control/input
sample more. DEFAULT: 'small', the choice is either
'small' or 'large'.
--down-sample When set, random sampling method will scale down the
bigger sample. By default, MACS uses linear scaling.
Warning: This option will make your result unstable
and irreproducible since each time, random reads would
be selected. Consider to use 'randsample' script
instead. <not implmented>If used together with --SPMR,
1 million unique reads will be randomly picked.</not
implemented> Caution: due to the implementation, the
final number of selected reads may not be as you
expected! DEFAULT: False
--seed SEED Set the random seed while down sampling data. Must be
a non-negative integer in order to be effective.
DEFAULT: not set
--tempdir TEMPDIR Optional directory to store temp files. DEFAULT: /tmp
--nolambda If True, MACS will use fixed background lambda as
local lambda for every peak region. Normally, MACS
calculates a dynamic local lambda to reflect the local
bias due to the potential chromatin accessibility.
--slocal SMALLLOCAL The small nearby region in basepairs to calculate
dynamic lambda. This is used to capture the bias near
the peak summit region. Invalid if there is no control
data. If you set this to 0, MACS will skip slocal
lambda calculation. *Note* that MACS will always
perform a d-size local lambda calculation while the
control data is available. The final local bias would
be the maximum of the lambda value from d, slocal, and
llocal size windows. While control is not available, d
and slocal lambda won't be considered. DEFAULT: 1000
--llocal LARGELOCAL The large nearby region in basepairs to calculate
dynamic lambda. This is used to capture the surround
bias. If you set this to 0, MACS will skip llocal
lambda calculation. *Note* that MACS will always
perform a d-size local lambda calculation while the
control data is available. The final local bias would
be the maximum of the lambda value from d, slocal, and
llocal size windows. While control is not available, d
and slocal lambda won't be considered. DEFAULT: 10000.
--max-gap MAXGAP Maximum gap between significant sites to cluster them
together. The DEFAULT value is the detected read
length/tag size.
--min-length MINLEN Minimum length of a peak. The DEFAULT value is the
predicted fragment size d. Note, if you set a value
smaller than the fragment size, it may have NO effect
on the result. For BROAD peak calling, try to set a
large value such as 500bps. You can also use '--
cutoff-analysis' option with default setting, and
check the column 'avelpeak' under different cutoff
values to decide a reasonable minlen value.
--broad If set, MACS will try to call broad peaks using the
--broad-cutoff setting. Please tweak '--broad-cutoff'
setting to control the peak calling behavior. At the
meantime, either -q or -p cutoff will be used to
define regions with 'stronger enrichment' inside of
broad peaks. The maximum gap is expanded to 4 * MAXGAP
(--max-gap parameter). As a result, MACS will output a
'gappedPeak' and a 'broadPeak' file instead of
'narrowPeak' file. Note, a broad peak will be reported
even if there is no 'stronger enrichment' inside.
DEFAULT: False
--broad-cutoff BROADCUTOFF
Cutoff for broad region. This option is not available
unless --broad is set. If -p is set, this is a pvalue
cutoff, otherwise, it's a qvalue cutoff. Please note
that in broad peakcalling mode, MACS2 uses this
setting to control the overall peak calling behavior,
then uses -q or -p setting to define regions inside
broad region as 'stronger' enrichment. DEFAULT: 0.1
--cutoff-analysis While set, MACS2 will analyze number or total length
of peaks that can be called by different p-value
cutoff then output a summary table to help user decide
a better cutoff. The table will be saved in
NAME_cutoff_analysis.txt file. Note, minlen and maxgap
may affect the results. WARNING: May take ~30 folds
longer time to finish. The result can be useful for
users to decide a reasonable cutoff value. DEFAULT:
False
Post-processing options:
--call-summits If set, MACS will use a more sophisticated signal
processing approach to find subpeak summits in each
enriched peak region. DEFAULT: False
--fe-cutoff FECUTOFF When set, the value will be used to filter out peaks
with low fold-enrichment. Note, MACS2 use 1.0 as
pseudocount while calculating fold-enrichment.
DEFAULT: 1.0
Other options:
--buffer-size BUFFER_SIZE
Buffer size for incrementally increasing internal
array size to store reads alignment information. In
most cases, you don't have to change this parameter.
However, if there are large number of
chromosomes/contigs/scaffolds in your alignment, it's
recommended to specify a smaller buffer size in order
to decrease memory usage (but it will take longer time
to read alignment files). Minimum memory requested for
reading an alignment file is about # of CHROMOSOME *
BUFFER_SIZE * 8 Bytes. DEFAULT: 100000
Obsolete options:
--to-large Obsolete option. Please use '--scale-to large'
instead.
--ratio RATIO Obsolete option. Originally designed to normalize
treatment and control with customized ratio, now it
won't have any effect.
We will use the BAM file from our previous step, and an Input control BAM that has been already pre-processed.
ls ../data/Input_Rep1/
Input_Rep1.sorted.rmdup.bam
Input_Rep1.sorted.rmdup.bam.bai
If you didn’t compute the BAM for the alignment step, you can find it in here:
ls ../data/HIF1a_Rep1/HIF1a_Rep1.sorted.rmdup.bam
../data/HIF1a_Rep1/HIF1a_Rep1.sorted.rmdup.bam
Let’s create an output directory for storing MACS results
mkdir -p ../results/macs2
Now we are ready to call MACS
macs2 callpeak -t ../results/HIF1a_Rep1.sorted.rmdup.bam -c ../data/Input_Rep1/Input_Rep1.sorted.rmdup.bam -f BAMPE -g hs -n HIF1a_rep1 -q 0.05 --outdir ../results/macs2
INFO @ Sun, 22 Oct 2023 21:42:17:
# Command line: callpeak -t ../results/HIF1a_Rep1.sorted.rmdup.bam -c ../data/Input_Rep1/Input_Rep1.sorted.rmdup.bam -f BAMPE -g hs -n HIF1a_rep1 -q 0.05 --outdir ../results/macs2
# ARGUMENTS LIST:
# name = HIF1a_rep1
# format = BAMPE
# ChIP-seq file = ['../results/HIF1a_Rep1.sorted.rmdup.bam']
# control file = ['../data/Input_Rep1/Input_Rep1.sorted.rmdup.bam']
# effective genome size = 2.70e+09
# band width = 300
# model fold = [5, 50]
# qvalue cutoff = 5.00e-02
# The maximum gap between significant sites is assigned as the read length/tag size.
# The minimum length of peaks is assigned as the predicted fragment length "d".
# Larger dataset will be scaled towards smaller dataset.
# Range for calculating regional lambda is: 1000 bps and 10000 bps
# Broad region calling is off
# Paired-End mode is on
INFO @ Sun, 22 Oct 2023 21:42:17: #1 read fragment files...
INFO @ Sun, 22 Oct 2023 21:42:17: #1 read treatment fragments...
INFO @ Sun, 22 Oct 2023 21:42:17: 95298 fragments have been read.
INFO @ Sun, 22 Oct 2023 21:42:17: #1.2 read input fragments...
INFO @ Sun, 22 Oct 2023 21:42:18: 166484 fragments have been read.
INFO @ Sun, 22 Oct 2023 21:42:18: #1 mean fragment size is determined as 238.2 bp from treatment
INFO @ Sun, 22 Oct 2023 21:42:18: #1 note: mean fragment size in control is 210.8 bp -- value ignored
INFO @ Sun, 22 Oct 2023 21:42:18: #1 fragment size = 238.2
INFO @ Sun, 22 Oct 2023 21:42:18: #1 total fragments in treatment: 95298
INFO @ Sun, 22 Oct 2023 21:42:18: #1 user defined the maximum fragments...
INFO @ Sun, 22 Oct 2023 21:42:18: #1 filter out redundant fragments by allowing at most 1 identical fragment(s)
INFO @ Sun, 22 Oct 2023 21:42:18: #1 fragments after filtering in treatment: 95227
INFO @ Sun, 22 Oct 2023 21:42:18: #1 Redundant rate of treatment: 0.00
INFO @ Sun, 22 Oct 2023 21:42:18: #1 total fragments in control: 166484
INFO @ Sun, 22 Oct 2023 21:42:18: #1 user defined the maximum fragments...
INFO @ Sun, 22 Oct 2023 21:42:18: #1 filter out redundant fragments by allowing at most 1 identical fragment(s)
INFO @ Sun, 22 Oct 2023 21:42:18: #1 fragments after filtering in control: 166150
INFO @ Sun, 22 Oct 2023 21:42:18: #1 Redundant rate of control: 0.00
INFO @ Sun, 22 Oct 2023 21:42:18: #1 finished!
INFO @ Sun, 22 Oct 2023 21:42:18: #2 Build Peak Model...
INFO @ Sun, 22 Oct 2023 21:42:18: #2 Skipped...
INFO @ Sun, 22 Oct 2023 21:42:18: #3 Call peaks...
INFO @ Sun, 22 Oct 2023 21:42:18: #3 Pre-compute pvalue-qvalue table...
INFO @ Sun, 22 Oct 2023 21:42:19: #3 Call peaks for each chromosome...
INFO @ Sun, 22 Oct 2023 21:42:19: #4 Write output xls file... ../results/macs2/HIF1a_rep1_peaks.xls
INFO @ Sun, 22 Oct 2023 21:42:19: #4 Write peak in narrowPeak format file... ../results/macs2/HIF1a_rep1_peaks.narrowPeak
INFO @ Sun, 22 Oct 2023 21:42:19: #4 Write summits bed file... ../results/macs2/HIF1a_rep1_summits.bed
INFO @ Sun, 22 Oct 2023 21:42:19: Done!
Let’s take a look at the MACS output:
ls ../results/macs2
HIF1a_rep1_peaks.narrowPeak
HIF1a_rep1_peaks.xls
HIF1a_rep1_summits.bed
The main result of macs will be a list of peaks called as statistically significant. The .xls file can be opened and explored with excel. The .narrowPeak is a BED6+4 format file that contains the peak locations together with peak summit, p-value, and q-value. The _summits.bed file is in BED format, which contains the peak summits locations for every peak.
head ../results/macs2/HIF1a_rep1_peaks.narrowPeak
chr6 41734187 41734560 HIF1a_rep1_peak_1 181 . 5.15392 22.2925 18.1868 181
chr6 41787043 41787581 HIF1a_rep1_peak_2 160 . 5.07169 20.1635 16.0937 158
chr6 42452936 42453274 HIF1a_rep1_peak_3 194 . 4.60577 23.54 19.4193 214
chr6 42929347 42929669 HIF1a_rep1_peak_4 135 . 4.93615 17.5521 13.5385 155
chr6 43013743 43014397 HIF1a_rep1_peak_5 100 . 4.03138 13.9432 10.0151 477
chr6 43181975 43182279 HIF1a_rep1_peak_6 109 . 4.46358 14.922 10.9685 106
chr6 43667941 43668609 HIF1a_rep1_peak_7 4365 . 34.633 443.987 436.589 342
chr6 43725079 43725412 HIF1a_rep1_peak_8 102 . 4.01653 14.1664 10.2305 132
chr6 43768993 43770080 HIF1a_rep1_peak_9 968 . 13.7479 101.454 96.8855 238
chr6 43798896 43799239 HIF1a_rep1_peak_10 322 . 7.87035 36.5134 32.2785 185
We can create a BED file by ommiting some columns(this will be useful for the next step).
cut -f 1-6 ../results/macs2/HIF1a_rep1_peaks.narrowPeak > ../results/macs2/HIF1a_rep1_peaks.bed
Column 9 represents the -log10qvalue at peak summit. We can also take a look at the coordinates of the summits of the peaks ( regions of highest signal)
head ../results/macs2/HIF1a_rep1_summits.bed
chr6 41734368 41734369 HIF1a_rep1_peak_1 18.1868
chr6 41787201 41787202 HIF1a_rep1_peak_2 16.0937
chr6 42453150 42453151 HIF1a_rep1_peak_3 19.4193
chr6 42929502 42929503 HIF1a_rep1_peak_4 13.5385
chr6 43014220 43014221 HIF1a_rep1_peak_5 10.0151
chr6 43182081 43182082 HIF1a_rep1_peak_6 10.9685
chr6 43668283 43668284 HIF1a_rep1_peak_7 436.589
chr6 43725211 43725212 HIF1a_rep1_peak_8 10.2305
chr6 43769231 43769232 HIF1a_rep1_peak_9 96.8855
chr6 43799081 43799082 HIF1a_rep1_peak_10 32.2785
We can also check the number of significant peaks called
wc -l ../results/macs2/HIF1a_rep1_peaks.narrowPeak
22 ../results/macs2/HIF1a_rep1_peaks.narrowPeak
One of the most common problems in dealing with peaks in ChIP-seq analysis is to check if two sets of peaks overlap. To work through that, we are going to use bedtools intersect. You can find information about the tool in here.
For this exercise, we are going the overlap between peaks called by MACS2 for two replicates, one of them is the one we have processed so far, and the other one can be found in the data folder
ls ../data/HIF1a_Rep2/macs2/
HIF1a_rep2_peaks.bed
HIF1a_rep2_peaks.narrowPeak
HIF1a_rep2_peaks.xls
HIF1a_rep2_summits.bed
head ../data/HIF1a_Rep2/macs2/HIF1a_rep2_peaks.narrowPeak
chr6 40587363 40588002 HIF1a_rep2_peak_1 101 . 5.61209 14.3279 10.1769 102
chr6 41734211 41734546 HIF1a_rep2_peak_2 235 . 7.59926 28.0224 23.5614 167
chr6 41787067 41787523 HIF1a_rep2_peak_3 164 . 6.21755 20.807 16.4506 181
chr6 42453021 42453358 HIF1a_rep2_peak_4 155 . 6.51948 19.8864 15.5494 157
chr6 42890670 42891118 HIF1a_rep2_peak_5 52 . 4.54819 9.16283 5.21816 344
chr6 42929320 42929657 HIF1a_rep2_peak_6 89 . 5.23768 13.0709 8.96026 149
chr6 43013856 43014357 HIF1a_rep2_peak_7 102 . 5.19112 14.4096 10.257 127
chr6 43181955 43182229 HIF1a_rep2_peak_8 54 . 4.37834 9.4117 5.4488 123
chr6 43575833 43576214 HIF1a_rep2_peak_9 176 . 8.06929 22.0725 17.6879 211
chr6 43668045 43668540 HIF1a_rep2_peak_10 2533 . 21.5658 261.055 253.316 242
Let’s check the intersect between replicate 1 and replicate 2 with the default behaviour
bedtools intersect -a ../results/macs2/HIF1a_rep1_peaks.narrowPeak -b ../data/HIF1a_Rep2/macs2/HIF1a_rep2_peaks.narrowPeak
chr6 41734211 41734546 HIF1a_rep1_peak_1 181 . 5.15392 22.2925 18.1868 181
chr6 41787067 41787523 HIF1a_rep1_peak_2 160 . 5.07169 20.1635 16.0937 158
chr6 42453021 42453274 HIF1a_rep1_peak_3 194 . 4.60577 23.54 19.4193 214
chr6 42929347 42929657 HIF1a_rep1_peak_4 135 . 4.93615 17.5521 13.5385 155
chr6 43013856 43014357 HIF1a_rep1_peak_5 100 . 4.03138 13.9432 10.0151 477
chr6 43181975 43182229 HIF1a_rep1_peak_6 109 . 4.46358 14.922 10.9685 106
chr6 43668045 43668540 HIF1a_rep1_peak_7 4365 . 34.633 443.987 436.589 342
chr6 43725079 43725385 HIF1a_rep1_peak_8 102 . 4.01653 14.1664 10.2305 132
chr6 43769014 43770018 HIF1a_rep1_peak_9 968 . 13.7479 101.454 96.8855 238
chr6 43798896 43799222 HIF1a_rep1_peak_10 322 . 7.87035 36.5134 32.2785 185
chr6 43923520 43923979 HIF1a_rep1_peak_11 1104 . 15.5653 115.05 110.41 251
chr6 43990948 43991196 HIF1a_rep1_peak_12 139 . 5.26933 17.9401 13.9195 131
chr6 44010071 44010351 HIF1a_rep1_peak_14 1303 . 16.3591 135.086 130.349 1005
chr6 44010837 44011239 HIF1a_rep1_peak_14 1303 . 16.3591 135.086 130.349 1005
chr6 44058536 44058841 HIF1a_rep1_peak_15 180 . 5.83929 22.1368 18.0332 225
chr6 44073393 44073738 HIF1a_rep1_peak_16 2501 . 36.4275 255.43 250.181 264
chr6 44075173 44075528 HIF1a_rep1_peak_17 1175 . 16.5665 122.233 117.549 220
chr6 44076549 44077451 HIF1a_rep1_peak_18 2887 . 31.5892 294.173 288.734 590
chr6 44237388 44237938 HIF1a_rep1_peak_20 222 . 6.61987 26.4474 22.2977 180
chr6 46653087 46653380 HIF1a_rep1_peak_22 40 . 3.56876 7.73921 4.02043 110
By default, if an overlap is found, bedtools intersect reports the shared interval between the two overlapping features. We can also return not only the shared interval, but the original entries of A for the overlap:
bedtools intersect -a ../results/macs2/HIF1a_rep1_peaks.narrowPeak -b ../data/HIF1a_Rep2/macs2/HIF1a_rep2_peaks.narrowPeak -wa
chr6 41734187 41734560 HIF1a_rep1_peak_1 181 . 5.15392 22.2925 18.1868 181
chr6 41787043 41787581 HIF1a_rep1_peak_2 160 . 5.07169 20.1635 16.0937 158
chr6 42452936 42453274 HIF1a_rep1_peak_3 194 . 4.60577 23.54 19.4193 214
chr6 42929347 42929669 HIF1a_rep1_peak_4 135 . 4.93615 17.5521 13.5385 155
chr6 43013743 43014397 HIF1a_rep1_peak_5 100 . 4.03138 13.9432 10.0151 477
chr6 43181975 43182279 HIF1a_rep1_peak_6 109 . 4.46358 14.922 10.9685 106
chr6 43667941 43668609 HIF1a_rep1_peak_7 4365 . 34.633 443.987 436.589 342
chr6 43725079 43725412 HIF1a_rep1_peak_8 102 . 4.01653 14.1664 10.2305 132
chr6 43768993 43770080 HIF1a_rep1_peak_9 968 . 13.7479 101.454 96.8855 238
chr6 43798896 43799239 HIF1a_rep1_peak_10 322 . 7.87035 36.5134 32.2785 185
chr6 43923513 43923979 HIF1a_rep1_peak_11 1104 . 15.5653 115.05 110.41 251
chr6 43990916 43991196 HIF1a_rep1_peak_12 139 . 5.26933 17.9401 13.9195 131
chr6 44010031 44011443 HIF1a_rep1_peak_14 1303 . 16.3591 135.086 130.349 1005
chr6 44010031 44011443 HIF1a_rep1_peak_14 1303 . 16.3591 135.086 130.349 1005
chr6 44058536 44058996 HIF1a_rep1_peak_15 180 . 5.83929 22.1368 18.0332 225
chr6 44073262 44073853 HIF1a_rep1_peak_16 2501 . 36.4275 255.43 250.181 264
chr6 44075148 44075610 HIF1a_rep1_peak_17 1175 . 16.5665 122.233 117.549 220
chr6 44076130 44077532 HIF1a_rep1_peak_18 2887 . 31.5892 294.173 288.734 590
chr6 44237299 44237938 HIF1a_rep1_peak_20 222 . 6.61987 26.4474 22.2977 180
chr6 46653087 46653436 HIF1a_rep1_peak_22 40 . 3.56876 7.73921 4.02043 110
ls ../data/HIF2a_Rep1/HIF2a_Rep1.sorted.rmdup.bam
../data/HIF2a_Rep1/HIF2a_Rep1.sorted.rmdup.bam
You can use the same input control as for HIF1a.
sessionInfo()
R version 4.3.1 (2023-06-16)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.6 LTS
Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0
locale:
[1] LC_CTYPE=en_CA.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_CA.UTF-8 LC_COLLATE=en_CA.UTF-8
[5] LC_MONETARY=en_CA.UTF-8 LC_MESSAGES=en_CA.UTF-8
[7] LC_PAPER=en_CA.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_CA.UTF-8 LC_IDENTIFICATION=C
time zone: America/New_York
tzcode source: system (glibc)
attached base packages:
[1] stats graphics grDevices utils datasets methods base
loaded via a namespace (and not attached):
[1] digest_0.6.33 R6_2.5.1 fastmap_1.1.1 xfun_0.40
[5] cachem_1.0.8 knitr_1.43 htmltools_0.5.6 rmarkdown_2.24
[9] cli_3.6.1 sass_0.4.7 jquerylib_0.1.4 compiler_4.3.1
[13] tools_4.3.1 mime_0.12 evaluate_0.21 bslib_0.5.1
[17] yaml_2.3.7 rlang_1.1.1 jsonlite_1.8.7