Title: | Capture Hi-C Analysis Engine |
---|---|
Description: | Toolkit for processing and calling interactions in capture Hi-C data. Converts BAM files into counts of reads linking restriction fragments, and identifies pairs of fragments that interact more than expected by chance. Significant interactions are identified by comparing the observed read count to the expected background rate from a count regression model. |
Authors: | Erle Holgersen [aut], Olivia Leavy [aut], Olivia Fletcher [aut], Frank Dudbridge [aut], Syed Haider [aut, cre] |
Maintainer: | Syed Haider <[email protected]> |
License: | GPL-2 |
Version: | 0.1.8 |
Built: | 2024-11-21 04:57:16 UTC |
Source: | https://github.com/cran/chicane |
Add model covariates (trans counts and distance) to an interactions data table.
add.covariates(interaction.data)
add.covariates(interaction.data)
interaction.data |
data.table with interaction data. Must contain columns
|
Updated data table with new columns
bait.trans.count |
number of trans interactions of bait fragment |
target.trans.count |
number of trans interactions of target fragment |
distance |
distance between bait and target fragment, or NA if trans |
Erle Holgersen <[email protected]>
data(bre80); input.cols <- c('bait.id', 'target.id', 'bait.chr', 'bait.start', 'bait.end', 'target.chr', 'target.start', 'target.end', 'count'); output <- add.covariates(bre80[, input.cols, with = FALSE]);
data(bre80); input.cols <- c('bait.id', 'target.id', 'bait.chr', 'bait.start', 'bait.end', 'target.chr', 'target.start', 'target.end', 'count'); output <- add.covariates(bre80[, input.cols, with = FALSE]);
Expand target and bait IDs of the form chrN:start-end to separate coordinate columns in the data table
add.fragment.coordinates(id.data)
add.fragment.coordinates(id.data)
id.data |
data table containing columns |
Data table with added coordinate columns for target and bait (as applicable).
Erle Holgersen <[email protected]>
data(bre80); add.fragment.coordinates(bre80[, .(bait.id, target.id)]);
data(bre80); add.fragment.coordinates(bre80[, .(bait.id, target.id)]);
Check if bedtools exists in PATH
bedtools.installed()
bedtools.installed()
Logical indicating if bedtools was found in PATH
Erle Holgersen <[email protected]>
bedtools.installed();
bedtools.installed();
A dataset containing processed data from a capture Hi-C experiment in the Bre80 normal epithelial breast tissue cell line. The experiment targeted several breast cancer risk loci, and reads that mapped to the 2q35 SNPs rs13387042 and rs16857609 are included in the dataset.
Data was prepared using the prepare.data
function. Coordinates are GRCh38.
data(bre80)
data(bre80)
A data table object with 47,766 rows and 13 columns.
The variables are as follows:
target.id String in chrN:start-end format identifying target fragment
bait.id String in chrN:start-end format identifying bait fragment
target.chr Chromosome of target fragment
target.start Start coordinate of target fragment (zero-based)
target.end End coordinate of target fragment
bait.chr Chromosome of bait fragment
bait.start Start coordinate of bait fragment (zero-based)
bait.end End coordinate of bait fragment
bait.to.bait Boolean indicating if the interaction is bait-to-bait (i.e. the fragment listed as target is also a bait)
bait.trans.count The number of reads linking the bait to fragments in trans (a measure of "interactibility")
target.trans.count The number of reads linking the target to fragments in trans (a measure of "interactibility")
distance Distance between the midpoints of the bait and target fragments (basepairs). NA for trans interactions
count The number of reads linking the two fragments
Baxter, Joseph S., et al. "Capture Hi-C identifies putative target genes at 33 breast cancer risk loci." Nature Communications 9.1 (2018): 1028.
Check if chicane model can be fit on a given dataset.
glm.nb
does not work when all responses are constant, or there are only two unique values and a covariate is a perfect predictor.
check.model.numerical.fit(interaction.data)
check.model.numerical.fit(interaction.data)
interaction.data |
Data table of interaction data on which model is to be fit |
boolean indicating if model can be fit
Helper function to check if the chicane model can be fit on each element of a split data list.
check.split.data.numerical.fit(split.data)
check.split.data.numerical.fit(split.data)
split.data |
List of data.table objects with fragment interaction data |
Logical indicating if the model can be fit
Run full method for detecting significant interactions in capture Hi-C experiments, starting
either from a BAM file or preprocessed data from prepare.data
chicane( bam = NULL, baits = NULL, fragments = NULL, interactions = NULL, replicate.merging.method = "sum", distribution = "negative-binomial", include.zeros = "none", bait.filters = c(0, 1), target.filters = c(0, 1), distance.bins = NULL, multiple.testing.correction = c("bait-level", "global"), adjustment.terms = NULL, remove.adjacent = FALSE, temp.directory = NULL, keep.files = FALSE, maxit = 100, epsilon = 1e-08, cores = 1, trace = FALSE, verbose = FALSE, interim.data.dir = NULL )
chicane( bam = NULL, baits = NULL, fragments = NULL, interactions = NULL, replicate.merging.method = "sum", distribution = "negative-binomial", include.zeros = "none", bait.filters = c(0, 1), target.filters = c(0, 1), distance.bins = NULL, multiple.testing.correction = c("bait-level", "global"), adjustment.terms = NULL, remove.adjacent = FALSE, temp.directory = NULL, keep.files = FALSE, maxit = 100, epsilon = 1e-08, cores = 1, trace = FALSE, verbose = FALSE, interim.data.dir = NULL )
bam |
Path to a BAM file |
baits |
Path to a BED file containing the baits |
fragments |
Path to a BED file containing all restriction fragments in the genome |
interactions |
Data table or path to a text file detailing fragment interactions, typically from |
replicate.merging.method |
Method that should be used for merging replicates, if applicable |
distribution |
Name of distribution of the counts. Options are 'negative-binomial', 'poisson', 'truncated-poisson', and 'truncated-negative-binomial' |
include.zeros |
String specifying what zero counts to include. Options are none (default), cis, and all. |
bait.filters |
Vector of length two, where the first element corresponds to the lower-end filter and the second to the upper-end filter. When global multiple testing correction is performed, altering the bait filtering settings may affect the number of significant results. |
target.filters |
Vector of length two, giving lower and higher filter, respectively. Changing this filtering setting may affect multiple testing correction by altering the number of tests performed. |
distance.bins |
Number of bins to split distance into. Models are fit separately in each bin. |
multiple.testing.correction |
String specifying how multiple testing correction should be performed, by bait or globally. |
adjustment.terms |
Character vector of extra terms to adjust for in the model fit. |
remove.adjacent |
Logical indicating whether to remove all reads mapping to adjacent restriction fragments. |
temp.directory |
Directory where temporary files should be stored. Defaults to current directory. |
keep.files |
Logical indicating whether to keep temporary files |
maxit |
Maximum number of IWLS iterations for fitting the model (passed to |
epsilon |
Positive convergence tolerance for Poisson and negative binomial models. Passed to |
cores |
Integer value specifying how many cores to use to fit model for cis-interactions. |
trace |
Logical indicating if output should be produced for each of model fitting procedure. Passed to |
verbose |
Logical indicating whether to print progress reports. |
interim.data.dir |
Path to directory to store intermediate QC data and plots. NULL indicate skip intermediate results. |
Data table with columns
target.id |
String in chrN:start-end format identifying target fragment |
bait.id |
String in chrN:start-end format identifying bait fragment |
target.chr |
Chromosome of target fragment |
target.start |
Start coordinate of target fragment (zero-based) |
target.end |
End coordinate of target fragment |
bait.chr |
Chromosome of bait fragment |
bait.start |
Start coordinate of bait fragment (zero-based) |
bait.end |
End coordinate of bait fragment |
bait.to.bait |
Boolean indicating if the interaction is bait-to-bait (i.e. the fragment listed as target is also a bait) |
bait.trans.count |
The number of reads linking the bait to fragments in trans (a measure of "interactibility") |
target.trans.count |
The number of reads linking the target to fragments in trans (a measure of "interactibility") |
distance |
Distance between the midpoints of the bait and target fragments (basepairs). NA for trans interactions |
count |
The number of reads linking the two fragments |
expected |
The expected number of reads linking the two fragments under the fitted model |
p.value |
P-value for test of the observed number of reads significantly exceeding the expected count |
q.value |
FDR-corrected p-value |
Erle Holgersen <[email protected]>
if( bedtools.installed() ) { # start from BAM file bam <- system.file('extdata', 'Bre80_2q35.bam', package = 'chicane'); baits <- system.file('extdata', '2q35.bed', package = 'chicane'); fragments <- system.file('extdata', 'GRCh38_HindIII_chr2.bed.gz', package = 'chicane'); results <- chicane( bam = bam, baits = baits, fragments = fragments ); } # start from pre-processed data data(bre80); results <- chicane(interactions = bre80);
if( bedtools.installed() ) { # start from BAM file bam <- system.file('extdata', 'Bre80_2q35.bam', package = 'chicane'); baits <- system.file('extdata', '2q35.bed', package = 'chicane'); fragments <- system.file('extdata', 'GRCh38_HindIII_chr2.bed.gz', package = 'chicane'); results <- chicane( bam = bam, baits = baits, fragments = fragments ); } # start from pre-processed data data(bre80); results <- chicane(interactions = bre80);
Merge biological replicates.
combine.replicates(replicates, method = c("sum", "weighted-sum"))
combine.replicates(replicates, method = c("sum", "weighted-sum"))
replicates |
list of data table objects from |
method |
string specifying the method for merging replicates. Options are 'sum' and 'weighted-sum'. |
The parameter method
determines which method is used for merging
replicates. Available options are weighted-sum and sum.
'weighted-sum' implements the size factor scaling approach used in DEseq, rounded to the closest integer. See Anders and Huber 2010 for details.
'sum' is the naive sum of counts across biological replicates.
Data table object containing merged data, where counts are stored in colums
count.i |
count of interaction in ith replicate |
count |
count after merging replicates |
Anders, Simon, and Wolfgang Huber. "Differential expression analysis for sequence count data." Genome biology 11.10 (2010): R106.
if( bedtools.installed() ) { # preprocess data bam <- system.file('extdata', 'Bre80_2q35.bam', package = 'chicane'); baits <- system.file('extdata', '2q35.bed', package = 'chicane'); fragments <- system.file('extdata', 'GRCh38_HindIII_chr2.bed.gz', package = 'chicane'); input.data <- prepare.data( bam = bam, baits = baits, fragments = fragments ); # combined two datasets into one merged <- combine.replicates(list(input.data, input.data)); }
if( bedtools.installed() ) { # preprocess data bam <- system.file('extdata', 'Bre80_2q35.bam', package = 'chicane'); baits <- system.file('extdata', '2q35.bed', package = 'chicane'); fragments <- system.file('extdata', 'GRCh38_HindIII_chr2.bed.gz', package = 'chicane'); input.data <- prepare.data( bam = bam, baits = baits, fragments = fragments ); # combined two datasets into one merged <- combine.replicates(list(input.data, input.data)); }
Compare replicates in a pairwise manner and further stratified by distance
compare.replicates(interaction.data = NULL, output.directory = "./")
compare.replicates(interaction.data = NULL, output.directory = "./")
interaction.data |
A named vector specifying paths to files created using {prepare.data()} |
output.directory |
Path to the output directory where pairwise plots are generated |
TRUE if pairwise plots were successfully created
Syed Haider
# TODO
# TODO
Convert a BAM file to a format that can be used for replicate merging.
Note: This function does not process data enough to be used for interaction calling. Use prepare.data
for full preprocessing.
convert.bam(bam, baits, fragments, temp.directory = NULL, keep.files = FALSE)
convert.bam(bam, baits, fragments, temp.directory = NULL, keep.files = FALSE)
bam |
Path to a BAM file |
baits |
Path to a BED file containing the baits |
fragments |
Path to a BED file containing all restriction fragments in the genome |
temp.directory |
Directory where temporary files should be stored. Defaults to current directory. |
keep.files |
Logical indicating whether to keep temporary files |
Erle Holgersen <[email protected]>
Convert a HiCUP digest file to BED format.
convert.hicup.digest.bed(hicup.digest, file.name = "")
convert.hicup.digest.bed(hicup.digest, file.name = "")
hicup.digest |
Path to HiCUP digest |
file.name |
Path to output file. A blank string indicates output to the console. |
hicup.digest <- system.file('extdata', 'HiCUP_digest_example.txt', package = 'chicane'); convert.hicup.digest.bed(hicup.digest);
hicup.digest <- system.file('extdata', 'HiCUP_digest_example.txt', package = 'chicane'); convert.hicup.digest.bed(hicup.digest);
Create a file in standard format for cross compatability including with WashU Epigenome Browser.
convert.standard.format(chicane.results, file.name = "")
convert.standard.format(chicane.results, file.name = "")
chicane.results |
Path to CHiCANE interaction calls file |
file.name |
Path to output file |
TRUE if output files are created successfully
Andrea Gillespie, Syed Haider
chicane.results <- system.file( 'extdata', 'T47D_2q35_filtered_chicane_calls.txt', package = 'chicane' ); output.file = file.path(tempdir(), 'temp_standard_format.txt'); convert.standard.format(chicane.results, file.name = output.file);
chicane.results <- system.file( 'extdata', 'T47D_2q35_filtered_chicane_calls.txt', package = 'chicane' ); output.file = file.path(tempdir(), 'temp_standard_format.txt'); convert.standard.format(chicane.results, file.name = output.file);
Convert zero-based region in format chr:start-end to 1-based
convert.to.one.based(id)
convert.to.one.based(id)
id |
string in format chr:start-end |
one-converted ID
Create a file compatible with WashU Epigenome Browser from CHiCANE interaction calls.
create.locus.plot( genome = "hg38", chr = NULL, start = NULL, end = NULL, gene.data = NULL, genomic.features = NULL, feature.name = NULL, fdr.filter = 0.05, interaction.data = NULL, file.name = NULL, height = 5.5, width = 8.5, track.heights = c(0.2, 0.5, 0.8, 0.5, 1.5, 2), ... )
create.locus.plot( genome = "hg38", chr = NULL, start = NULL, end = NULL, gene.data = NULL, genomic.features = NULL, feature.name = NULL, fdr.filter = 0.05, interaction.data = NULL, file.name = NULL, height = 5.5, width = 8.5, track.heights = c(0.2, 0.5, 0.8, 0.5, 1.5, 2), ... )
genome |
Name of genome build (e.g. 'hg38' or 'hg37') |
chr |
Chromosome number for desired locus including 'chr' (e.g. 'chr1') |
start |
Start coordinate of desired locus |
end |
End coordinate of desired locus |
gene.data |
Path to chosen genome annotation file in .gtf format |
genomic.features |
Path to BED file with coordinates of desired feature track |
feature.name |
Title to appear above genomic features |
fdr.filter |
Q-value filter threshold for interaction calls to be included |
interaction.data |
Path to unfiltered CHiCANE calls output |
file.name |
Path to output file |
height |
Height in inches for desired plot |
width |
Width in inches of desired plot |
track.heights |
Vector of length 6 indicating desired height of individual tracks |
... |
Any additional parameters to |
TRUE if plot was successfully created
Andrea Gillespie, Syed Haider
# In order to conserve memory only significant interactions are included in example # interaction.data file. However, in order to show raw counts, unfiltered calls should be # included and only significant interactions (as set by fdr.filter) wil be displayed gene.data <- system.file('extdata', 'gencode_2q35.gtf', package = 'chicane'); genomic.features <- system.file('extdata', '2q35.bed', package = 'chicane'); interaction.data <- system.file( 'extdata', 'T47D_2q35_filtered_chicane_calls.txt', package = 'chicane' ); file.name <- file.path(tempdir(), "chr2_interactions.pdf"); create.locus.plot( genome = 'hg38', chr = 'chr2', start = 216600000, end = 217200000, gene.data = gene.data, genomic.features = genomic.features, feature.name = 'baits', interaction.data = interaction.data, file.name = file.name, collapseTranscripts = TRUE, shape = "arrow" );
# In order to conserve memory only significant interactions are included in example # interaction.data file. However, in order to show raw counts, unfiltered calls should be # included and only significant interactions (as set by fdr.filter) wil be displayed gene.data <- system.file('extdata', 'gencode_2q35.gtf', package = 'chicane'); genomic.features <- system.file('extdata', '2q35.bed', package = 'chicane'); interaction.data <- system.file( 'extdata', 'T47D_2q35_filtered_chicane_calls.txt', package = 'chicane' ); file.name <- file.path(tempdir(), "chr2_interactions.pdf"); create.locus.plot( genome = 'hg38', chr = 'chr2', start = 216600000, end = 217200000, gene.data = gene.data, genomic.features = genomic.features, feature.name = 'baits', interaction.data = interaction.data, file.name = file.name, collapseTranscripts = TRUE, shape = "arrow" );
create a plot representing model's fit
create.modelfit.plot(model, file.name = NULL, resolution = 300)
create.modelfit.plot(model, file.name = NULL, resolution = 300)
model |
An object of fitted model |
file.name |
A string specifying plotting file name |
resolution |
A numeric specifying plot's resolution |
TRUE if plot was successfully created
Syed Haider
Assign distances to a meaningful category
distance.bin(distance)
distance.bin(distance)
distance |
Vector of distances that should be mapped to a distance bin |
vector of same length as distance
containing assigned distance bins
Split interaction data into subsets that are large enough for the chicane model to be fit (see Details), based on distance. This step allows the distance term in the model to be fit in a piecewise linear fashion.
distance.split( interaction.data, distance.bins = NULL, min.rows.bin = 50, verbose = FALSE )
distance.split( interaction.data, distance.bins = NULL, min.rows.bin = 50, verbose = FALSE )
interaction.data |
Data table of interaction data, typically from |
distance.bins |
Number of distance bins desired. If NULL, a number is chosen to ensure that the negative binomial can be fit in all bins. |
min.rows.bin |
The minimum number of expected rows in a distance bin. Ignored if distance.bins is set |
verbose |
Logical indicating whether to print progress reports |
Fitting glm.nb
fails when there is a lack of overdispersion in the data. The chicane method
contains logic to catch these errors and instead fit a Poisson model. However, to avoid this happening
more than necessary, an attempt is made to avoid distance splits that will clearly result in numerical errors.
This includes bins of data where the count is the same for all rows,
or a covariate is a perfect predictor of count.
List where each element corresponds to a specified distance bin, and the final one corresponding to trans-interactions (if present)
data(bre80); distance.split(bre80);
data(bre80); distance.split(bre80);
Add zero counts to interaction data
fill.in.zeros(interaction.data, baits, fragments) fill.in.zeroes(interaction.data, baits, fragments)
fill.in.zeros(interaction.data, baits, fragments) fill.in.zeroes(interaction.data, baits, fragments)
interaction.data |
Data table containing interaction data |
baits |
Vector of bait IDs used in the experiment, in format chrN:start-end |
fragments |
Vector of potential fragments the baits can link up to, in format chrN:start-end |
Data table containing origiina
data(bre80); bait.file <- system.file('extdata', '2q35.bed', package = 'chicane'); fragment.file <- system.file('extdata', 'GRCh38_HindIII_chr2.bed.gz', package = 'chicane'); results <- fill.in.zeros( bre80, baits = read.bed(bait.file), fragments = read.bed(fragment.file) );
data(bre80); bait.file <- system.file('extdata', '2q35.bed', package = 'chicane'); fragment.file <- system.file('extdata', 'GRCh38_HindIII_chr2.bed.gz', package = 'chicane'); results <- fill.in.zeros( bre80, baits = read.bed(bait.file), fragments = read.bed(fragment.file) );
Filter low and high-interacting restriction fragments based on the total number of trans counts
filter.fragments( interaction.data, bait.filters = c(0, 1), target.filters = c(0, 1), verbose = FALSE )
filter.fragments( interaction.data, bait.filters = c(0, 1), target.filters = c(0, 1), verbose = FALSE )
interaction.data |
Data table containing interactions |
bait.filters |
Vector of length two, where the first element corresponds to the lower-end filter and the second to the upper-end filter. When global multiple testing correction is performed, altering the bait filtering settings may affect the number of significant results. |
target.filters |
Vector of length two, giving lower and higher filter, respectively. Changing this filtering setting may affect multiple testing correction by altering the number of tests performed. |
verbose |
Logical indicating whether to print progress reports. |
Data table containing fragments that passed all filters
Erle Holgersen <[email protected]>
# filter out lowest 10% of baits filter.fragments(bre80, bait.filters = c(0.1, 1))
# filter out lowest 10% of baits filter.fragments(bre80, bait.filters = c(0.1, 1))
Fit GLM according to a specified distribution. This needs to be done separately from glm
in order to include negative binomial and truncated distributions as options.
fit.glm( formula, data, distribution = c("negative-binomial", "poisson", "truncated-poisson", "truncated-negative-binomial"), start = NULL, init.theta = NULL, maxit = 100, epsilon = 1e-08, trace = FALSE )
fit.glm( formula, data, distribution = c("negative-binomial", "poisson", "truncated-poisson", "truncated-negative-binomial"), start = NULL, init.theta = NULL, maxit = 100, epsilon = 1e-08, trace = FALSE )
formula |
Formula specifying model of interest |
data |
Data frame containing variables specified in formula |
distribution |
Name of distribution of the counts. Options are 'negative-binomial', 'poisson', 'truncated-poisson', and 'truncated-negative-binomial' |
start |
Starting values for model coefficients |
init.theta |
Initial value of theta if fitting the negative binomial distribution |
maxit |
Maximum number of IWLS iterations for fitting the model (passed to |
epsilon |
Positive convergence tolerance for Poisson and negative binomial models. Passed to |
trace |
Logical indicating if output should be produced for each of model fitting procedure. Passed to |
List with elements
model |
model object |
expected.values |
vector of expected values for each element in original data |
p.values |
vector of p-values for test of significantly higher response than expected |
Fit negative binomial model to obtain p-values for interactions.
fit.model( interaction.data, distance.bins = NULL, distribution = "negative-binomial", bait.filters = c(0, 1), target.filters = c(0, 1), adjustment.terms = NULL, maxit = 100, epsilon = 1e-08, cores = 1, trace = FALSE, verbose = FALSE, interim.data.dir = NULL )
fit.model( interaction.data, distance.bins = NULL, distribution = "negative-binomial", bait.filters = c(0, 1), target.filters = c(0, 1), adjustment.terms = NULL, maxit = 100, epsilon = 1e-08, cores = 1, trace = FALSE, verbose = FALSE, interim.data.dir = NULL )
interaction.data |
data.table object containing interaction counts. Must contain columns distance, count, and bait_trans_count. |
distance.bins |
Number of bins to split distance into. Models are fit separately in each bin. |
distribution |
Name of distribution of the counts. Options are 'negative-binomial', 'poisson', 'truncated-poisson', and 'truncated-negative-binomial' |
bait.filters |
Vector of length two, where the first element corresponds to the lower-end filter and the second to the upper-end filter. When global multiple testing correction is performed, altering the bait filtering settings may affect the number of significant results. |
target.filters |
Vector of length two, giving lower and higher filter, respectively. Changing this filtering setting may affect multiple testing correction by altering the number of tests performed. |
adjustment.terms |
Character vector of extra terms to adjust for in the model fit. |
maxit |
Maximum number of IWLS iterations for fitting the model (passed to |
epsilon |
Positive convergence tolerance for Poisson and negative binomial models. Passed to |
cores |
Integer value specifying how many cores to use to fit model for cis-interactions. |
trace |
Logical indicating if output should be produced for each of model fitting procedure. Passed to |
verbose |
Logical indicating whether to print progress reports. |
interim.data.dir |
Path to directory to store intermediate QC data and plots. |
Fit a negative binomial model for obtaining p-value for interactions. The data is first sorted by distance, and models are fit separately in each quantile of the distance-sorted data.
Interactions data with expected number of interactions and p-values added.
data(bre80); fit.model(bre80);
data(bre80); fit.model(bre80);
Calculate the number of possible combinations between baits and fragments, excluding self-ligations and only counting bait-to-bait interactions once (e.g. a-b, not b-a)
get.combination.count(baits, fragments, cis.only = FALSE)
get.combination.count(baits, fragments, cis.only = FALSE)
baits |
vector of bait IDs in form chrN:start-end |
fragments |
vector of fragment IDs in form chrN:start-end |
cis.only |
logical indicating whether cis-interactions only should be considered |
total number of possible combinations
Split a fragment in format chr:start-end to a list of corresponding elements
get.components(id)
get.components(id)
id |
string in format chr:start-end |
list with entries 'chr', 'start', 'end'
Calculate distance between bait and target region
get.distance(interaction.data)
get.distance(interaction.data)
interaction.data |
data.table with interaction data. Must contain columns bait.chr, bait.start, bait.end, target.chr, target.start, target.end |
vector of absolute distances (NA for trans-interactions)
data(bre80); input.cols <- c('bait.chr', 'bait.start', 'bait.end', 'target.chr', 'target.start', 'target.end'); get.distance( bre80[, input.cols, with = FALSE]);
data(bre80); input.cols <- c('bait.chr', 'bait.start', 'bait.end', 'target.chr', 'target.start', 'target.end'); get.distance( bre80[, input.cols, with = FALSE]);
Split a segment ID in form chrN:start-end
into its different components
get.id.components(id)
get.id.components(id)
id |
segment ID of form |
A character vector of length three, where the elements are chromosome, start, and end, respectively.
If id
is a vector, a list of the same length is returned
get.id.components('chrX:6-30'); get.id.components(c('3:4-10', '22:1000-20000'))
get.id.components('chrX:6-30'); get.id.components(c('3:4-10', '22:1000-20000'))
Generate a unique identifying ID for each interaction
get.interaction.id(bait, other.end, bait.to.bait, zero.based = FALSE)
get.interaction.id(bait, other.end, bait.to.bait, zero.based = FALSE)
bait |
id of bait in format chr:start-end |
other.end |
id of other end in format chr:start-end |
bait.to.bait |
logical indicating whether both ends are baits |
zero.based |
logical indicating if IDs are zero-based |
string identifying interaction
Calculate the number of trans-interactions per fragment, accounting for the fact that baits can be listed either as bait or target.
get.trans.counts(interaction.data)
get.trans.counts(interaction.data)
interaction.data |
Data table containing interactions |
Data table with columns fragment.id
and trans.count
.
fragment.id |
ID of restriction fragment in chrN:start-end format |
trans.count |
Number of trans interactions involving the fragment |
data(bre80); get.trans.counts(bre80[, .(bait.chr, target.chr, bait.id, target.id, count)]);
data(bre80); get.trans.counts(bre80[, .(bait.chr, target.chr, bait.id, target.id, count)]);
Check if a warning object is an iteration limit reached warning from glm.nb
is.glm.nb.maxiter.warning(w)
is.glm.nb.maxiter.warning(w)
w |
Warning object |
Logical indicating if warning matches iteration limit reached warning
Check if an error matches the error raised by glm.nb
due to an inflated theta estimate.
This happens when the variance of the negative binomial does not exceed the mean (i.e. there is no overdispersion).
In such cases, the Poisson distribution may be a suitable alternative.
is.glm.nb.theta.error(e)
is.glm.nb.theta.error(e)
e |
Error object |
Boolean indicating if error matches
Check if a warning matches the square root warning raised by glm.nb
due to an inflated theta estimate.
This happens when the variance of the negative binomial does not exceed the mean (i.e. there is no overdispersion).
In such cases, the Poisson distribution may be a suitable alternative.
is.glm.nb.theta.warning(w)
is.glm.nb.theta.warning(w)
w |
Warning object |
Boolean indicating if warning matches
Check that the model fit contains the same number of rows as the data used to fit it, and throw an error if not
model.rows.sanity.check(model.data, model)
model.rows.sanity.check(model.data, model)
model.data |
Data used to fit model |
model |
Resulting negative binomial model object |
None
Internal function for fitting model within a tryCatch loop, handling numerical errors gracefully.
model.try.catch( model.formula, data, distribution = "negative-binomial", maxit = 100, epsilon = 1e-08, init.theta = NULL, start = NULL, trace = FALSE, verbose = FALSE )
model.try.catch( model.formula, data, distribution = "negative-binomial", maxit = 100, epsilon = 1e-08, init.theta = NULL, start = NULL, trace = FALSE, verbose = FALSE )
model.formula |
formula |
data |
model data |
distribution |
Name of distribution of the counts. Options are 'negative-binomial', 'poisson', 'truncated-poisson', and 'truncated-negative-binomial' |
maxit |
Maximum number of IWLS iterations for fitting the model (passed to |
epsilon |
Positive convergence tolerance for Poisson and negative binomial models. Passed to |
init.theta |
Initial value of theta in negative binomial model |
start |
starting values of coefficients in linear predictor |
trace |
Logical indicating if output should be produced for each of model fitting procedure. Passed to |
verbose |
Logical indicating whether to print progress reports. |
List with elements
model |
model object. Set to NULL if no model could be fit. |
expected.values |
vector of expected values for each element in original data, or vector of NAs if no model could be fit |
p.values |
vector of p-values for test of significantly higher response than expected, or vector of NAs if no model could be fit |
Perform multiple testing correction on p-values from interaction test.
By default, multiple testing correction is applied per bait. To change this
to a global multiple testing correction, set bait.level = FALSE
.
multiple.testing.correct(interaction.data, bait.level = TRUE)
multiple.testing.correct(interaction.data, bait.level = TRUE)
interaction.data |
Data table of interaction calls. Must contain columns p.value and bait.id. |
bait.level |
Logical indicating whether multiple testing correction should be performed per bait. |
Original data table with new column
q.value |
FDR-corrected p-value |
## Not run: data(bre80); results <- fit.model(bre80); adjusted.results <- multiple.testing.correct(results); ## End(Not run)
## Not run: data(bre80); results <- fit.model(bre80); adjusted.results <- multiple.testing.correct(results); ## End(Not run)
Prepare data for running interaction calling. Takes a BAM file and baits and restriction fragments as input, and returns a data table with data ready for analysis.
prepare.data( bam, baits, fragments, replicate.merging.method = "sum", include.zeros = c("none", "cis", "all"), remove.adjacent = FALSE, temp.directory = NULL, keep.files = FALSE, verbose = FALSE )
prepare.data( bam, baits, fragments, replicate.merging.method = "sum", include.zeros = c("none", "cis", "all"), remove.adjacent = FALSE, temp.directory = NULL, keep.files = FALSE, verbose = FALSE )
bam |
Path to a BAM file |
baits |
Path to a BED file containing the baits |
fragments |
Path to a BED file containing all restriction fragments in the genome |
replicate.merging.method |
Method that should be used for merging replicates, if applicable |
include.zeros |
String specifying what zero counts to include. Options are none (default), cis, and all. |
remove.adjacent |
Logical indicating whether to remove all reads mapping to adjacent restriction fragments. |
temp.directory |
Directory where temporary files should be stored. Defaults to current directory. |
keep.files |
Logical indicating whether to keep temporary files |
verbose |
Logical indicating whether to print progress reports. |
Data table object with columns
target.id |
String in chrN:start-end format identifying target fragment |
bait.id |
String in chrN:start-end format identifying bait fragment |
target.chr |
Chromosome of target fragment |
target.start |
Start coordinate of target fragment (zero-based) |
target.end |
End coordinate of target fragment |
bait.chr |
Chromosome of bait fragment |
bait.start |
Start coordinate of bait fragment (zero-based) |
bait.end |
End coordinate of bait fragment |
bait.to.bait |
Boolean indicating if the interaction is bait-to-bait (i.e. the fragment listed as target is also a bait) |
count |
The number of reads linking the two fragments |
bait.trans.count |
The number of reads linking the bait to fragments in trans (a measure of "interactibility") |
target.trans.count |
The number of reads linking the target to fragments in trans (a measure of "interactibility") |
distance |
Distance between the midpoints of the bait and target fragments (basepairs). NA for trans interactions |
if( bedtools.installed() ) { bam <- system.file('extdata', 'Bre80_2q35.bam', package = 'chicane'); baits <- system.file('extdata', '2q35.bed', package = 'chicane'); fragments <- system.file('extdata', 'GRCh38_HindIII_chr2.bed.gz', package = 'chicane'); input.data <- prepare.data( bam = bam, baits = baits, fragments = fragments ); }
if( bedtools.installed() ) { bam <- system.file('extdata', 'Bre80_2q35.bam', package = 'chicane'); baits <- system.file('extdata', '2q35.bed', package = 'chicane'); fragments <- system.file('extdata', 'GRCh38_HindIII_chr2.bed.gz', package = 'chicane'); input.data <- prepare.data( bam = bam, baits = baits, fragments = fragments ); }
Read a BED file and return regions in chrN:start-end format
read.bed(bed.path, zero.based = TRUE)
read.bed(bed.path, zero.based = TRUE)
bed.path |
Path to bed file |
zero.based |
Whether to return ID in zero-based coordinates |
vector of region IDs
bait.file <- system.file('extdata', '2q35.bed', package = 'chicane'); baits <- read.bed(bait.file);
bait.file <- system.file('extdata', '2q35.bed', package = 'chicane'); baits <- read.bed(bait.file);
Run model fitting procedure for either bait-to-bait or other interactions. Meant for internal use only.
run.model.fitting( interaction.data, distance.bins = NULL, distribution = "negative-binomial", bait.to.bait = FALSE, adjustment.terms = NULL, maxit = 100, epsilon = 1e-08, cores = 1, trace = FALSE, verbose = FALSE, interim.data.dir = NULL )
run.model.fitting( interaction.data, distance.bins = NULL, distribution = "negative-binomial", bait.to.bait = FALSE, adjustment.terms = NULL, maxit = 100, epsilon = 1e-08, cores = 1, trace = FALSE, verbose = FALSE, interim.data.dir = NULL )
interaction.data |
data.table object containing interaction counts. Must contain columns distance, count, and bait_trans_count. |
distance.bins |
Number of bins to split distance into. Models are fit separately in each bin. |
distribution |
Name of distribution of the counts. Options are 'negative-binomial', 'poisson', 'truncated-poisson', and 'truncated-negative-binomial' |
bait.to.bait |
Logical indicating if model should be fit as bait-to-bait |
adjustment.terms |
Characted vector of extra terms to adjust for in the model fit |
maxit |
Maximum number of IWLS iterations for fitting the model (passed to |
epsilon |
Positive convergence tolerance for Poisson and negative binomial models. Passed to |
cores |
Integer value specifying how many cores to use to fit model for cis-interactions. |
trace |
Logical indicating if output should be produced for each of model fitting procedure. Passed to |
verbose |
Logical indicating whether to print progress reports. |
interim.data.dir |
Path to directory to store intermediate QC data and plots. |
Interactions data with expeceted number of interactions and p-values added.
Split a data frame into a prespecified number of bins, using
split
and cut
. Unlike the default R functions, this does not
fail when asked to split the data into a single bin.
smart.split(dat, bins)
smart.split(dat, bins)
dat |
Data frame or data table to be split |
bins |
Number of bins to split data into |
List with bins
elements. Each element corresponds to one portion
of the data
Generate a stratified sample matching distance distribution of significant interactions.
stratified.enrichment.sample(nonsignificant.results, significant.results)
stratified.enrichment.sample(nonsignificant.results, significant.results)
nonsignificant.results |
Data table containing non-significant interactions that should be sampled from |
significant.results |
Data table of significant results. Used to determine size of strata in stratified sampling procedure. |
test.enrichment
test.enrichment( interaction.data, feature.bed, significance.cutoff = 0.05, span = 0, n = 1000, remove.bait.to.bait = TRUE )
test.enrichment( interaction.data, feature.bed, significance.cutoff = 0.05, span = 0, n = 1000, remove.bait.to.bait = TRUE )
interaction.data |
Data table containing details on interactions |
feature.bed |
BED file with regions of features |
significance.cutoff |
q-value threshold for significant interactions |
span |
Distance around target restriction fragment to consider. If set to zero (default), only features that overlap with the restriction fragment itself are considered. |
n |
Number of random samples to consider |
remove.bait.to.bait |
Logical specifying whether to exclude bait-to-bait interactions |
list with elements
observed |
observed overlap between significant interactions and features |
random |
vector of length |
Erle Holgersen <[email protected]>
if( bedtools.installed() ) { data(bre80); ctcf.bed <- system.file('extdata', 'T47D_chr2_CTCF.bed.gz', package = 'chicane'); results <- chicane(interactions = bre80); test.enrichment(results, ctcf.bed, significance.cutoff = 0.25); }
if( bedtools.installed() ) { data(bre80); ctcf.bed <- system.file('extdata', 'T47D_chr2_CTCF.bed.gz', package = 'chicane'); results <- chicane(interactions = bre80); test.enrichment(results, ctcf.bed, significance.cutoff = 0.25); }
Verify that interaction.data object is in expected format. Throws an error if object does not fit requirements.
verify.interaction.data(interaction.data)
verify.interaction.data(interaction.data)
interaction.data |
Object to be verified. |
None