Software

ChIP-Enrich: Gene Set Enrichment Testing for ChIP-Seq Data

ChIP-Enrich and Poly-Enrich test ChIP-seq peak data for enrichment of biological pathways, Gene Ontology terms, and other types of gene sets. Using an input BED file, ChIP-Enrich and Poly-Enrich assign peaks to genes based on a chosen "locus definition". The "locus" of a gene is the region from which the gene is predicted to be regulated. ChIP-Enrich uses a logistic regression model to test for association between the presence of at least one peak in a gene's locus and gene set membership, while Poly-Enrich uses a negative binomial regression model to test for association between the number of peaks in a gene's locus and gene set membership. Both methods empirically adjust for the relationship between locus length (and, optionally, mappability) and the outcome using a cubic smoothing spline term within the regression model. Output includes summary plots, peak-to-gene assignments, and enrichment (and depletion) results including odds ratio, p-value, and FDR for each gene set.
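To make the locus-definition step concrete, here is a minimal Python sketch of one simple rule: assign each peak to the gene with the nearest transcription start site (TSS). The gene names, coordinates, and the helpers `assign_peaks_nearest_tss` and `gene_outcomes` are hypothetical and assume a single chromosome; this is not ChIP-Enrich's implementation. The sketch derives the two per-gene outcomes described above: a 0/1 indicator of at least one peak (the ChIP-Enrich outcome) and a peak count (the Poly-Enrich outcome).

```python
from bisect import bisect_left
from collections import Counter


def assign_peaks_nearest_tss(peak_midpoints, tss_by_gene):
    """Assign each peak midpoint to the gene with the closest TSS.

    tss_by_gene maps a (hypothetical) gene name to its TSS coordinate on one
    chromosome. Ties go to the TSS with the smaller coordinate.
    """
    genes = sorted(tss_by_gene, key=tss_by_gene.get)
    positions = [tss_by_gene[g] for g in genes]
    assignments = []
    for mid in peak_midpoints:
        i = bisect_left(positions, mid)
        # Only the TSSs immediately left and right of the peak can be nearest.
        candidates = [j for j in (i - 1, i) if 0 <= j < len(positions)]
        best = min(candidates, key=lambda j: abs(positions[j] - mid))
        assignments.append(genes[best])
    return assignments


def gene_outcomes(assignments, all_genes):
    """Per-gene peak counts (Poly-Enrich style) and 0/1 indicators (ChIP-Enrich style)."""
    counts = Counter(assignments)
    peak_counts = {g: counts.get(g, 0) for g in all_genes}
    has_peak = {g: int(n > 0) for g, n in peak_counts.items()}
    return peak_counts, has_peak
```

For example, with TSSs at 1,000, 5,000, 12,000, and 30,000 bp for four hypothetical genes, peak midpoints at 900, 1,200, 4,000, 11,000, and 20,000 bp give peak counts of 2, 1, 2, and 0. In the methods described above, these outcomes would then be modeled together with locus length, which this sketch does not do.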
Empirically modeling the relationship between peak presence and locus length relaxes the explicit assumptions that existing tests make about this relationship. For example, Fisher's exact test assumes that GO term membership is unrelated to the probability that a gene has at least one peak, which holds when every gene has an equal probability of being assigned at least one peak. The binomial test (implemented in the GREAT software) assumes that the number of peaks assigned to a gene is proportional to its locus length and that the data show no variability beyond that expected under the binomial distribution (i.e., no overdispersion). The assumptions of both tests are violated in many ChIP-seq datasets.
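A minimal Python sketch of the kind of binomial test described above: each of n peaks is assumed to fall independently into a gene set's regulatory regions with probability equal to the fraction of the genome those regions cover, and the p-value is the upper tail P(X ≥ k). The function names and example numbers are illustrative assumptions, not GREAT's code.

```python
from math import comb


def binomial_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def region_binomial_test(peaks_in_set, total_peaks, set_bp, genome_bp):
    """Upper-tail p-value for seeing at least `peaks_in_set` of `total_peaks`
    in a gene set's regions. Under this null each peak lands in those regions
    with probability proportional to their total length (set_bp / genome_bp)."""
    p = set_bp / genome_bp
    return binomial_upper_tail(peaks_in_set, total_peaks, p)
```

With 10 peaks, 4 of them in regions covering 10% of the genome, this gives p ≈ 0.0128. If peaks cluster within loci (overdispersion), the binomial variance understates the true spread and such p-values come out too small, which is one of the violated assumptions noted above.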

Mint Pipeline

The mint pipeline analyzes single-end reads from sequencing assays that measure DNA methylation and hydroxymethylation. It handles reads from bisulfite-conversion assays such as WGBS and RRBS, and from pulldown assays such as MeDIP-seq, hMeDIP-seq, and hMeSeal. With data measuring both 5-methylcytosine (5mC) and 5-hydroxymethylcytosine (5hmC), mint integrates the two data types to classify genomic regions as 5mC, 5hmC, a mixture of both, or neither.
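As an illustration of the final integration step only, the sketch below maps two per-region calls (whether a region shows 5mC and whether it shows 5hmC) to the four classes named above. It assumes those boolean calls were already made upstream; the enum, function name, region names, and rule are hypothetical and may differ from mint's actual classification logic.

```python
from enum import Enum


class RegionClass(Enum):
    MC = "5mC"
    HMC = "5hmC"
    MIXED = "5mC and 5hmC"
    NEITHER = "neither"


def classify_region(has_5mc: bool, has_5hmc: bool) -> RegionClass:
    """Hypothetical rule: combine two upstream per-region calls into one class."""
    if has_5mc and has_5hmc:
        return RegionClass.MIXED
    if has_5mc:
        return RegionClass.MC
    if has_5hmc:
        return RegionClass.HMC
    return RegionClass.NEITHER


# Hypothetical regions with precomputed (has_5mc, has_5hmc) calls
regions = {
    "chr1:100-600": (True, False),
    "chr1:900-1400": (True, True),
    "chr2:50-550": (False, False),
}
labels = {name: classify_region(*calls).value for name, calls in regions.items()}
```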
The pipeline is available both as a command-line tool (https://github.com/sartorlab/mint) and as a Galaxy graphical user interface (https://github.com/sartorlab/mint_galaxy). Both implementations require minimal configuration while remaining flexible to experiment-specific needs.

Broad-Enrich

Broad-Enrich tests sets of broad genomic regions (e.g., from ChIP-seq data for histone modifications, or copy number variations) for enrichment of biological pathways, Gene Ontology terms, or other gene sets. The predefined gene sets are the same as those used in LRpath. Using an input .bed, .narrowPeak, or .broadPeak file and a chosen "gene locus definition", Broad-Enrich determines the proportion of each gene locus covered by a peak. The "locus" of a gene is the region from which the gene is predicted to be regulated. Broad-Enrich uses a logistic regression model to test for association between the proportion of each gene locus covered by a peak and gene set membership. It empirically adjusts for the bias due to locus length using a cubic smoothing spline term within the binomial (logistic) model. Output includes summary plots, peak-to-gene assignments, and enrichment (and depletion) results including odds ratio, p-value, and FDR for each gene set.
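The quantity Broad-Enrich works with, the proportion of a gene locus covered by peaks, can be sketched in Python as follows. The sketch assumes BED-style half-open [start, end) coordinates on one chromosome and merges overlapping peaks so that shared bases are counted once; `covered_fraction` is a hypothetical helper, not Broad-Enrich's code.

```python
def covered_fraction(locus, peaks):
    """Fraction of a half-open locus [start, end) covered by the union of peaks."""
    start, end = locus
    # Clip each overlapping peak to the locus; peaks outside it are dropped.
    clipped = sorted(
        (max(s, start), min(e, end)) for s, e in peaks if s < end and e > start
    )
    covered = 0
    run_start = run_end = None
    for s, e in clipped:
        if run_end is not None and s <= run_end:
            # Overlaps or abuts the current merged run: extend it.
            run_end = max(run_end, e)
        else:
            if run_end is not None:
                covered += run_end - run_start  # close the previous run
            run_start, run_end = s, e
    if run_end is not None:
        covered += run_end - run_start
    return covered / (end - start)
```

For a locus spanning 1,000 to 2,000 and peaks (1100, 1300), (1200, 1400), (1900, 2200), and (950, 1020), the merged coverage inside the locus is 420 bp, giving 0.42.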

PePr: Peak Prioritization Pipeline

PePr is an analysis pipeline for ChIP-seq experiments with biological replicates, written in Python and distributed as scripts. The program accounts for variation among biological replicates and, optionally, for peak locations relative to gene structure, using a mixture model. It can be used either to identify histone modification or transcription factor binding sites relative to control data, or for two-group comparisons.

MethylSig

MethylSig is our R package for analyzing whole-genome bisulfite sequencing (bis-seq), reduced representation bisulfite sequencing (RRBS), and enhanced RRBS experiments. methylSig tests for differentially methylated CpG sites (DMCs) or regions (DMRs) using a beta-binomial model that accounts for coverage and for variation among samples at each CpG site or region, and it has a well-calibrated Type I error rate. Options are available for site-specific or sliding-window tests, combining strands, filtering sites, and local variance estimation. In addition, methylSig offers numerous functions for annotating and visualizing results and for testing enrichment of overlap with transcription factor binding sites.
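To show what a beta-binomial model looks like at a single CpG site, here is a minimal Python sketch of the beta-binomial log-likelihood for methylated read counts, plus a likelihood ratio statistic comparing separate group means against one shared mean. The dispersion `phi` is held fixed and the means are found by a crude grid search; both are simplifying assumptions, all names are hypothetical, and the sketch omits methylSig's options such as local variance estimation. It is not methylSig's estimation procedure.

```python
import math


def log_beta(a, b):
    # log of the Beta function B(a, b)
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def betabinom_logpmf(x, n, mu, phi):
    """log P(X = x) for X ~ BetaBinomial(n, alpha = mu*phi, beta = (1 - mu)*phi)."""
    a, b = mu * phi, (1.0 - mu) * phi
    log_choose = math.lgamma(n + 1) - math.lgamma(x + 1) - math.lgamma(n - x + 1)
    return log_choose + log_beta(x + a, n - x + b) - log_beta(a, b)


def group_loglik(samples, mu, phi):
    # samples: (methylated_reads, total_reads) pairs, one per sample, at one CpG site
    return sum(betabinom_logpmf(x, n, mu, phi) for x, n in samples)


def max_loglik(samples, phi, grid=999):
    # Crude grid search over mu in (0, 1); a real fit would use a numerical optimizer.
    return max(group_loglik(samples, (i + 1) / (grid + 1), phi) for i in range(grid))


def lrt_statistic(group1, group2, phi):
    """2 * (log L with separate group means - log L with one shared mean), phi fixed."""
    separate = max_loglik(group1, phi) + max_loglik(group2, phi)
    shared = max_loglik(group1 + group2, phi)
    return 2.0 * (separate - shared)
```

Here `mu` is the mean methylation level and a larger `phi` means less extra-binomial variation between samples. Under the usual asymptotics, the statistic could be compared with a chi-squared distribution with one degree of freedom.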
methylSig is available on GitHub: https://github.com/sartorlab/methylSig

The easiest way to install methylSig is with the devtools R package:

> library(devtools) 
> install_github('sartorlab/methylSig')


methylSig vignette: methylSig.pdf

ENCODE TFBS (transcription factor binding site) annotation files:

hg18: ENCODE_YaleTfbs.hg18.txt.gz
hg19: ENCODE_AwgTfbs.hg19.txt.gz
mm9: ENCODE_Tfbs.mm9.txt.gz

Metab2MeSH

Metab2MeSH finds Medical Subject Headings (MeSH terms) enriched for a particular compound, or compounds enriched for a particular MeSH term. It uses a statistical approach to reliably and automatically annotate metabolites with the concepts defined in MeSH, the National Library of Medicine's controlled vocabulary for biomedical concepts. These annotations link chemical substances to the biomedical research literature and complement existing resources, including PubChem and the Human Metabolome Database.
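Metab2MeSH's exact statistic is not described here, so the following is only a generic sketch of one standard way to ask whether a compound and a MeSH term co-occur in more articles than expected by chance: a one-sided hypergeometric (Fisher's exact) test on article counts. The function name and counts are hypothetical.

```python
from math import comb


def hypergeom_upper_tail(k, n_total, n_term, n_compound):
    """P(X >= k), where X counts articles annotated with both the compound
    and the MeSH term if the compound's articles were drawn at random.

    n_total:    articles in the corpus
    n_term:     articles indexed with the MeSH term
    n_compound: articles mentioning the compound
    """
    upper = min(n_term, n_compound)
    # Exact integer arithmetic; a single division at the end.
    numer = sum(
        comb(n_term, i) * comb(n_total - n_term, n_compound - i)
        for i in range(k, upper + 1)
    )
    return numer / comb(n_total, n_compound)
```

For instance, in a toy corpus of 10 articles where 4 are indexed with the term and 3 mention the compound, finding all 3 compound articles among the 4 term articles gives p = 4/120 ≈ 0.033.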

LRpath

LRpath performs gene set enrichment testing using logistic regression, which allows the data to remain on a continuous scale. This web-based tool tests against several annotation databases, including Gene Ontology, multiple pathway databases, metabolite, transcription factor, and microRNA target sets, and literature-derived annotations. LRpath performs well with both small and large sample sizes. Additional benefits of LRpath include (1) the ability to perform both “directional” and “non-directional” enrichment tests, which offer two different perspectives, and (2) clustering analysis functionality for identifying, comparing, and visualizing biological concept signatures across multiple studies.
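The sketch below illustrates the general idea of logistic-regression enrichment testing for a single gene set: gene set membership (1 or 0) is regressed on a per-gene significance score, so the scores stay continuous instead of being thresholded. The Newton-Raphson fit, the score definitions (-log10 p for a non-directional test; the same value signed by the direction of change for a directional test), and all names are assumptions for illustration, not LRpath's implementation.

```python
import math


def fit_logistic_1d(x, y, iters=25):
    """Fit logit P(y = 1) = b0 + b1 * x by Newton-Raphson; returns (b0, b1).

    Assumes the data are not perfectly separable (otherwise the MLE diverges).
    """
    b0, b1 = 0.0, 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += yi - p          # score (gradient) for the intercept
            g1 += (yi - p) * xi   # score for the slope
            h00 += w              # Fisher information entries
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step: beta += H^-1 g, using the closed-form 2x2 inverse.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (-h01 * g0 + h00 * g1) / det
    return b0, b1


def nondirectional_score(p_value):
    return -math.log10(p_value)


def directional_score(p_value, fold_change_sign):
    # Hypothetical encoding: sign the score by the direction of change.
    return math.copysign(-math.log10(p_value), fold_change_sign)
```

A positive fitted slope `b1` suggests that genes in the set tend to have higher scores than other genes. A real analysis would also need a standard error and p-value for `b1`, which this sketch omits.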