Broad-Enrich tests sets of broad genomic regions (e.g., from ChIP-seq data for histone modifications, or copy number variations) for enriched biological pathways, Gene Ontology terms, or other gene sets. The pre-defined gene sets are the same as those used in LRpath, and can be browsed here. Given an input .bed, .narrowPeak, or .broadPeak file, Broad-Enrich determines the proportion of each gene locus covered by a peak, using a chosen "gene locus definition". The "locus" of a gene is the region from which the gene is predicted to be regulated. Broad-Enrich then uses a logistic regression model to test for association between that proportion and gene set membership, empirically adjusting for the bias due to locus length with a binomial cubic smoothing spline within the logistic model. Output includes summary plots, peak-to-gene assignments, and enrichment (and depletion) results, including the odds ratio, p-value, and FDR for each gene set.
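As a rough illustration of the first step described above, the following Python sketch computes the proportion of each gene locus covered by peaks. It is not Broad-Enrich's own code: the function names, the single-chromosome simplification, and the example coordinates are all assumptions made for illustration, and the logistic regression and spline adjustment are not shown.

```python
from typing import Dict, List, Tuple

Interval = Tuple[int, int]  # half-open [start, end), as in BED files


def merge_intervals(intervals: List[Interval]) -> List[Interval]:
    """Merge overlapping or adjacent intervals so covered bases are counted once."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def covered_proportion(locus: Interval, peaks: List[Interval]) -> float:
    """Fraction of the locus's bases that fall inside at least one peak."""
    locus_start, locus_end = locus
    length = locus_end - locus_start
    if length <= 0:
        raise ValueError("locus must have positive length")
    covered = 0
    for peak_start, peak_end in merge_intervals(peaks):
        overlap = min(locus_end, peak_end) - max(locus_start, peak_start)
        if overlap > 0:
            covered += overlap
    return covered / length


# Hypothetical loci and peaks on one chromosome (illustrative values only).
loci: Dict[str, Interval] = {"geneA": (1000, 2000), "geneB": (5000, 9000)}
peaks: List[Interval] = [(900, 1200), (1100, 1500), (6000, 7000)]

proportions = {gene: covered_proportion(locus, peaks) for gene, locus in loci.items()}
print(proportions)
```

Merging the peaks first keeps overlapping peaks from being counted twice, so each proportion stays between 0 and 1.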


ConceptGen

ConceptGen is an open-source gene set enrichment testing and concept mapping tool. This web-based tool can be used both to identify biological gene sets (called concepts) enriched with differentially expressed genes (or any other user-identified gene list), and to explore networks of relationships among biological concepts from diverse biological sources. You can either upload a list of human Entrez Gene IDs or gene symbols with or without a background gene set, or query from among the pre-built concepts. You also have the option of using the compound converter tool to convert one or more compounds/metabolites to their associated enzymes for upload.

LR Path

LRpath performs gene set enrichment testing using logistic regression, and allows the data to remain on a continuous scale. This web-based tool tests against several annotation databases, including Gene Ontology, multiple pathway databases, metabolite, transcription factor and microRNA target sets, and literature-derived annotations. LRpath also includes clustering analysis functionality, allowing you to identify and compare biological concept signatures across multiple studies. LRpath performs well with both small and large sample sizes. Additional benefits of using the LRpath program include (1) the ability to perform both “directional” and “non-directional” enrichment tests that allow for two different perspectives and (2) the ability to easily compare and visualize results across multiple studies using LRpath clustering.
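To make the idea of keeping the data on a continuous scale concrete, here is a minimal Python sketch of a one-predictor logistic regression in which gene set membership is the outcome and a per-gene significance score is the predictor. This is a sketch under stated assumptions, not LRpath's implementation: the use of -log10(p-value) as the score, the Newton-Raphson fitting routine, and the example values are all illustrative.

```python
import math
from typing import List, Tuple


def fit_logistic(x: List[float], y: List[int], iterations: int = 25) -> Tuple[float, float]:
    """Fit logit(P(y = 1)) = b0 + b1 * x by Newton-Raphson; returns (b0, b1)."""
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        # Accumulate the gradient (g) and the information matrix (h).
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += yi - p
            g1 += (yi - p) * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        if det == 0.0:
            break
        # Solve the 2x2 system h * delta = g and take a Newton step.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


# Hypothetical per-gene scores (assumed here to be -log10 p-values) and
# membership in one gene set (1 = member, 0 = not a member).
neg_log10_p = [0.2, 0.5, 0.8, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
in_gene_set = [0, 0, 1, 0, 0, 1, 0, 1, 1, 1]

intercept, slope = fit_logistic(neg_log10_p, in_gene_set)
odds_ratio = math.exp(slope)  # change in odds of membership per unit of score
print(f"slope={slope:.3f}, odds ratio={odds_ratio:.3f}")
```

A positive slope (odds ratio above 1) means that more significant genes are more likely to belong to the gene set, without any significance cutoff being applied to the scores.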


Metab2MeSH

Find Medical Subject Headings (MeSH terms) enriched for a particular compound, or compounds enriched for a particular MeSH term. Metab2MeSH uses a statistical approach to reliably and automatically annotate metabolites with the concepts defined in MeSH, the National Library of Medicine's controlled vocabulary for biomedical concepts. These annotations provide links from chemical substances to the biomedical research literature and complement existing resources, including PubChem and the Human Metabolome Database.


methylSig is our new R package for analyzing whole-genome bisulfite sequencing (bis-seq), reduced representation bisulfite sequencing (RRBS), or enhanced RRBS experiments. methylSig tests for differentially methylated cytosines (DMCs) or regions (DMRs) using a beta-binomial model to account for the coverage and variation among samples at each CpG site or region, and has a well-calibrated Type I error rate. Several options exist for site-specific or sliding window tests, combining strands, filtering sites, and local variance estimation. In addition, methylSig offers numerous functions for annotating and visualizing results, and for testing for enrichment of overlap with the binding sites of transcription factors.
Check out methylSig on GitHub
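The reason a beta-binomial model is useful for methylation counts can be sketched in a few lines of Python. The example below compares a plain binomial with a beta-binomial of the same mean at a single site. It is only an illustration of the distribution: the parameter values are assumptions, and methylSig's own parameterization, estimation, and testing procedure are not reproduced here.

```python
import math
from typing import List


def log_beta(a: float, b: float) -> float:
    """Natural log of the Beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def beta_binomial_pmf(k: int, n: int, alpha: float, beta: float) -> float:
    """P(k methylated reads out of n) when the methylation level is Beta(alpha, beta)."""
    log_choose = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return math.exp(log_choose + log_beta(k + alpha, n - k + beta) - log_beta(alpha, beta))


def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(k methylated reads out of n) when the methylation level is fixed at p."""
    return math.comb(n, k) * p ** k * (1.0 - p) ** (n - k)


def variance(probs: List[float]) -> float:
    """Variance of a count distribution given as probabilities for k = 0..n."""
    mean = sum(k * pk for k, pk in enumerate(probs))
    return sum((k - mean) ** 2 * pk for k, pk in enumerate(probs))


# Hypothetical site with 20 reads; both models have a mean methylation of 0.5.
coverage = 20
alpha, beta = 2.0, 2.0

bb_probs = [beta_binomial_pmf(k, coverage, alpha, beta) for k in range(coverage + 1)]
bin_probs = [binomial_pmf(k, coverage, 0.5) for k in range(coverage + 1)]

print(f"binomial variance:      {variance(bin_probs):.2f}")
print(f"beta-binomial variance: {variance(bb_probs):.2f}")
```

The beta-binomial spreads the counts more widely than the binomial at the same coverage, which is how it can absorb variation among samples in addition to sampling noise from read depth.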

The easiest way to install methylSig is with the devtools R package:

> # install.packages('devtools')  # run once if devtools is not yet installed
> library(devtools)
> install_github('sartorlab/methylSig')

methylSig vignette: methylSig.pdf

ENCODE transcription factor binding site (TFBS) annotation files:

hg18: ENCODE_YaleTfbs.hg18.txt.gz
hg19: ENCODE_AwgTfbs.hg19.txt.gz
mm9: ENCODE_Tfbs.mm9.txt.gz

PePr: Peak Prioritization Pipeline

PePr is an analysis pipeline for ChIP-seq experiments with biological replicates, written in Python and available as Python scripts. The program accounts for the variation among biological replicates and (optionally) for peak locations relative to gene structure information using a mixture model. It can be used either to detect histone modifications or transcription factor binding relative to control data, or for two-group comparisons.