Clipper: False discovery rate control in differential gene expression analysis

Exaggerated false positives

The article "A large-sample crisis? Exaggerated false positives by popular differential expression methods" (https://www.biorxiv.org/content/10.1101/2021.08.25.457733v2), reports that DESeq2 and edgeR have unexpectedly high false discovery rates (FDRs). The article finds that DESeq2 and edgeR identify more differentially expressed genes (DEGs) after the condition labels of a dataset are randomly permuted.

Fig. I: Exaggerated false DEGs identified by DESeq2 and edgeR from anti-PD-1 therapy RNA-seq datasets.

A. Barplot showing the average numbers of DEGs identified from 668 permuted datasets. The error bars represent the standard deviations of 668 permutations. The red dots indicate the numbers of DEGs identified from the original dataset.

B. The distributions of the number of permuted datasets where a gene was mistakenly identified as a DEG. The percentages corresponding to the numbers are listed in parentheses below the numbers.

C. Barplot showing the average numbers of DEGs identified from both the original dataset and the permuted datasets. The error bars represent the standard deviations of 668 permutations. The red dots indicate the numbers of DEGs identified from the original dataset.

D. Percentage of permuted datasets where a DEG identified from the original dataset was also identified as a DEG. The genes are sorted by absolute log2(fold-change) in the original dataset in decreasing order. The absolute log2(fold-change) values corresponding to the ranks are listed in parentheses below the ranks. The line is fitted using the loess method, and the shaded areas represent 95% confidential intervals.

E. GO term enrichment for the DEGs identified from at least 10% permuted datasets. The top 5 enriched biological processes GO terms are shown. The analyses were performed using R package clusterProfiler. P.adjust represents the adjusted p-value using the Benjamini & Hochberg method.

F. Boxplots showing the poorness of fitting the negative binomial model to the genes identified by DESeq2 or edgeR as DEGs from any permuted datasets vs. all the other genes. The poorness of fit for each gene is defined as its negative log10(P-value) from the goodness-of-fit test for the negative binomial distributions estimated by DESeq2 or edgeR. The p-value in each panel was calculated by the Wilcoxon rank-sum test to compare the two groups of genes' poorness-of-fit values.

Enter Clipper

One way to control FDR in DEG analysis is by using Clipper, a general statistical framework for FDR control (https://genomebiology.biomedcentral.com/articles/10.1186/s13059-021-02506-9). Clipper controls FDR without relying on p-values or specific data distributions, which is useful because obtaining valid p-values relies on either reasonable assumptions of data distribution or large numbers of replicates under both conditions. Clipper outperforms existing methods for a broad range of applications in high-throughput data analysis.

Fig. II: Comparison of Clipper and two popular DEG identification methods-edgeR and DESeq2-in DEG analysis on semi-synthetic bulk RNA-seq data

A. FDR control, power given the same target FDR, and power given the same actual FDR.

B. Ranking consistency of the true DEGs among the top 100 DEGs identified by each method. The consistency is defined between the genes' ranking based on edgeR/DESeq2's p-values or Clipper's contrast scores and their ranking based on true expression fold changes.

C. Reproducibility between two semi-synthetic datasets as technical replicates. Three reproducibility criteria are used: the IDR, the Pearson correlation, and the Spearman correlation. Each criterion is calculated for edgeR/DESeq2's p-values or Clipper's contrast scores on the two semi-synthetic datasets. Among the three methods, only Clipper controls the FDR, and Clipper achieves the highest power, the best gene ranking consistency, and the best reproducibility

What this app does

This app allows users to verify these findings by performing the same permutation analysis on their own RNA-seq datasets. In the DESeq2/edgeR vs. Clipper tab, the user will perform DESeq2/edgeR on an original dataset, a permuted dataset, and with the Clipper add-on. DESeq2 or edgeR will identify more DEGs on the permuted dataset than on the original dataset, suggesting that DESeq2 or edgeR has high FDR. DESeq2 or edgeR plus Clipper will identify fewer DEGs than DESeq or edgeR on the permuted dataset, suggesting that Clipper successfully controls FDR.

In the Clipper Vignette tab, users can get a better idea of how Clipper controls FDR by testing it on simulated datasets.

Install Clipper

Install DESeq2

Install edgeR

Overview

The purpose of this tab is to demonstrate how Clipper effectively controls FDR. There are three steps. First, the user will provide an RNA-seq dataset, and permute it. If the user has no RNA-seq dataset, they can choose to use one of two example datasets. Second, the user will perform either DESeq2 or edgeR on both the original and permuted datasets. Third, the user will perform Clipper analysis. We expect that DESeq2 or edgeR will identify an unusually high amount of DEGs on the permuted dataset (relative to the original dataset), and that Clipper will detect fewer DEGs than DESeq2 or edgeR on the permuted dataset.

The user may provide their own RNA-seq dataset, or use one of two example datasets provided in the app. In example 1, the number of DEGs detected on the original and permuted datasets is nearly the same, so Clipper detects no DEGs. In example 2, the number of DEGs detected on the original dataset is greater than the number of DEGs detected on the permuted dataset, so Clipper detects some DEGs.

Step 1. Obtain original and permuted datasets

Example datasets

Step 1 is to obtain an original and permuted RNA-seq dataset on which to perform DESeq2/edgeR and Clipper analysis. These datasets may be provided by the user, or the user may instead choose to use one of two example datasets, provided here:

Example 1

Download original Download permuted

Example 2

Download original Download permuted

To read the .rds files into R use the following code:

Manual permutation

If the user instead decides to provide their own original RNA-seq dataset, then they must perform permutation themselves. This can be done manually or in the app. To perform permutation manually, the user can use the following code:

In-app permutation

Alternatively, to perform permutation in the app, the user can upload their original dataset below, hit "Permute", and download a permuted dataset. (Note: if the original dataset is large, in-app permutation could take a long time, and manual permutation could be preferable.)

Upload unpermuted dataset here (.csv)

Browse...

Download

Step 2. Perform DESeq2/edgeR analysis on original and permuted datasets

Select method of DEG analysis, DESeq2 or edgeR:

DEG method

Use the following code to perform DESeq2/edgeR, first on original dataset and then the permuted dataset:

Upload the results of DESeq2/edgeR on original and permuted datasets:

DEGs from original dataset

Browse...

DEGs from permuted dataset

Browse...

Step 3. Perform Clipper analysis

This step can be performed in the app or manually. To perform Clipper analysis in the app, move on to step 4. Alternatively, use the following code to perform Clipper analysis manually, and upload the output here. Note that orig_DEGs and perm_DEGs are the qlf_i obtained on original and permuted datasets in step 2. Note also that in the following code, FDR is set to 0.05 and analysis is set to enrichment. These values are recommended for the 2 provided examples, but the user could decide to modify them.

Code for Clipper analysis

Upload DEGs from Clipper

Browse...

Compare DESeq2/edgeR and Clipper

We now compare DESeq2/edgeR with Clipper. To see a plot comparing DESeq2 or edgeR with Clipper, hit "Compare". If the user performed Clipper analysis manually and uploaded their results, hitting "Compare" will tell the app to use those results in the following plot. Otherwise, if the user decided to perform Clipper analysis in the app, hitting "Compare" will perform Clipper analysis in the app. A csv file containing the DEGs identified by Clipper, can be downloaded by hitting "Download" (this must happen after comparison).

Download

Analyzing high-throughput data with Clipper

Introduction

The most common goal of analyzing high-throughput data is to contrast two conditions to reliably screen "interesting features", where "interesting" means "enriched" or "differential". Enriched features are defined as those that have higher expected measurements under the experimental/treatment condition than the background condition, i.e., the negative control. The detection of such enriched features is called enrichment analysis. For example, common enrichment analyses include calling protein-binding sites in a genome from chromatin immunoprecipitation sequencing (ChIP-seq) data and identifying peptides from mass spectrometry (MS) data. In contrast, differential features are defined as those that have different expected measurements (without measurement errors) between two conditions, and the detection of such differential features is called differential analysis. For example, popular differential analyses include the identification of differentially expressed genes (DEGs) from genome-wide gene expression data (e.g., microarray and RNA sequencing (RNA-seq) data) and differentially chromosomal interaction regions (DIRs) from Hi-C data.

This package provides an easy implementation of a general statistical framework Clipper, an p-value-free FDR control method for analyzing high-throughput biological data. It is applicable to both enrichment analysis and differential analysis. This vignette explains the general use of `Clipper` and then demonstrates its use in typical high-throughput analyses including DEG analysis of RNA-seq data, peak calling from ChIP-Seq data, peptide identification from mass spectrometry data, and DIR analysis from Hi-C data. The details and examples of all these analyses will be further introduced below.

**Note**: if you use `Clipper` in published research, please cite: For questions regarding the use of `Clipper`, please post to https://github.com/JSB-UCLA/Clipper/issues.

General Pipeline

Here we show the most basic applications of Clipper in enrichment and differential analysis. `Clipper requires a minimum of four inputs:

- `score.exp` the measurements from the experimental (treatment) condition

- `score.back` the measurements from the background (control) condition

- `analysis` the type of analysis, "enrichment" or "differential".

- `FDR` the target FDR threshold(s), set to 0.05 by default

We use `Clipper` to find interesting features by contrasting two measurement matrices, one from the experimental condition and one from the background condition. The inputs of `score.exp` and `score.back` should be numeric matrices. The rows of `score.back` and `score.exp` should match and represent the same feature, and their columns represent replicates (biological samples). For enrichment and differential analysis, set `analysis` to be `"enrichment"("e")` or `"differential"("d")` respectively.

We demonstrate the general pipeline of `Clipper` using the following simulated datasets (provided in the Clipper package): `exp_e` and `back_e` in example of enrichment analysis; and `exp_d` and `back_d` in example of differential analysis.

Clipper vignette: Website

Enrichment analysis

For enrichment analysis, we use the simulated data `exp_e` and `back_e` as inputs of `score.back` and `score.exp`. In this dataset, there are 10,000 features and the first 1000 features are interesting. For the target FDR threshold(s), we can then use `Clipper` to perform enrichment analysis:

`Clipper` returns a list and its component `discovery` is a list of indices of identified interesting discoveries corresponding to the FDR threshold(s):

We can then calculate the resulting false discovery proportion (FDP) and power:

Differential analysis

In a differential analysis, the two conditions can be treated equally thus you can assign either condition as background/experimental. We use the simulated data `exp_d` and `back_d` as inputs of `score.back` and `score.exp`. In this dataset, there are 10,000 features and the first 2000 features are interesting. For the target FDR threshold(s), we can then use `Clipper` to perform differential analysis:

`Clipper` still returns a list and its component `discovery` is a list of indices of identified interesting discoveries corresponding to the FDR threshold(s):

We can then calculate the resulting false discovery proportion (FDP) and power:

DEG identification

Here we demonstrate how to use `Clipper` for DEG identification. set `analysis` to be `"differential"`. The input of `score.exp` and `score.back` should be gene expression matrices with genes as rows and replicates as columns. It is important to ensure the rows of `score.exp` and `score.back` represent the same set of genes in the same order. `Clipper` requires either `score.exp` or `score.back` to have at least two replicates to perform a differential analysis.

Users are recommended to normalize and log-tranform read count matrices before inputing them to `Clipper` (see examples below). The following example uses `edgeR` to preprocess raw read count matrices as an example. Users can also use their preferred normalization methods.

Next we apply `Clipper` to preprocessed gene expression matrices. Log-transformation is recommended because empirical evidence indicates that log-transformed read counts are more likely to satisfy Clipper's assumption for FDR control than read counts without log-transformation.

DIR identification from Hi-C data

To demonstrate the use of `Clipper` to identify differential chromosomal interaction regions, we start with interaction matrices obtained from Hi-C data. Because Hi-C interaction matrices are often stored as sparse matrices, we need to format these matrices into the right form. Suppose we have two interaction matrices under the experimental condition and the another two under the background condition.

`exp1` is a interaction matrix of resolution $10^6$ base pairs. Each row represents the interaction intensity between one pair of regions: the first integer indicates the starting point of a region, the second integer indicates the starting point of another region, and the third numeric indicates the read counts between these two regions. Only pairs of regions with a non-zero read count are explicitly coded in this matrix; all other pairs have zero read counts.

`Clipper` treats pairs of chromosome regions as features to call DIRs. Therefore, it is important to make sure to match features when we generate input to `score.exp` and `score.back`.

If not matched, users can use the following code to extract shared features and match them.

Next we apply `Clipper` by collecting replicates under the experimental condition and the background condition.

Peak Calling

Here we demonstrate how to apply `Clipper` as an add-on to MACS, a popular peak calling method, for a better FDR control. Similar workflows also apply if users prefer other peak calling method such as HOMER.

Extract input data from existing peak calling software MACS

To implement `Clipper` for peak calling, users need to extract two outputs from MACS: a "$*$_peaks.narrowPeak" file and two "$*$_pileup.dbg" files, one for the experimental track and the other for the control track. The "$*$_peaks.narrowPeak" file contains candidate peaks identified by MACS with its second and third rows indicating the start and end points of the peaks. The "$*$_pileup.dbg" files contain the base-pair read coverages for genomic regions of interest. See the following code chunk for a glance of these files. See the end of this section for command line codes we used to generate the "$*$_peaks.narrowPeak" file and the two "$*$_pileup.dbg" files. In the following example, we use a synthetic ChIP-seq sequence as the experiemntal sample and a real control sample as background sample. The "$*$_pileup.dbg" files for these two samples are: `'experimental_treat_pileup.bdg'` and `'control_treat_pileup.bdg'`. The peaks called by MACS by contrasting these two samples are in `'twosample_peaks.narrowPeak'`.

Use Clipper to screen candidate peaks

We use the following codes to generate the input of `score.exp` and `score.back` from the two "$*$_pileup.dbg" files. The resulting vectors `s1` and `s2` contains the read coverages for each base pair. Then we supply `s1` to `score.exp`, `s2` to `score.back`, and use `Clipper` to perform enrichment analysis to obtain a threshold on per-base-pair contrast scores.

We then use this threshold to screen the candidate peaks identified by MACS. We compare the median of per-base-pair contrast scores within each candidate peak with the threshold output by `Clipper`. Only peaks whose median per-base-pair contrast score is greater or equal to the threshold are identified as peaks. We don't directly use the discoveries output by `Clipper` becasuse they are base-pair features and do not form biologically meaningful peaks. Instead, we use better-defined peak regions by MACS.

Generation of MACS output files

We start from two files applicable for MACS peak calling, such as two bam files: `experimental.bam` for experimental condition and `control.bam` for background/negative control condition. We use the following shell scripts to obtain the "$*$_peaks.narrowPeak" file and two "$*$_pileup.dbg" files. See links for tutorials of MACS.

Peptide identification as an add-on of a database search algorithm

The enrichment functionality of `Clipper` also allows it to control the FDR of peptide identification as an add-on to database search engines. Here we demonstrate this function using an example of Mascot, a popular database search method for peptide identification. Similar workflows also apply if users prefer other database search methods such as SEQUEST, Byonic and etc.

First we load a peptide-spectrum match (PSM) level sample output from Mascot with the target-decoy search strategy:

To apply `Clipper` to control FDR of identified PSMs, we treat spectra as features; given a spetrum, we treat its q-value from the target search as a measurement under the experimental condition and its q-value from the decoy search as a measurement under the background condition. Prior to applying Clipper, we use the following code to format search results:

Our goal is to identify spectra whose q-value from the target search is smaller than that from the decoy search; thus we set `analysis` to be `"enrichment"`. Because a smaller q-value is considered better, we feed $-$log-transformed q-values into `Clipper`.

Try it yourself!

In the previous examples, Clipper was demonstrated on various simulated data sets. In this tab, you can try Clipper out on experimental and background measurements you provide. The experimental and background measurements should be formatted similarly to exp_e and back_e from the Clipper package. Moreover, you can provide the true positives, and obtain the FDR and power of Clipper. The true positives should be formatted as a csv with the indices of the true positives in a column.

Settings

Experimental condition

Browse...

Background condition

Browse...

True positives

Browse...

Analysis type

FDR (a number between 0 and 1)

Get interesting discoveries

Get Interesting Discoveries

Histogram of contrast score values (true positives not needed)

FDR and power of Clipper (true positives needed)

False Discovery Rate:

Power:

The Clipper team

Authors

Prof. Jingyi Jessica Li, Department of Statistics, UCLA, Los Angeles CA: Website

Xinzhou Ge, Department of Statistics, UCLA, Los Angeles CA: Website

Implementation of shiny app

Gilbert Neuner, undergraduate student researcher at UCLA