Introduction

In gene expression analyses, one can test for differential expression of genes, or one can test for differential expression of individual isoforms (Baldoni et al 2024), or one can test for changes in the relative expression of different isoforms of the same gene (Baldoni et al 2025). In the RNA-seq context, the latter is called “differential transcript usage”. In the context of bottom-up mass spectrometry proteomics, one can test for differential isoform usage by testing for relative changes in the expression of different peptides for the same protein between experimental conditions. This vignette provides a simple example, using simulated data, illustrating how limpa and limma can be used to assess differential peptide usage between experimental conditions from bottom-up mass spectrometry data.

This vignette was made with limpa version 1.3.3.

Generating peptide-level data

To generate a simple toy example, we simulate a protein dataset with 250 proteins and 4 peptides per protein. Each row contains log-intensities for one peptide. The detection probability curve slope parameter has been set to 0.7, which is a typical value for real data, in order to generate non-ignorable missing values (Li & Smyth 2023).

> library(limpa)
> set.seed(20260107)
> y <- simProteinDataSet(n.peptides=1000,
+                        peptides.per.protein=4,
+                        n.groups=2,
+                        samples.per.group=5,
+                        dpc.slope=0.7)
> dim(y)
[1] 1000   10

Estimate missing peptide values

The first step in the analysis is to estimate the missing values in order to obtain complete data matrix. This is done with limpa’s dpcImpute function:

> z <- dpcImpute(y, dpc.slope=0.7)
DPC intercept estimated as -3.522
Estimating hyperparameters ...
Imputing peptides ...
Proteins: 1000 Peptides: 1000
Proteins: 1000 Peptides: 1000

The output object z is a limma EList including standard errors that record the uncertainty with which the missing values were estimated.

Differential peptide expression

Then we fit linear models to the peptide-level data, taking into account the standard errors from dpcImpute:

> z$genes$NPeptides <- NULL
> Group <- factor(z$targets$Group)
> design <- model.matrix(~Group)
> fit <- dpcDE(z, design, plot=FALSE)
> fit <- eBayes(fit)
> topTable(fit, coef=2)
               Protein DEStatus PropObs logFC AveExpr     t  P.Value adj.P.Val      B
Peptide0905 Protein227       Up     1.0  1.63    9.29  5.14 3.62e-05    0.0252 2.3186
Peptide0748 Protein187     Down     1.0 -1.68    8.00 -4.91 6.28e-05    0.0252 1.8256
Peptide0833 Protein209       Up     1.0  1.56    8.72  4.80 8.29e-05    0.0252 1.5750
Peptide0929 Protein233       Up     1.0  1.59    9.41  4.72 1.01e-04    0.0252 1.3993
Peptide0919 Protein230     Down     1.0 -1.51    9.48 -4.62 1.27e-04    0.0255 1.1876
Peptide0731 Protein183     Down     1.0 -1.28    7.83 -4.28 2.98e-04    0.0391 0.4234
Peptide0765 Protein192       Up     1.0  1.45    8.12  4.24 3.27e-04    0.0391 0.3387
Peptide0899 Protein225     Down     0.8 -1.23    9.17 -4.24 3.32e-04    0.0391 0.3165
Peptide0930 Protein233       Up     1.0  1.60    9.62  4.21 3.52e-04    0.0391 0.2711
Peptide0730 Protein183     Down     0.7 -1.03    8.02 -4.13 4.24e-04    0.0424 0.0731

We see that limpa has correctly identified peptides that are truly differentially expressed.

Differential peptide usage

The fitted model object can then be input to limma’s diffSplice() function to test for differential peptide usage:

> fit$genes$DEStatus <- NULL
> du <- diffSplice(fit, geneid="Protein")
Total number of exons:  1000 
Total number of genes:  250 
Number of genes with 1 exon:  0 
Mean number of exons in a gene:  4 
Max number of exons in a gene:  4 
> topSplice(du, coef=2)
              Protein NExons    F P.Value   FDR
Protein197 Protein197      4 5.99 0.00174 0.435
Protein179 Protein179      4 4.40 0.00896 0.821
Protein209 Protein209      4 4.22 0.01085 0.821
Protein167 Protein167      4 3.96 0.01429 0.821
Protein100 Protein100      4 3.71 0.01873 0.821
Protein187 Protein187      4 3.57 0.02193 0.821
Protein135 Protein135      4 3.53 0.02299 0.821
Protein212 Protein212      4 3.08 0.03783 1.000
Protein147 Protein147      4 2.87 0.04811 1.000
Protein203 Protein203      4 2.75 0.05488 1.000

The simulated data did not include any differentially used proteins, and the analysis correctly does not find any significant results, showing that the analysis has correctly controlled the error rate.

Further reading

Other example analyses using limpa are available from the https://github.com/SmythLab/limpa/.

Cited references

Baldoni PL#, Chen Y#, Hediyeh-zadeh S, Liao Y, Dong X, Ritchie ME, Shi W, Smyth GK (2024). Dividing out quantification uncertainty allows efficient assessment of differential transcript expression with edgeR. Nucleic Acids Research 52(3), e13. doi:10.1093/nar/gkad1167

Baldoni PL, Chen L, Li M, Chen Y, Smyth GK (2025). Dividing out quantification uncertainty enables assessment of differential transcript usage with limma and edgeR. Nucleic Acids Research 53(22), gkaf1305. doi:10.1093/nar/gkaf1305

Li M, Smyth GK (2023). Neither random nor censored: estimating intensity-dependent probabilities for missing values in label-free proteomics. Bioinformatics 39(5), btad200. doi:10.1093/bioinformatics/btad200

Li M, Cobbold SA, Smyth GK (2025). Quantification and differential analysis of mass spectrometry proteomics data with probabilistic recovery of information from missing values. bioRxiv 2025/651125. doi:10.1101/2025.04.28.651125