In gene expression analyses, one can test for differential expression of genes, or one can test for differential expression of individual isoforms (Baldoni et al 2024), or one can test for changes in the relative expression of different isoforms of the same gene (Baldoni et al 2025). In the RNA-seq context, the latter is called “differential transcript usage”. In the context of bottom-up mass spectrometry proteomics, one can test for differential isoform usage by testing for relative changes in the expression of different peptides for the same protein between experimental conditions. This vignette provides a simple example, using simulated data, illustrating how limpa and limma can be used to assess differential peptide usage between experimental conditions from bottom-up mass spectrometry data.
This vignette was made with limpa version 1.3.3.
To generate a simple toy example, we simulate a protein dataset with 250 proteins and 4 peptides per protein. Each row contains log-intensities for one peptide. The detection probability curve slope parameter has been set to 0.7, which is a typical value for real data, in order to generate non-ignorable missing values (Li & Smyth 2023).
> library(limpa)
> set.seed(20260107)
> y <- simProteinDataSet(n.peptides=1000,
+ peptides.per.protein=4,
+ n.groups=2,
+ samples.per.group=5,
+ dpc.slope=0.7)
> dim(y)
[1] 1000 10
The first step in the analysis is to estimate the missing values in order to obtain complete data matrix.
This is done with limpa’s dpcImpute function:
> z <- dpcImpute(y, dpc.slope=0.7)
DPC intercept estimated as -3.522
Estimating hyperparameters ...
Imputing peptides ...
Proteins: 1000 Peptides: 1000
Proteins: 1000 Peptides: 1000
The output object z is a limma EList including standard errors that record the uncertainty with which the missing values were estimated.
Then we fit linear models to the peptide-level data, taking into account the standard errors from dpcImpute:
> z$genes$NPeptides <- NULL
> Group <- factor(z$targets$Group)
> design <- model.matrix(~Group)
> fit <- dpcDE(z, design, plot=FALSE)
> fit <- eBayes(fit)
> topTable(fit, coef=2)
Protein DEStatus PropObs logFC AveExpr t P.Value adj.P.Val B
Peptide0905 Protein227 Up 1.0 1.63 9.29 5.14 3.62e-05 0.0252 2.3186
Peptide0748 Protein187 Down 1.0 -1.68 8.00 -4.91 6.28e-05 0.0252 1.8256
Peptide0833 Protein209 Up 1.0 1.56 8.72 4.80 8.29e-05 0.0252 1.5750
Peptide0929 Protein233 Up 1.0 1.59 9.41 4.72 1.01e-04 0.0252 1.3993
Peptide0919 Protein230 Down 1.0 -1.51 9.48 -4.62 1.27e-04 0.0255 1.1876
Peptide0731 Protein183 Down 1.0 -1.28 7.83 -4.28 2.98e-04 0.0391 0.4234
Peptide0765 Protein192 Up 1.0 1.45 8.12 4.24 3.27e-04 0.0391 0.3387
Peptide0899 Protein225 Down 0.8 -1.23 9.17 -4.24 3.32e-04 0.0391 0.3165
Peptide0930 Protein233 Up 1.0 1.60 9.62 4.21 3.52e-04 0.0391 0.2711
Peptide0730 Protein183 Down 0.7 -1.03 8.02 -4.13 4.24e-04 0.0424 0.0731
We see that limpa has correctly identified peptides that are truly differentially expressed.
The fitted model object can then be input to limma’s diffSplice() function to test for differential peptide usage:
> fit$genes$DEStatus <- NULL
> du <- diffSplice(fit, geneid="Protein")
Total number of exons: 1000
Total number of genes: 250
Number of genes with 1 exon: 0
Mean number of exons in a gene: 4
Max number of exons in a gene: 4
> topSplice(du, coef=2)
Protein NExons F P.Value FDR
Protein197 Protein197 4 5.99 0.00174 0.435
Protein179 Protein179 4 4.40 0.00896 0.821
Protein209 Protein209 4 4.22 0.01085 0.821
Protein167 Protein167 4 3.96 0.01429 0.821
Protein100 Protein100 4 3.71 0.01873 0.821
Protein187 Protein187 4 3.57 0.02193 0.821
Protein135 Protein135 4 3.53 0.02299 0.821
Protein212 Protein212 4 3.08 0.03783 1.000
Protein147 Protein147 4 2.87 0.04811 1.000
Protein203 Protein203 4 2.75 0.05488 1.000
The simulated data did not include any differentially used proteins, and the analysis correctly does not find any significant results, showing that the analysis has correctly controlled the error rate.
Other example analyses using limpa are available from the https://github.com/SmythLab/limpa/.
Baldoni PL#, Chen Y#, Hediyeh-zadeh S, Liao Y, Dong X, Ritchie ME, Shi W, Smyth GK (2024). Dividing out quantification uncertainty allows efficient assessment of differential transcript expression with edgeR. Nucleic Acids Research 52(3), e13. doi:10.1093/nar/gkad1167
Baldoni PL, Chen L, Li M, Chen Y, Smyth GK (2025). Dividing out quantification uncertainty enables assessment of differential transcript usage with limma and edgeR. Nucleic Acids Research 53(22), gkaf1305. doi:10.1093/nar/gkaf1305
Li M, Smyth GK (2023). Neither random nor censored: estimating intensity-dependent probabilities for missing values in label-free proteomics. Bioinformatics 39(5), btad200. doi:10.1093/bioinformatics/btad200
Li M, Cobbold SA, Smyth GK (2025). Quantification and differential analysis of mass spectrometry proteomics data with probabilistic recovery of information from missing values. bioRxiv 2025/651125. doi:10.1101/2025.04.28.651125