Contents

1 Introduction

Renal cell carcinoma (kidney cancer) is one of the 10 most common diagnosed cancers worldwide. Clear cell renal cell carcinoma (ccRCC) is the most prevalent histological subtype, representing 75% of all case and the majority of deaths. The Clinical Proteomics Tumor Analysis Consortium (CPTAC) performed proteome analysis of 103 treatment naive ccRCC tumors and 84 paired normal adjacent tissues (NATs) (Clark et al 2019).

The DIA whole proteome data from Clark et al (2019) has been deposited to the MassIVE proteome database as dataset MSV000093041 (https://doi.org/doi:10.25345/C52V2CM61). The data was reanalysed by Kohler et al (2024) and, in particular, the raw data was reprocessed using FragPipe and MSFragger-DIA, with extraction of quantification using DIA-NN (Yu et al 2023). The files from the reanalysis are available as MassIVE Reanalysis RMSV000000696.1. The quantification files, including the DIA-NN report.tsv file, are available via ftp from ftp://massive-ftp.ucsd.edu/x01/RMSV000000696/2023-12-15_devonjkohler_b88e6fe3/quant.

The data includes samples from 106 individuals (or cases). There are 81 cases with matched tumor and normal adjacent tissue samples, and also 22 cases with tumor only and 3 cases with normal tissue only.

Here, we analyse the data using the limpa package. This vignette was made with limpa version 1.3.4 and limma version 3.67.0.

2 Read in DIA-NN data

The first step is the read the DIA-NN report file containing the precursor quantifications into a limma EList object:

> library(limpa)
> x <- readDIANN("report.tsv", q.columns=c(),
+      extra.columns=c("Protein.Group","Genes"))

The data includes 156,397 precursors and 187 samples:

> dim(x)
[1] 156397    187

The data includes UniProt protein IDs and gene names:

> head(x$genes)
                                      Protein.Group  Genes
(UniMod:1)AAAAAAAAAAGAAGGR1                  Q86U42 PABPN1
(UniMod:1)AAAAAAAAAAGAAGGR2                  Q86U42 PABPN1
(UniMod:1)AAAAAAAAAAGAAGGRGSGPGR2            Q86U42 PABPN1
(UniMod:1)AAAAAAAAEQQSSNGPVK2                Q16585   SGCB
(UniMod:1)AAAAAAAAEQQSSNGPVK3                Q16585   SGCB
(UniMod:1)AAAAAAAGDSDSWDADAFSVEDPVRK2        O75822  EIF3J

There are 9019 proteins in total:

> length(unique(x$genes$Protein.Group))
[1] 9019

3 Basic sample annotation

We can extract the case IDs and tissue origin from the sample names. Note that row.names(x) gives the precursor IDs and colnames(x) gives the sample names.

> cn.s <- strsplit2(colnames(x), split="_")
> Case <- factor(cn.s[,7])
> Tissue <- factor(substring(cn.s[,8],1,1))

There are 103 tumor samples and 84 normal adjacent tissue samples:

> table(Tissue)
Tissue
  N   T
 84 103 

A closer look using table(Tissue,Case) shows that there are 81 cases with matched tumor and normal adjacent tissue samples from the same case, while 22 cases have tumor only and 3 cases have normal tissue only.

For convenient and brevity, we reset the sample names to consist just of the case and tissue:

> colnames(x) <- paste(Case,Tissue,sep="_")

4 Pattern of missing values

Over 48% of the quantifications are NA:

> mean(is.na(x$E))
[1] 0.482

The rate of missing values is much higher in the tumor samples:

> plot(colMeans(is.na(x$E)) ~ Tissue)

but the average log-intensity of peptides that are detected is slightly higher in the tumor samples

> plot(colMeans(x$E, na.rm=TRUE) ~ Tissue)

This suggests that about 5-10% of proteins that are expressed in normal tissue may be down-regulated or absent in the tumor samples. This sort of global imbalance in expression levels poses questions for how the log-intensities should be normalized in downstream analyses. If we compare tumor to normal tissue without normalization, we should anticipate unbalanced DE with far more down-regulatd than up-regulated proteins. The difference in missing value rate between tumor and normal samples may reflect a true biological difference between the tissues, and so should not necessarily be corrected for by normalization.

5 Estimating the DPC

For DIA-NN data, it would be reasonable to assume a DPC slope of about 0.7, but we will here estimate the DPC from the data itself:

> dpcest <- dpcCN(x)
Iter 1: dpc = -13.893  0.806
Iter 2: dpc = -14.897  0.871
> plotDPC(dpcest)

The DPC slope is quite steep for this data, corresponding to a steep transition from nearly all missing at around log2-intensity 15 to nearly all detected around 20.

6 Quantifying proteins

Next we estimate the protein intensities from the precursor data. For completeness, we will retain all precursors in the quantification step, even though the percentage of missing values is very high for some of the precursors. We also retain all the proteins, including those with just one precursor.

One could consider filtering out precursors or proteins that are almost all NA in order to speed up the quantification step. Ideally though, we choose to retain all the precursors, because even precursors with just one or two observed values can potentially add some information to the final protein values and standard errors. The limpa quantification step accumulates information from multiple precursors rather than averaging, so including some low-information precursors will not detract from the accuracy of the final estimates.

> y <- dpcQuant(x, dpc=dpcest)

The protein-level EList has 9019 rows, one for each protein:

> dim(y)
[1] 9019  187

7 Filtering and normalization

At this stage we will filter out some proteins not of interest in the downstream DE analysis.

We remove the Biognosys protein that was spiked-in for mass spectrometry calibration:

> Biognosys <- grep("Biognosys|iRT-Kit_WR_fusion", row.names(y))

We also remove the immunoglobin heavy chain variable regions, because they will be sample-specific rather than tissue-specific:

> IGHV <- grep("^IGHV", y$genes$Genes)
> IGKV <- grep("^IGKV", y$genes$Genes)
> IGLV <- grep("^IGLV", y$genes$Genes)
> yfilt <- y
> yfilt <- yfilt[-c(Biognosys,IGHV,IGKV,IGLV),]
> dim(yfilt)
[1] 8921  187

We will also filter out proteins that were detected in fewer than 40 samples:

> NSamplesDetectedIn <- rowSums(yfilt$other$n.observations > 0)
> yfilt <- yfilt[NSamplesDetectedIn >= 40,]
> dim(yfilt)
[1] 7980  187

This leaves 7980 proteins for downstream analysis.

It is difficult to decide whether to perform additional between-sample normalization at this stage. We will refrain from any extra normalization, for the reasons given above.

8 Data exploration and QC

An MDS plot shows that the normal and tumor tissue samples are very well separated, with one exceptional case:

> Color <- Tissue
> levels(Color) <- c("blue","red")
> mds <- plotMDSUsingSEs(yfilt, pch=16, col=as.character(Color),
+   main="MDS plot of sample relationships")
> legend("bottomleft", legend=c("Normal","Tumor"), pch=16, col=levels(Color))

Looking closely at the MDS plot, we can see one tumor sample on the left amongst the normal samples and one normal sample on the right amongst the tumor samples. We can identify these samples using the x-axis from the MDS plot:

> colnames(yfilt)[mds$x < -3 & Tissue=="T"]
[1] "C3L-00360.T"
> colnames(yfilt)[mds$x >  3 & Tissue=="N"]
[1] "C3L-00360.N"

We see that the outlier samples are both from case C3L-00360, and it would seem that the tissue labels for this case have been swapped. This swap is presumably a processing error, because no such outlier were observed in Figure 4A of Clark et al (2019), which was made from TMT-based proteomic data for the same samples. We correct this apparent error by reversing the tumor and tissue labels for case C3L-00360:

> Tissue[mds$x < -3 & Tissue=="T"] <- "N"
> Tissue[mds$x >  3 & Tissue=="N"] <- "T"
> colnames(yfilt) <- paste(Case,Tissue,sep="_")

9 Tumor vs adjacent normal differential expression

We now evaluate differentially expressed proteins in the tumor samples compared to the matched normals. Because of the unmatched samples, we treat the cases as random effects and estimate the intra-case correlation by setting the block argument of dpcDE to Case. We also allow for sample-specific quality weights by setting sample.weights=TRUE:

> design <- model.matrix(~Tissue)
> fit <- dpcDE(yfilt, design, block=Case, sample.weights=TRUE, plot=TRUE)
First sample weights (min/max) 0.242/2.413
First intra-block correlation  0.178

Final sample weights (min/max) 0.248/2.412
Final intra-block correlation  0.179

The vooma plot shows that limpa is able to predict a variance trend using the quantification standard errors. As expected, samples from the same case are positively correlated. Tumor samples are generally given lower sample weights than the normal tissue samples, reflecting the greater heterogeneity of the tumor samples:

>  plot(fit$targets$sample.weight ~ Tissue)

We go on to evaluate differential expression. We use the limma treat function to test DE relative to a fold-change threshold of 1.8. This ensure that any DE gene will have a fold-change significantly greater than 1.8:

> fit <- treat(fit, fc=1.8)
> topTreat(fit, coef=2, n=30)
          Genes NPeptides PropObs logFC AveExpr     t   P.Value adj.P.Val
P15313 ATP6V1B1        30   0.299 -5.54    15.4 -47.4 1.18e-107 9.45e-104
O95299  NDUFA10        31   0.565 -2.98    17.9 -39.6  4.13e-94  1.65e-90
Q9HBG4 ATP6V0A4        18   0.299 -4.40    15.8 -38.1  2.86e-91  7.62e-88
Q8NEY4 ATP6V1C2         9   0.298 -4.05    16.3 -37.1  2.78e-89  4.53e-86
O75489   NDUFS3        37   0.600 -2.74    17.8 -37.0  2.84e-89  4.53e-86
O15020   SPTBN2       123   0.329 -4.93    15.3 -36.9  4.99e-89  6.64e-86
P05937    CALB1        45   0.358 -7.65    15.2 -36.5  3.33e-88  3.80e-85
Q01813     PFKP       118   0.484  3.58    17.6  36.3  9.66e-88  9.63e-85
Q9NRX3 NDUFA4L2         8   0.358  4.19    16.4  35.6  2.32e-86  2.06e-83
P49821   NDUFV1        43   0.566 -2.86    17.9 -35.3  9.51e-86  7.59e-83
P00439      PAH        61   0.360 -7.23    15.3 -35.1  1.92e-85  1.39e-82
O15427  SLC16A3         8   0.479  3.66    17.1  35.0  3.78e-85  2.51e-82
Q8N8Y2 ATP6V0D2        11   0.275 -3.86    15.7 -34.9  7.27e-85  4.46e-82
Q9Y6Q5    AP1M2        11   0.341 -3.85    15.8 -34.7  1.25e-84  7.10e-82
P28331   NDUFS1        92   0.525 -2.70    17.7 -34.7  1.70e-84  9.02e-82
Q96CM8    ACSF2        62   0.426 -5.92    16.4 -34.3  1.18e-83  5.90e-81
Q9UI09  NDUFA12        20   0.602 -2.89    18.5 -34.1  2.62e-83  1.23e-80
O75306   NDUFS2        45   0.495 -2.77    17.3 -33.9  5.56e-83  2.46e-80
P51970   NDUFA8        16   0.719 -2.44    18.7 -33.8  1.12e-82  4.68e-80
Q16795   NDUFA9        40   0.539 -2.88    17.5 -33.8  1.20e-82  4.78e-80
Q02338     BDH1        21   0.429 -4.95    16.8 -33.7  1.63e-82  6.21e-80
O95182   NDUFA7        12   0.706 -2.76    18.4 -33.2  1.74e-81  6.32e-79
P19404   NDUFV2        20   0.586 -2.66    18.5 -32.3  1.57e-79  5.45e-77
A6NK44    GLOD5         7   0.331 -3.60    16.2 -32.3  1.99e-79  6.63e-77
O43181   NDUFS4        10   0.556 -2.92    17.8 -32.2  3.06e-79  9.78e-77
P12532   CKMT1A        25   0.287 -4.55    15.2 -32.1  5.83e-79  1.79e-76
Q16718   NDUFA5         9   0.841 -2.52    19.6 -31.9  1.46e-78  4.30e-76
Q03154     ACY1        94   0.553 -3.91    18.1 -31.8  2.61e-78  7.43e-76
P51687     SUOX        36   0.435 -4.02    16.6 -31.4  1.71e-77  4.71e-75
Q86XE5    HOGA1        31   0.382 -5.30    16.2 -31.4  1.90e-77  5.04e-75
> summary(dt <- decideTests(fit))
       (Intercept) TissueT
Down             0    1325
NotSig           0    6064
Up            7980     591
> plotMD(fit, coef=2, status=dt)

As expected, there are more down-regulated proteins than up-regulated. As expected, the ATP6 and NDUF families of genes, which are involved in oxidative phosphorylation, are all strongly down-regulated (compare with Figure 4BE in Clark et al, 2019). As expected, genes involved in glycolysis, such as HK3, GAPDH, ENO1 and PKM, are all up-regulated (compare with Figure 4C of Clark et al).

10 Tumor heterogeneity

We now explore which clinical characteristics contribute to inter-tumor heterogeneity. We also incorporate stromal and immune scores for each sample, which estimate how much of each sample was made up of stromal or immune cells instead of carcinoma.

Our analysis follows the multivariate analysis shown in Figure S7 of Clark et al (2019). The predictor variables used in the analysis are given in Table S5 (Tab 1) of Clark et al, and we now read that table into the R session:

> TableS5 <- read.csv("TableS5Tab1.csv", row.names=1)
> head(TableS5)
            ImmuneScoreRNA StromalScoreRNA Grade Stage MarginStatus Age Gender  BMI European Diabetes Alcohol Smoking
C3L-00004_N          -1424         -868.34     3     3            1  72      M 22.8       No       No      No     Yes
C3L-00004_T           2053          -91.54     3     3            1  72      M 22.8       No       No      No     Yes
C3L-00010_N          -1966        -1544.70     3     1            0  30      M 34.1       No       No     Yes     Yes
C3L-00010_T           1576           -2.93     3     1            0  30      M 34.1       No       No     Yes     Yes
C3L-00011_N          -1697        -1243.51     4     4            1  63      F 27.5       No       No     Yes      No
C3L-00011_T            354         -235.93     4     4            1  63      F 27.5       No       No     Yes      No

The first two variables are immune and stromal scores, as estimated by Clark et al (2019) from RNA-seq profiles of the same samples. Tumor characteristics are grade, stage, and margin status. Patient characteristics are age, gender, BMI, and yes/no variables for European origin, diabetes, alchohol drinking and smoking. Following Clark et al, we treat all the quantitive variables as numerical covariates, including grade and stage. For alcohol and smoking status, lifetime non-drinkers or non-smokers were compared to the rest of the population. For country of origin, European countries (in this case, Poland and Ukraine) were compared to all other countries.

To study tumor heterogeneity, we first subset the data to tumor samples:

> yt <- yfilt[, Tissue=="T"]
> dim(yt)
[1] 7980  103

Then we can setup a multivariate model relating protein expression to the tumor and clinical variables. First extract the variables from Table S5:

> m <- match(colnames(yt), row.names(TableS5))
> Immune   <- TableS5$ImmuneScoreRNA[m]
> Stromal  <- TableS5$StromalScoreRNA[m]
> Grade    <- TableS5$Grade[m]
> Stage    <- TableS5$Stage[m]
> Margin   <- TableS5$MarginStatus[m]
> Age      <- TableS5$Age[m]
> BMI      <- TableS5$BMI[m]
> Gender   <- factor(TableS5$Gender[m])
> European <- factor(TableS5$European[m])
> Diabetes <- factor(TableS5$Diabetes[m])
> Alchohol <- factor(TableS5$Alcohol[m])
> Smoking  <- factor(TableS5$Smoking[m])

Now we can fit linear models to the tumor samples. The design matrix is defined by:

> designt <- model.matrix(~Immune+Stromal+Grade+Stage+Margin
+         +Age+BMI+Gender+European+Diabetes+Alchohol+Smoking)

We will include all the predictors in the model at once, to illustrate the flexibility of a limpa analysis. This gives an analysis where the DE results for each clinical variable is adjusted for dependence on the stromal and immune scores, as well as for dependence on the other clinical variables.

The DE analysis uses sample weights, but there is now no need for a blocking factor, because each case appears only once:

> fitt <- dpcDE(yt, designt, sample.weights=TRUE, plot=FALSE)
First sample weights (min/max) 0.24/1.80
Final sample weights (min/max) 0.237/1.808
> fitt <- eBayes(fitt)
> summary(decideTests(fitt[,-1]))
       Immune Stromal Grade Stage Margin  Age  BMI GenderM EuropeanYes DiabetesYes AlchoholYes SmokingYes
Down      529    1328    32     0      0    1    0       4          35           0           0          0
NotSig   7042    6040  7882  7980   7978 7979 7980    7964        7937        7979        7980       7980
Up        409     612    66     0      2    0    0      12           8           1           0          0

The number of DE proteins is broadly similar to that shown in Figure S7A of Clark et al (2019). The stromal and immune scores account for nearly 2000 and 1000 proteins respectively.

It is good practice to remove predictor variables that don’t account for any DE, in order to reduce collinearity and to increase power for the remaining predictors. This does increase the number of DE genes for all the other covariates, except for Gender:

> designt <- model.matrix(~Immune+Stromal+Grade+Gender+European)
> fitt <- dpcDE(yt, designt, sample.weights=TRUE, plot=FALSE)
First sample weights (min/max) 0.218/1.810
Final sample weights (min/max) 0.217/1.810
> fitt <- eBayes(fitt)
> summary(decideTests(fitt[,-1]))
       Immune Stromal Grade GenderM EuropeanYes
Down      610    2175   172       7          37
NotSig   6924    5127  7033    7967        7931
Up        446     678   775       6          12

We see that the ImmuneScore shows positive correlation with well known immune genes:

> topTable(fitt, coef="Immune")
        Genes NPeptides PropObs    logFC AveExpr    t  P.Value adj.P.Val    B
Q96PP8   GBP5        17   0.165 0.001748    16.0 15.0 1.64e-27  1.31e-23 48.0
O15117   FYB1        27   0.280 0.000788    16.8 14.4 2.84e-26  1.13e-22 45.1
P49863   GZMK         5   0.220 0.001363    15.4 13.0 2.38e-23  6.34e-20 38.5
P07766   CD3E         2   0.195 0.001021    15.7 12.6 1.60e-22  2.55e-19 36.7
P42224  STAT1        76   0.495 0.000895    18.3 12.7 1.44e-22  2.55e-19 36.3
P13796   LCP1       108   0.518 0.000580    18.5 12.2 1.29e-21  1.71e-18 34.1
Q9UI08    EVL        32   0.473 0.000513    17.4 12.1 2.05e-21  2.34e-18 33.7
P08575  PTPRC        42   0.350 0.000718    17.4 11.7 1.29e-20  1.29e-17 31.9
P31146 CORO1A        52   0.473 0.000556    18.0 11.6 2.36e-20  2.09e-17 31.2
P28838   LAP3        82   0.600 0.000656    18.7 11.5 4.17e-20  3.33e-17 30.6

The StromalScore shows positive correlation with well known stromal markers such as RCN3 and COL14A1:

> topTable(fitt, coef="Stromal")
         Genes NPeptides PropObs    logFC AveExpr    t  P.Value adj.P.Val    B
Q96D15    RCN3        14   0.365 0.001519    16.4 13.1 1.44e-23  1.15e-19 39.4
Q05707 COL14A1       160   0.534 0.002128    17.4 12.3 8.66e-22  3.46e-18 35.0
Q7Z5L7    PODN        18   0.306 0.002486    16.0 11.9 4.72e-21  1.26e-17 33.5
P67936    TPM4        30   0.758 0.000844    18.8 11.7 1.42e-20  2.82e-17 32.0
Q06828    FMOD        18   0.276 0.002557    16.6 10.9 8.24e-19  1.11e-15 28.3
Q6UWY5  OLFML1        24   0.445 0.002094    15.9 10.9 9.80e-19  1.12e-15 28.2
P51884     LUM        34   0.754 0.001792    20.7 10.9 8.31e-19  1.11e-15 27.9
Q16647   PTGIS        23   0.130 0.002347    14.7 10.7 2.00e-18  1.99e-15 27.5
P35442   THBS2        75   0.201 0.003008    15.6 10.7 3.11e-18  2.76e-15 27.0
Q8IUX7   AEBP1        68   0.441 0.002058    17.2 10.2 3.29e-17  2.63e-14 24.5

Grade is correlated with genes such as STEAP3, known to be associated with high-grade cancer and poor survival (Zhang et al, 2024).

> topTable(fitt, coef="Grade")
          Genes NPeptides PropObs  logFC AveExpr     t  P.Value adj.P.Val     B
Q658P3   STEAP3         8   0.166  1.276    15.5  6.59 2.04e-09  1.25e-05 10.99
Q10471   GALNT2        22   0.468  0.702    17.0  6.49 3.18e-09  1.25e-05 10.77
Q9Y617    PSAT1        44   0.450  1.307    14.8  6.41 4.71e-09  1.25e-05 10.18
P05121 SERPINE1        27   0.116  1.403    14.9  5.96 3.66e-08  7.29e-05  8.34
P51161    FABP6         6   0.150  1.421    15.8  5.91 4.57e-08  7.30e-05  8.16
Q9UHR4 BAIAP2L1         5   0.291  0.747    15.4  5.87 5.65e-08  7.52e-05  7.85
P60602    ROMO1         2   0.612  0.593    17.9  5.81 7.20e-08  8.21e-05  7.82
Q9H5V8    CDCP1         5   0.133  0.947    15.7  5.75 9.59e-08  9.57e-05  7.43
O60443    GSDME        27   0.510  0.620    17.3  5.70 1.19e-07  1.06e-04  7.39
Q15599 SLC9A3R2        24   0.653 -0.539    17.2 -5.62 1.70e-07  1.15e-04  7.05

Gender genes show up-regulation of Y chromosome genes like DDX3Y AND RPS4Y1 and down-regulation of female-biased genes like PZP (pregnancy zone protein).

> topTable(fitt, coef="GenderM")
         Genes NPeptides PropObs  logFC AveExpr     t  P.Value adj.P.Val     B
O15523   DDX3Y         6   0.352  2.140    15.9 11.55 3.55e-20  2.83e-16 31.02
P22090  RPS4Y1         7   0.475  2.747    16.9 11.12 3.11e-19  1.24e-15 29.39
P20742     PZP        39   0.175 -2.747    15.1 -7.98 2.30e-12  6.13e-09 16.90
Q96A49   SYAP1        19   0.494 -0.573    16.7 -5.62 1.67e-07  3.34e-04  6.92
O14602  EIF1AY         1   0.727  1.147    19.0  4.73 7.26e-06  9.65e-03  3.50
Q96BH1   RNF25         1   0.251 -0.718    16.4 -4.76 6.56e-06  9.65e-03  3.40
Q96A58    RERG         3   0.742  0.598    18.3  4.57 1.37e-05  1.56e-02  2.97
P14209    CD99        11   0.529  0.430    18.1  4.41 2.59e-05  2.58e-02  2.40
O00584 RNASET2        22   0.619  1.007    18.1  4.34 3.44e-05  3.05e-02  2.14
Q13501  SQSTM1        53   0.237 -1.120    15.4 -4.21 5.47e-05  4.06e-02  1.68

The European-associated genes are harder to interpret, but do seem to reproduce results from the original study:

> topTable(fitt, coef="EuropeanYes")
         Genes NPeptides PropObs  logFC AveExpr     t  P.Value adj.P.Val     B
P02144      MB         4   0.365  1.408    15.5  8.18 8.70e-13  6.94e-09 16.81
Q9H361  PABPC3         6   0.125 -1.347    15.8 -5.75 9.38e-08  3.74e-04  6.99
Q9BX59  TAPBPL         9   0.231 -1.170    15.9 -5.41 4.26e-07  1.13e-03  5.78
Q9UEU0   VTI1B        11   0.709 -0.397    17.4 -5.06 1.92e-06  3.83e-03  4.75
Q8TBY9 CFAP251         1   0.588 -1.219    22.4 -5.00 2.45e-06  3.92e-03  4.55
P27105    STOM        20   0.691 -0.692    18.9 -4.93 3.29e-06  4.37e-03  4.29
Q5XKK7 FAM219B         1   0.840 -0.600    17.1 -4.75 6.67e-06  7.60e-03  3.43
P06702  S100A9        26   0.370 -1.153    16.9 -4.60 1.22e-05  1.18e-02  3.00
Q08722    CD47         5   0.814 -0.494    18.9 -4.57 1.38e-05  1.18e-02  2.97
Q92626    PXDN        84   0.383  0.832    17.1  4.53 1.63e-05  1.18e-02  2.82

11 Further reading

More documentation and example analyses are available from https://github.com/SmythLab/limpa/.

12 Cited references

Clark DJ, Dhanasekaran SM, Petralia F, Pan J, Song X, Hu Y, da Veiga Leprevost F, Reva B, Lih TM, Chang HY, Ma W, Huang C, Ricketts CJ, Chen L, Krek A, Li Y, Rykunov D, Li QK, Chen LS, Ozbek U, Vasaikar S, Wu Y, Yoo S, Chowdhury S, Wyczalkowski MA, Ji J, Schnaubelt M, Kong A, Sethuraman S, Avtonomov DM, Ao M, Colaprico A, Cao S, Cho KC, Kalayci S, Ma S, Liu W, Ruggles K, Calinawan A, Gümüs ZH, Geiszler D, Kawaler E, Teo GC, Wen B, Zhang Y, Keegan S, Li K, Chen F, Edwards N, Pierorazio PM, Chen XS, Pavlovich CP, Hakimi AA, Brominski G, Hsieh JJ, Antczak A, Omelchenko T, Lubinski J, Wiznerowicz M, Linehan WM, Kinsinger CR, Thiagarajan M, Boja ES, Mesri M, Hiltke T, Robles AI, Rodriguez H, Qian J, Fenyö D, Zhang B, Ding L, Schadt E, Chinnaiyan AM, Zhang Z, Omenn GS, Cieslik M, Chan DW, Nesvizhskii AI, Wang P, Zhang H, Clinical Proteomic Tumor Analysis Consortium (2019). Integrated proteogenomic characterization of clear cell renal cell carcinoma. Cell 179(4), 964-983. doi:10.1016/j.cell.2019.12.026

Kohler D, Staniak M, Yu F, Nesvizhskii AI, Vitek O (2024). An MSstats workflow for detecting differentially abundant proteins in large-scale data-independent acquisition mass spectrometry experiments with FragPipe processing. Nature Protocols 19(10), 2915-2938. doi:10.1038/s41596-024-01000-3

Yu F, Teo GC, Kong AT, Fröhlich K, Li GX, Demichev V, Nesvizhskii AI (2023). Analysis of DIA proteomics data using MSFragger-DIA and FragPipe computational platform. Nature Communications 14(1), 4154. doi:10.1038/s41467-023-39869-5

Zhang W, Xie M, Huang Q, Liu H, Liu J, Sun X, Li C (2024). STEAP3 is a potential preliminary prognostic biomarker of glioblastoma. Scientific Reports 14(1), 30994. doi:10.1038/s41598-024-82145-9