Renal cell carcinoma (kidney cancer) is one of the 10 most common diagnosed cancers worldwide. Clear cell renal cell carcinoma (ccRCC) is the most prevalent histological subtype, representing 75% of all case and the majority of deaths. The Clinical Proteomics Tumor Analysis Consortium (CPTAC) performed proteome analysis of 103 treatment naive ccRCC tumors and 84 paired normal adjacent tissues (NATs) (Clark et al 2019).
The DIA whole proteome data from Clark et al (2019) has been deposited to the MassIVE proteome database as dataset MSV000093041
(https://doi.org/doi:10.25345/C52V2CM61).
The data was reanalysed by Kohler et al (2024) and, in particular, the raw data was reprocessed using FragPipe and MSFragger-DIA, with extraction of quantification using DIA-NN (Yu et al 2023).
The files from the reanalysis are available as MassIVE Reanalysis RMSV000000696.1.
The quantification files, including the DIA-NN report.tsv file, are available via ftp from
ftp://massive-ftp.ucsd.edu/x01/RMSV000000696/2023-12-15_devonjkohler_b88e6fe3/quant.
The data includes samples from 106 individuals (or cases). There are 81 cases with matched tumor and normal adjacent tissue samples, and also 22 cases with tumor only and 3 cases with normal tissue only.
Here, we analyse the data using the limpa package. This vignette was made with limpa version 1.3.4 and limma version 3.67.0.
The first step is the read the DIA-NN report file containing the precursor quantifications into a limma EList object:
> library(limpa)
> x <- readDIANN("report.tsv", q.columns=c(),
+ extra.columns=c("Protein.Group","Genes"))
The data includes 156,397 precursors and 187 samples:
> dim(x)
[1] 156397 187
The data includes UniProt protein IDs and gene names:
> head(x$genes)
Protein.Group Genes
(UniMod:1)AAAAAAAAAAGAAGGR1 Q86U42 PABPN1
(UniMod:1)AAAAAAAAAAGAAGGR2 Q86U42 PABPN1
(UniMod:1)AAAAAAAAAAGAAGGRGSGPGR2 Q86U42 PABPN1
(UniMod:1)AAAAAAAAEQQSSNGPVK2 Q16585 SGCB
(UniMod:1)AAAAAAAAEQQSSNGPVK3 Q16585 SGCB
(UniMod:1)AAAAAAAGDSDSWDADAFSVEDPVRK2 O75822 EIF3J
There are 9019 proteins in total:
> length(unique(x$genes$Protein.Group))
[1] 9019
We can extract the case IDs and tissue origin from the sample names.
Note that row.names(x) gives the precursor IDs and colnames(x) gives the sample names.
> cn.s <- strsplit2(colnames(x), split="_")
> Case <- factor(cn.s[,7])
> Tissue <- factor(substring(cn.s[,8],1,1))
There are 103 tumor samples and 84 normal adjacent tissue samples:
> table(Tissue)
Tissue
N T
84 103
A closer look using table(Tissue,Case) shows that there are 81 cases with matched tumor and normal adjacent tissue samples from the same case, while 22 cases have tumor only and 3 cases have normal tissue only.
For convenient and brevity, we reset the sample names to consist just of the case and tissue:
> colnames(x) <- paste(Case,Tissue,sep="_")
Over 48% of the quantifications are NA:
> mean(is.na(x$E))
[1] 0.482
The rate of missing values is much higher in the tumor samples:
> plot(colMeans(is.na(x$E)) ~ Tissue)
but the average log-intensity of peptides that are detected is slightly higher in the tumor samples
> plot(colMeans(x$E, na.rm=TRUE) ~ Tissue)
This suggests that about 5-10% of proteins that are expressed in normal tissue may be down-regulated or absent in the tumor samples. This sort of global imbalance in expression levels poses questions for how the log-intensities should be normalized in downstream analyses. If we compare tumor to normal tissue without normalization, we should anticipate unbalanced DE with far more down-regulatd than up-regulated proteins. The difference in missing value rate between tumor and normal samples may reflect a true biological difference between the tissues, and so should not necessarily be corrected for by normalization.
For DIA-NN data, it would be reasonable to assume a DPC slope of about 0.7, but we will here estimate the DPC from the data itself:
> dpcest <- dpcCN(x)
Iter 1: dpc = -13.893 0.806
Iter 2: dpc = -14.897 0.871
> plotDPC(dpcest)
The DPC slope is quite steep for this data, corresponding to a steep transition from nearly all missing at around log2-intensity 15 to nearly all detected around 20.
Next we estimate the protein intensities from the precursor data. For completeness, we will retain all precursors in the quantification step, even though the percentage of missing values is very high for some of the precursors. We also retain all the proteins, including those with just one precursor.
One could consider filtering out precursors or proteins that are almost all NA in order to speed up the quantification step. Ideally though, we choose to retain all the precursors, because even precursors with just one or two observed values can potentially add some information to the final protein values and standard errors. The limpa quantification step accumulates information from multiple precursors rather than averaging, so including some low-information precursors will not detract from the accuracy of the final estimates.
> y <- dpcQuant(x, dpc=dpcest)
The protein-level EList has 9019 rows, one for each protein:
> dim(y)
[1] 9019 187
At this stage we will filter out some proteins not of interest in the downstream DE analysis.
We remove the Biognosys protein that was spiked-in for mass spectrometry calibration:
> Biognosys <- grep("Biognosys|iRT-Kit_WR_fusion", row.names(y))
We also remove the immunoglobin heavy chain variable regions, because they will be sample-specific rather than tissue-specific:
> IGHV <- grep("^IGHV", y$genes$Genes)
> IGKV <- grep("^IGKV", y$genes$Genes)
> IGLV <- grep("^IGLV", y$genes$Genes)
> yfilt <- y
> yfilt <- yfilt[-c(Biognosys,IGHV,IGKV,IGLV),]
> dim(yfilt)
[1] 8921 187
We will also filter out proteins that were detected in fewer than 40 samples:
> NSamplesDetectedIn <- rowSums(yfilt$other$n.observations > 0)
> yfilt <- yfilt[NSamplesDetectedIn >= 40,]
> dim(yfilt)
[1] 7980 187
This leaves 7980 proteins for downstream analysis.
It is difficult to decide whether to perform additional between-sample normalization at this stage. We will refrain from any extra normalization, for the reasons given above.
An MDS plot shows that the normal and tumor tissue samples are very well separated, with one exceptional case:
> Color <- Tissue
> levels(Color) <- c("blue","red")
> mds <- plotMDSUsingSEs(yfilt, pch=16, col=as.character(Color),
+ main="MDS plot of sample relationships")
> legend("bottomleft", legend=c("Normal","Tumor"), pch=16, col=levels(Color))
Looking closely at the MDS plot, we can see one tumor sample on the left amongst the normal samples and one normal sample on the right amongst the tumor samples.
We can identify these samples using the x-axis from the MDS plot:
> colnames(yfilt)[mds$x < -3 & Tissue=="T"]
[1] "C3L-00360.T"
> colnames(yfilt)[mds$x > 3 & Tissue=="N"]
[1] "C3L-00360.N"
We see that the outlier samples are both from case C3L-00360, and it would seem that the tissue labels for this case have been swapped. This swap is presumably a processing error, because no such outlier were observed in Figure 4A of Clark et al (2019), which was made from TMT-based proteomic data for the same samples. We correct this apparent error by reversing the tumor and tissue labels for case C3L-00360:
> Tissue[mds$x < -3 & Tissue=="T"] <- "N"
> Tissue[mds$x > 3 & Tissue=="N"] <- "T"
> colnames(yfilt) <- paste(Case,Tissue,sep="_")
We now evaluate differentially expressed proteins in the tumor samples compared to the matched normals.
Because of the unmatched samples, we treat the cases as random effects and estimate the intra-case correlation by setting the block argument of dpcDE to Case.
We also allow for sample-specific quality weights by setting sample.weights=TRUE:
> design <- model.matrix(~Tissue)
> fit <- dpcDE(yfilt, design, block=Case, sample.weights=TRUE, plot=TRUE)
First sample weights (min/max) 0.242/2.413
First intra-block correlation 0.178
Final sample weights (min/max) 0.248/2.412
Final intra-block correlation 0.179
The vooma plot shows that limpa is able to predict a variance trend using the quantification standard errors. As expected, samples from the same case are positively correlated. Tumor samples are generally given lower sample weights than the normal tissue samples, reflecting the greater heterogeneity of the tumor samples:
> plot(fit$targets$sample.weight ~ Tissue)
We go on to evaluate differential expression.
We use the limma treat function to test DE relative to a fold-change threshold of 1.8.
This ensure that any DE gene will have a fold-change significantly greater than 1.8:
> fit <- treat(fit, fc=1.8)
> topTreat(fit, coef=2, n=30)
Genes NPeptides PropObs logFC AveExpr t P.Value adj.P.Val
P15313 ATP6V1B1 30 0.299 -5.54 15.4 -47.4 1.18e-107 9.45e-104
O95299 NDUFA10 31 0.565 -2.98 17.9 -39.6 4.13e-94 1.65e-90
Q9HBG4 ATP6V0A4 18 0.299 -4.40 15.8 -38.1 2.86e-91 7.62e-88
Q8NEY4 ATP6V1C2 9 0.298 -4.05 16.3 -37.1 2.78e-89 4.53e-86
O75489 NDUFS3 37 0.600 -2.74 17.8 -37.0 2.84e-89 4.53e-86
O15020 SPTBN2 123 0.329 -4.93 15.3 -36.9 4.99e-89 6.64e-86
P05937 CALB1 45 0.358 -7.65 15.2 -36.5 3.33e-88 3.80e-85
Q01813 PFKP 118 0.484 3.58 17.6 36.3 9.66e-88 9.63e-85
Q9NRX3 NDUFA4L2 8 0.358 4.19 16.4 35.6 2.32e-86 2.06e-83
P49821 NDUFV1 43 0.566 -2.86 17.9 -35.3 9.51e-86 7.59e-83
P00439 PAH 61 0.360 -7.23 15.3 -35.1 1.92e-85 1.39e-82
O15427 SLC16A3 8 0.479 3.66 17.1 35.0 3.78e-85 2.51e-82
Q8N8Y2 ATP6V0D2 11 0.275 -3.86 15.7 -34.9 7.27e-85 4.46e-82
Q9Y6Q5 AP1M2 11 0.341 -3.85 15.8 -34.7 1.25e-84 7.10e-82
P28331 NDUFS1 92 0.525 -2.70 17.7 -34.7 1.70e-84 9.02e-82
Q96CM8 ACSF2 62 0.426 -5.92 16.4 -34.3 1.18e-83 5.90e-81
Q9UI09 NDUFA12 20 0.602 -2.89 18.5 -34.1 2.62e-83 1.23e-80
O75306 NDUFS2 45 0.495 -2.77 17.3 -33.9 5.56e-83 2.46e-80
P51970 NDUFA8 16 0.719 -2.44 18.7 -33.8 1.12e-82 4.68e-80
Q16795 NDUFA9 40 0.539 -2.88 17.5 -33.8 1.20e-82 4.78e-80
Q02338 BDH1 21 0.429 -4.95 16.8 -33.7 1.63e-82 6.21e-80
O95182 NDUFA7 12 0.706 -2.76 18.4 -33.2 1.74e-81 6.32e-79
P19404 NDUFV2 20 0.586 -2.66 18.5 -32.3 1.57e-79 5.45e-77
A6NK44 GLOD5 7 0.331 -3.60 16.2 -32.3 1.99e-79 6.63e-77
O43181 NDUFS4 10 0.556 -2.92 17.8 -32.2 3.06e-79 9.78e-77
P12532 CKMT1A 25 0.287 -4.55 15.2 -32.1 5.83e-79 1.79e-76
Q16718 NDUFA5 9 0.841 -2.52 19.6 -31.9 1.46e-78 4.30e-76
Q03154 ACY1 94 0.553 -3.91 18.1 -31.8 2.61e-78 7.43e-76
P51687 SUOX 36 0.435 -4.02 16.6 -31.4 1.71e-77 4.71e-75
Q86XE5 HOGA1 31 0.382 -5.30 16.2 -31.4 1.90e-77 5.04e-75
> summary(dt <- decideTests(fit))
(Intercept) TissueT
Down 0 1325
NotSig 0 6064
Up 7980 591
> plotMD(fit, coef=2, status=dt)
As expected, there are more down-regulated proteins than up-regulated.
As expected, the ATP6 and NDUF families of genes, which are involved in oxidative phosphorylation, are all strongly down-regulated (compare with Figure 4BE in Clark et al, 2019).
As expected, genes involved in glycolysis, such as HK3, GAPDH, ENO1 and PKM, are all up-regulated (compare with Figure 4C of Clark et al).
We now explore which clinical characteristics contribute to inter-tumor heterogeneity. We also incorporate stromal and immune scores for each sample, which estimate how much of each sample was made up of stromal or immune cells instead of carcinoma.
Our analysis follows the multivariate analysis shown in Figure S7 of Clark et al (2019). The predictor variables used in the analysis are given in Table S5 (Tab 1) of Clark et al, and we now read that table into the R session:
> TableS5 <- read.csv("TableS5Tab1.csv", row.names=1)
> head(TableS5)
ImmuneScoreRNA StromalScoreRNA Grade Stage MarginStatus Age Gender BMI European Diabetes Alcohol Smoking
C3L-00004_N -1424 -868.34 3 3 1 72 M 22.8 No No No Yes
C3L-00004_T 2053 -91.54 3 3 1 72 M 22.8 No No No Yes
C3L-00010_N -1966 -1544.70 3 1 0 30 M 34.1 No No Yes Yes
C3L-00010_T 1576 -2.93 3 1 0 30 M 34.1 No No Yes Yes
C3L-00011_N -1697 -1243.51 4 4 1 63 F 27.5 No No Yes No
C3L-00011_T 354 -235.93 4 4 1 63 F 27.5 No No Yes No
The first two variables are immune and stromal scores, as estimated by Clark et al (2019) from RNA-seq profiles of the same samples. Tumor characteristics are grade, stage, and margin status. Patient characteristics are age, gender, BMI, and yes/no variables for European origin, diabetes, alchohol drinking and smoking. Following Clark et al, we treat all the quantitive variables as numerical covariates, including grade and stage. For alcohol and smoking status, lifetime non-drinkers or non-smokers were compared to the rest of the population. For country of origin, European countries (in this case, Poland and Ukraine) were compared to all other countries.
To study tumor heterogeneity, we first subset the data to tumor samples:
> yt <- yfilt[, Tissue=="T"]
> dim(yt)
[1] 7980 103
Then we can setup a multivariate model relating protein expression to the tumor and clinical variables. First extract the variables from Table S5:
> m <- match(colnames(yt), row.names(TableS5))
> Immune <- TableS5$ImmuneScoreRNA[m]
> Stromal <- TableS5$StromalScoreRNA[m]
> Grade <- TableS5$Grade[m]
> Stage <- TableS5$Stage[m]
> Margin <- TableS5$MarginStatus[m]
> Age <- TableS5$Age[m]
> BMI <- TableS5$BMI[m]
> Gender <- factor(TableS5$Gender[m])
> European <- factor(TableS5$European[m])
> Diabetes <- factor(TableS5$Diabetes[m])
> Alchohol <- factor(TableS5$Alcohol[m])
> Smoking <- factor(TableS5$Smoking[m])
Now we can fit linear models to the tumor samples. The design matrix is defined by:
> designt <- model.matrix(~Immune+Stromal+Grade+Stage+Margin
+ +Age+BMI+Gender+European+Diabetes+Alchohol+Smoking)
We will include all the predictors in the model at once, to illustrate the flexibility of a limpa analysis. This gives an analysis where the DE results for each clinical variable is adjusted for dependence on the stromal and immune scores, as well as for dependence on the other clinical variables.
The DE analysis uses sample weights, but there is now no need for a blocking factor, because each case appears only once:
> fitt <- dpcDE(yt, designt, sample.weights=TRUE, plot=FALSE)
First sample weights (min/max) 0.24/1.80
Final sample weights (min/max) 0.237/1.808
> fitt <- eBayes(fitt)
> summary(decideTests(fitt[,-1]))
Immune Stromal Grade Stage Margin Age BMI GenderM EuropeanYes DiabetesYes AlchoholYes SmokingYes
Down 529 1328 32 0 0 1 0 4 35 0 0 0
NotSig 7042 6040 7882 7980 7978 7979 7980 7964 7937 7979 7980 7980
Up 409 612 66 0 2 0 0 12 8 1 0 0
The number of DE proteins is broadly similar to that shown in Figure S7A of Clark et al (2019). The stromal and immune scores account for nearly 2000 and 1000 proteins respectively.
It is good practice to remove predictor variables that don’t account for any DE, in order to reduce collinearity and to increase power for the remaining predictors. This does increase the number of DE genes for all the other covariates, except for Gender:
> designt <- model.matrix(~Immune+Stromal+Grade+Gender+European)
> fitt <- dpcDE(yt, designt, sample.weights=TRUE, plot=FALSE)
First sample weights (min/max) 0.218/1.810
Final sample weights (min/max) 0.217/1.810
> fitt <- eBayes(fitt)
> summary(decideTests(fitt[,-1]))
Immune Stromal Grade GenderM EuropeanYes
Down 610 2175 172 7 37
NotSig 6924 5127 7033 7967 7931
Up 446 678 775 6 12
We see that the ImmuneScore shows positive correlation with well known immune genes:
> topTable(fitt, coef="Immune")
Genes NPeptides PropObs logFC AveExpr t P.Value adj.P.Val B
Q96PP8 GBP5 17 0.165 0.001748 16.0 15.0 1.64e-27 1.31e-23 48.0
O15117 FYB1 27 0.280 0.000788 16.8 14.4 2.84e-26 1.13e-22 45.1
P49863 GZMK 5 0.220 0.001363 15.4 13.0 2.38e-23 6.34e-20 38.5
P07766 CD3E 2 0.195 0.001021 15.7 12.6 1.60e-22 2.55e-19 36.7
P42224 STAT1 76 0.495 0.000895 18.3 12.7 1.44e-22 2.55e-19 36.3
P13796 LCP1 108 0.518 0.000580 18.5 12.2 1.29e-21 1.71e-18 34.1
Q9UI08 EVL 32 0.473 0.000513 17.4 12.1 2.05e-21 2.34e-18 33.7
P08575 PTPRC 42 0.350 0.000718 17.4 11.7 1.29e-20 1.29e-17 31.9
P31146 CORO1A 52 0.473 0.000556 18.0 11.6 2.36e-20 2.09e-17 31.2
P28838 LAP3 82 0.600 0.000656 18.7 11.5 4.17e-20 3.33e-17 30.6
The StromalScore shows positive correlation with well known stromal markers such as RCN3 and COL14A1:
> topTable(fitt, coef="Stromal")
Genes NPeptides PropObs logFC AveExpr t P.Value adj.P.Val B
Q96D15 RCN3 14 0.365 0.001519 16.4 13.1 1.44e-23 1.15e-19 39.4
Q05707 COL14A1 160 0.534 0.002128 17.4 12.3 8.66e-22 3.46e-18 35.0
Q7Z5L7 PODN 18 0.306 0.002486 16.0 11.9 4.72e-21 1.26e-17 33.5
P67936 TPM4 30 0.758 0.000844 18.8 11.7 1.42e-20 2.82e-17 32.0
Q06828 FMOD 18 0.276 0.002557 16.6 10.9 8.24e-19 1.11e-15 28.3
Q6UWY5 OLFML1 24 0.445 0.002094 15.9 10.9 9.80e-19 1.12e-15 28.2
P51884 LUM 34 0.754 0.001792 20.7 10.9 8.31e-19 1.11e-15 27.9
Q16647 PTGIS 23 0.130 0.002347 14.7 10.7 2.00e-18 1.99e-15 27.5
P35442 THBS2 75 0.201 0.003008 15.6 10.7 3.11e-18 2.76e-15 27.0
Q8IUX7 AEBP1 68 0.441 0.002058 17.2 10.2 3.29e-17 2.63e-14 24.5
Grade is correlated with genes such as STEAP3, known to be associated with high-grade cancer and poor survival (Zhang et al, 2024).
> topTable(fitt, coef="Grade")
Genes NPeptides PropObs logFC AveExpr t P.Value adj.P.Val B
Q658P3 STEAP3 8 0.166 1.276 15.5 6.59 2.04e-09 1.25e-05 10.99
Q10471 GALNT2 22 0.468 0.702 17.0 6.49 3.18e-09 1.25e-05 10.77
Q9Y617 PSAT1 44 0.450 1.307 14.8 6.41 4.71e-09 1.25e-05 10.18
P05121 SERPINE1 27 0.116 1.403 14.9 5.96 3.66e-08 7.29e-05 8.34
P51161 FABP6 6 0.150 1.421 15.8 5.91 4.57e-08 7.30e-05 8.16
Q9UHR4 BAIAP2L1 5 0.291 0.747 15.4 5.87 5.65e-08 7.52e-05 7.85
P60602 ROMO1 2 0.612 0.593 17.9 5.81 7.20e-08 8.21e-05 7.82
Q9H5V8 CDCP1 5 0.133 0.947 15.7 5.75 9.59e-08 9.57e-05 7.43
O60443 GSDME 27 0.510 0.620 17.3 5.70 1.19e-07 1.06e-04 7.39
Q15599 SLC9A3R2 24 0.653 -0.539 17.2 -5.62 1.70e-07 1.15e-04 7.05
Gender genes show up-regulation of Y chromosome genes like DDX3Y AND RPS4Y1 and down-regulation of female-biased genes like PZP (pregnancy zone protein).
> topTable(fitt, coef="GenderM")
Genes NPeptides PropObs logFC AveExpr t P.Value adj.P.Val B
O15523 DDX3Y 6 0.352 2.140 15.9 11.55 3.55e-20 2.83e-16 31.02
P22090 RPS4Y1 7 0.475 2.747 16.9 11.12 3.11e-19 1.24e-15 29.39
P20742 PZP 39 0.175 -2.747 15.1 -7.98 2.30e-12 6.13e-09 16.90
Q96A49 SYAP1 19 0.494 -0.573 16.7 -5.62 1.67e-07 3.34e-04 6.92
O14602 EIF1AY 1 0.727 1.147 19.0 4.73 7.26e-06 9.65e-03 3.50
Q96BH1 RNF25 1 0.251 -0.718 16.4 -4.76 6.56e-06 9.65e-03 3.40
Q96A58 RERG 3 0.742 0.598 18.3 4.57 1.37e-05 1.56e-02 2.97
P14209 CD99 11 0.529 0.430 18.1 4.41 2.59e-05 2.58e-02 2.40
O00584 RNASET2 22 0.619 1.007 18.1 4.34 3.44e-05 3.05e-02 2.14
Q13501 SQSTM1 53 0.237 -1.120 15.4 -4.21 5.47e-05 4.06e-02 1.68
The European-associated genes are harder to interpret, but do seem to reproduce results from the original study:
> topTable(fitt, coef="EuropeanYes")
Genes NPeptides PropObs logFC AveExpr t P.Value adj.P.Val B
P02144 MB 4 0.365 1.408 15.5 8.18 8.70e-13 6.94e-09 16.81
Q9H361 PABPC3 6 0.125 -1.347 15.8 -5.75 9.38e-08 3.74e-04 6.99
Q9BX59 TAPBPL 9 0.231 -1.170 15.9 -5.41 4.26e-07 1.13e-03 5.78
Q9UEU0 VTI1B 11 0.709 -0.397 17.4 -5.06 1.92e-06 3.83e-03 4.75
Q8TBY9 CFAP251 1 0.588 -1.219 22.4 -5.00 2.45e-06 3.92e-03 4.55
P27105 STOM 20 0.691 -0.692 18.9 -4.93 3.29e-06 4.37e-03 4.29
Q5XKK7 FAM219B 1 0.840 -0.600 17.1 -4.75 6.67e-06 7.60e-03 3.43
P06702 S100A9 26 0.370 -1.153 16.9 -4.60 1.22e-05 1.18e-02 3.00
Q08722 CD47 5 0.814 -0.494 18.9 -4.57 1.38e-05 1.18e-02 2.97
Q92626 PXDN 84 0.383 0.832 17.1 4.53 1.63e-05 1.18e-02 2.82
More documentation and example analyses are available from https://github.com/SmythLab/limpa/.
Clark DJ, Dhanasekaran SM, Petralia F, Pan J, Song X, Hu Y, da Veiga Leprevost F, Reva B, Lih TM, Chang HY, Ma W, Huang C, Ricketts CJ, Chen L, Krek A, Li Y, Rykunov D, Li QK, Chen LS, Ozbek U, Vasaikar S, Wu Y, Yoo S, Chowdhury S, Wyczalkowski MA, Ji J, Schnaubelt M, Kong A, Sethuraman S, Avtonomov DM, Ao M, Colaprico A, Cao S, Cho KC, Kalayci S, Ma S, Liu W, Ruggles K, Calinawan A, Gümüs ZH, Geiszler D, Kawaler E, Teo GC, Wen B, Zhang Y, Keegan S, Li K, Chen F, Edwards N, Pierorazio PM, Chen XS, Pavlovich CP, Hakimi AA, Brominski G, Hsieh JJ, Antczak A, Omelchenko T, Lubinski J, Wiznerowicz M, Linehan WM, Kinsinger CR, Thiagarajan M, Boja ES, Mesri M, Hiltke T, Robles AI, Rodriguez H, Qian J, Fenyö D, Zhang B, Ding L, Schadt E, Chinnaiyan AM, Zhang Z, Omenn GS, Cieslik M, Chan DW, Nesvizhskii AI, Wang P, Zhang H, Clinical Proteomic Tumor Analysis Consortium (2019). Integrated proteogenomic characterization of clear cell renal cell carcinoma. Cell 179(4), 964-983. doi:10.1016/j.cell.2019.12.026
Kohler D, Staniak M, Yu F, Nesvizhskii AI, Vitek O (2024). An MSstats workflow for detecting differentially abundant proteins in large-scale data-independent acquisition mass spectrometry experiments with FragPipe processing. Nature Protocols 19(10), 2915-2938. doi:10.1038/s41596-024-01000-3
Yu F, Teo GC, Kong AT, Fröhlich K, Li GX, Demichev V, Nesvizhskii AI (2023). Analysis of DIA proteomics data using MSFragger-DIA and FragPipe computational platform. Nature Communications 14(1), 4154. doi:10.1038/s41467-023-39869-5
Zhang W, Xie M, Huang Q, Liu H, Liu J, Sun X, Li C (2024). STEAP3 is a potential preliminary prognostic biomarker of glioblastoma. Scientific Reports 14(1), 30994. doi:10.1038/s41598-024-82145-9