2.7. Data analysis
LFQ data were normalized by total ion current (TIC) and filtered to
retain proteins with valid intensity values in at least 50% of all
samples. Missing values were replaced by 1/5 of the minimum positive
value of each variable using MetaboAnalyst 5.0 [21].
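
The preprocessing steps above can be illustrated with a minimal sketch. It is not MetaboAnalyst's implementation: the function names and the toy matrix are hypothetical, the order of operations simply follows the text, and "each variable" is assumed to mean each protein (row).

```python
# Rows are proteins, columns are samples; None marks a missing intensity.
# All names and values here are illustrative assumptions.

def tic_normalize(matrix):
    """Divide each sample (column) by its total ion current (the column sum)."""
    n_samples = len(matrix[0])
    totals = [sum(row[j] for row in matrix if row[j] is not None)
              for j in range(n_samples)]
    return [[None if v is None else v / totals[j] for j, v in enumerate(row)]
            for row in matrix]

def filter_valid(matrix, min_fraction=0.5):
    """Keep proteins with valid values in at least `min_fraction` of samples."""
    return [row for row in matrix
            if sum(v is not None for v in row) / len(row) >= min_fraction]

def impute_fifth_of_min(matrix):
    """Replace missing values with 1/5 of the protein's smallest positive value."""
    imputed = []
    for row in matrix:
        fill = min(v for v in row if v is not None and v > 0) / 5
        imputed.append([fill if v is None else v for v in row])
    return imputed

raw = [
    [10.0, 20.0, 30.0, 40.0],   # complete protein
    [5.0, None, 10.0, None],    # 50% valid: kept, then imputed
    [None, None, None, 8.0],    # 25% valid: removed by the filter
]
processed = impute_fifth_of_min(filter_valid(tic_normalize(raw)))
```

In this toy example the third protein is dropped, and the two gaps in the second protein are filled with one fifth of its smallest normalized intensity.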
Quantified proteins with a fold change > 2 or < 0.5 and a P value < 0.05
were considered DEPs. In the figures, experimental data are shown with
the standard error of the mean.
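
As an illustration of the DEP criteria, the following sketch applies the two thresholds to precomputed statistics. The protein identifiers and values are invented, and the statistical test that produced the P values is not specified here, so they are taken as given.

```python
def is_dep(fold_change, p_value, fc_cutoff=2.0, p_cutoff=0.05):
    """True if the protein passes both the fold-change and the P value cutoff."""
    changed = fold_change > fc_cutoff or fold_change < 1 / fc_cutoff
    return changed and p_value < p_cutoff

# Hypothetical input: protein -> (fold change, P value)
stats = {
    "P1": (3.1, 0.010),   # up-regulated and significant
    "P2": (0.4, 0.030),   # down-regulated and significant
    "P3": (2.5, 0.200),   # large change, not significant
    "P4": (1.2, 0.001),   # significant, change too small
}
deps = [name for name, (fc, p) in stats.items() if is_dep(fc, p)]
```

Only proteins that satisfy both conditions are retained, so `deps` contains "P1" and "P2" in this example.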
Metascape [22] was used for functional enrichment and protein-protein
interaction network analysis. P values for the functional enrichments
were calculated by a hypergeometric test and corrected by the
Benjamini-Hochberg FDR method.
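
For readers unfamiliar with these two steps, a minimal sketch of a one-sided hypergeometric enrichment test followed by Benjamini-Hochberg adjustment is given below. It is a generic textbook formulation, not Metascape's code, and the function and parameter names are assumptions.

```python
from math import comb

def hypergeom_pvalue(k, K, n, N):
    """P(X >= k) when n proteins are drawn from a background of N,
    of which K carry the annotation and k of the drawn ones do."""
    upper = min(K, n)
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / comb(N, n)

def benjamini_hochberg(pvalues):
    """Return BH-adjusted P values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
```

Each term would contribute one raw P value from `hypergeom_pvalue`, and the full list of raw values would then be passed to `benjamini_hochberg`.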
Cytoscape [23] was used to reorganize and visualize the interaction
networks. The proportional Venn diagrams and the Sankey diagram were
generated using an online bioinformatics tool.
The artwork was created with BioRender.com. MetaboAnalyst 5.0 [21] was
used for the statistical analysis and biomarker discovery of DEPs,
including unsupervised clustering, PCA, Pearson correlation analysis,
and machine learning.
For machine learning, ROC curves were generated using MetaboAnalyst 5.0.
Multivariate ROC curves were generated by Monte Carlo cross-validation
(MCCV) using balanced sub-sampling. In each MCCV, two-thirds of the
samples were used to evaluate the feature importance. The top 2, 3, 5,
10 …100 (max) important features were then used to build
classification models, which were validated on the remaining one-third
of the samples. The procedure was repeated multiple times to calculate
the performance and confidence interval of each model. PLS-DA was used
as the classification method, and the PLS-DA built-in method was
selected for feature ranking, with two latent variables.
Feature selection was based on the ROC curve results, and the top 5, 10,
15, 25, 50, and 100 proteins were used for predictive accuracy
assessment.
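
The MCCV scheme described above can be sketched as follows. This is not MetaboAnalyst's implementation, and it deliberately swaps in simpler stand-ins to stay self-contained: features are ranked by the absolute difference of class means instead of the PLS-DA built-in ranking, and a nearest-centroid score replaces the PLS-DA classifier. All names, the number of iterations, and the toy data are assumptions.

```python
import random

def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def balanced_split(labels, train_fraction, rng):
    """Draw equally many training samples per class; the rest is held out."""
    by_class = {0: [], 1: []}
    for i, y in enumerate(labels):
        by_class[y].append(i)
    n_train = round(min(len(v) for v in by_class.values()) * train_fraction)
    train = []
    for members in by_class.values():
        train += rng.sample(members, n_train)
    train_set = set(train)
    test = [i for i in range(len(labels)) if i not in train_set]
    return train, test

def class_means(X, labels, rows, features):
    """Per-class mean of the selected features over the given rows."""
    means = {}
    for c in (0, 1):
        members = [X[i] for i in rows if labels[i] == c]
        means[c] = [sum(r[f] for r in members) / len(members) for f in features]
    return means

def mccv_auc(X, labels, top_k, n_iter=20, train_fraction=2 / 3, seed=0):
    """Repeat random balanced splits; return the mean AUC and all AUCs."""
    rng = random.Random(seed)
    all_features = list(range(len(X[0])))
    aucs = []
    for _ in range(n_iter):
        train, test = balanced_split(labels, train_fraction, rng)
        # Stand-in ranking (assumption): absolute difference of class means.
        m = class_means(X, labels, train, all_features)
        ranked = sorted(all_features, key=lambda f: -abs(m[1][f] - m[0][f]))
        top = ranked[:top_k]
        # Stand-in classifier (assumption): nearest centroid on the top features.
        c = class_means(X, labels, train, top)
        scores = []
        for i in test:
            d0 = sum((X[i][f] - c[0][j]) ** 2 for j, f in enumerate(top))
            d1 = sum((X[i][f] - c[1][j]) ** 2 for j, f in enumerate(top))
            scores.append(d0 - d1)  # larger score = closer to class 1
        aucs.append(auc(scores, [labels[i] for i in test]))
    return sum(aucs) / len(aucs), aucs

# Toy data: feature 0 separates the classes, feature 1 is uninformative.
X = [[0.1 * i, 5.0] for i in range(6)] + [[10.0 + 0.1 * i, 5.0] for i in range(6)]
labels = [0] * 6 + [1] * 6
mean_auc, aucs = mccv_auc(X, labels, top_k=1)
```

Feature ranking and model fitting use only the training portion of each split, so the held-out third gives an unbiased estimate; the spread of `aucs` across iterations is what a confidence interval would be derived from.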