Serum proteomics profiling identifies a preliminary signature for the diagnosis of early‐stage lung cancer

Lung cancer is the most common cause of death from cancer worldwide, largely due to late diagnosis. Thus, there is an urgent need to develop new approaches to improve the detection of early‐stage lung cancer, which would greatly improve patient survival.

future validation studies for the development of a non-invasive diagnostic tool for lung cancer.

KEYWORDS
biomarker, early diagnosis, lung cancer, mass spectrometry, machine learning, serum

INTRODUCTION
Lung cancer is the most common cause of death from cancer worldwide (WHO, 2019). Although the 5-year survival rate is 54% for cases diagnosed when the disease is still localized, only 15% of lung cancers are identified at early stages (American Cancer Society, 2018). Low-dose computed tomography has been shown to provide a 20% reduction in lung cancer mortality, and has been proposed as an annual screening test for the high-risk population. However, its high false-positive rate, costs and radiation-associated risks limit its use in the clinical context [1]. Thus, there is an urgent need to develop new approaches for the detection of early-stage lung cancer, which would greatly improve patient survival.
The last decade has witnessed an increasing interest in the design of new technologies for the identification of biomarkers suitable for screening the asymptomatic at-risk population. Such biomarkers should be reproducible, non-invasive and cost-effective. In this scenario, body fluids such as blood (plasma or serum), exhaled breath and urine represent ideal clinical samples, because they are readily available and can be collected with minimal invasiveness. Proteins represent particularly interesting biomarkers, because they are relatively stable and are the biological endpoint responsible for most cellular functions. Currently, the technology of choice for the systematic and quantitative profiling of proteins in complex biological matrices is mass spectrometry (MS)-based proteomics, which has emerged as a powerful tool in biomarker discovery [2]. Various studies have attempted to identify protein biomarkers for the early diagnosis of lung cancer by proteomics approaches in accessible fluids, such as blood and urine (reviewed in [3,4]). Although several potential blood biomarkers were identified, the majority were not specific for lung cancer, but were instead linked to either inflammation or metabolic alterations, or were found to be dysregulated in other cancer types as well. In addition, many of the serum proteomics studies conducted so far lacked rigorous patient enrolment criteria and standardized protocols for sample collection, processing and analysis, which may have hindered the discovery of lung cancer-specific biomarkers. More promising results were obtained in some recent urine proteomics studies, which found biomarkers/signatures distinguishing lung cancer patients from healthy subjects, or from patients with other tumor types [5].
While blood is the most widely used and the most convenient source of patient samples for biomarker discovery, the comprehensive analysis of plasma/serum proteomes is a challenging task [6]. This is due to the high complexity and extreme dynamic range of the proteins present in plasma, as well as the presence of a few proteins at very high concentrations. These very abundant proteins mask the detection of less abundant "tissue leakage" proteins that could represent potential biomarkers. Different strategies can be employed to overcome these challenges, including depletion of highly abundant proteins and biochemical fractionation. An alternative promising approach involves the isolation of plasma-derived extracellular vesicles, which are shed from cells all over the body and can be released into the bloodstream [7]. In recent years, there has been a growing interest in the proteomic profiling of extracellular vesicles as a source of biomarkers and as mediators of disease mechanisms [7]. Extracellular vesicles include exosomes, microvesicles (MVs), apoptotic bodies, and apoptotic microparticles, which are characterized by different sizes and cellular origins. MVs are large vesicles (100 nm to 1 µm) that protrude from the plasma membrane. They are found not only in blood, but also in urine and other biological fluids. MVs carry a "signature" of the protein content of the cells they originate from, and the secretion of MVs from cancer cells contributes to angiogenesis, metastasis, tumor formation, and disease progression [8]. Hence, they are an attractive source of biomarkers.
In this study, we applied proteomics technologies combined with machine learning to identify protein biomarkers/signatures able to distinguish early-stage lung cancer patients from healthy subjects, by profiling the proteomes of serum MVs.

etc.) and lifestyle information, with special emphasis on tobacco smoke exposure.

MV isolation
Blood samples (at least 15 mL) were collected by standard phlebotomy, discarding the first 3 mL of blood to prevent contamination by skin.
The serum was prepared by leaving the blood in the tubes for 3 h at room temperature to allow blood clotting, followed by centrifugation at 1000 × g for 10 min at room temperature. The serum was removed immediately after centrifugation and stored at −80 °C. For the isolation of MVs, 1 mL of serum was centrifuged at 4000 rpm for 20 min at 4 °C to remove cellular debris. The supernatant was diluted 1:2 in ice-cold PBS and centrifuged at 20,000 × g at 4 °C for 1 h. The pellet was washed twice with ice-cold PBS and centrifuged at 20,000 × g at 4 °C for 1 h.
Although the published microvesicle isolation protocol described only one wash [12], preliminary tests showed that an additional wash provided cleaner MV preparations (not shown). The microvesicle pellets were stored at −80 °C until use.
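The protocol above gives the debris-removal spin in rpm and the other spins in × g. The two units are related through the rotor radius, which the text does not report. As a minimal sketch, the standard conversion RCF = 1.118 × 10^-5 × r(cm) × rpm^2 can be written as follows. The function names and the 8 cm radius are illustrative assumptions, not values taken from the study.

```python
import math

# Standard conversion constant for radius in cm and speed in rpm.
RCF_CONSTANT = 1.118e-5


def rcf_from_rpm(rpm: float, radius_cm: float) -> float:
    """Relative centrifugal force (multiples of g) for a rotor speed and radius."""
    return RCF_CONSTANT * radius_cm * rpm ** 2


def rpm_from_rcf(rcf: float, radius_cm: float) -> float:
    """Rotor speed (rpm) needed to reach a given RCF at a given radius."""
    return math.sqrt(rcf / (RCF_CONSTANT * radius_cm))


# Hypothetical rotor radius; the study does not state the rotor used.
ASSUMED_RADIUS_CM = 8.0

if __name__ == "__main__":
    print(f"4000 rpm  -> {rcf_from_rpm(4000, ASSUMED_RADIUS_CM):.0f} x g")
    print(f"20,000 x g -> {rpm_from_rcf(20000, ASSUMED_RADIUS_CM):.0f} rpm")
```

Because RCF scales linearly with the radius, the same 4000 rpm setting gives a different force on a different rotor, so reproducing the debris-removal step requires the radius of the rotor actually used.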

Peptide library generation
To generate a reference spectral library for MS acquisition, we selected gel bands that were in-gel digested as previously described [9]. Proteins extracted from the tissue biopsies from the same patients were processed similarly.

Machine learning model development
The other studies [15,16] conducted on the serum of lung cancer patients.
PRKCA is a protein kinase that is involved in the positive and negative regulation of a number of biological processes, by directly phosphorylating targets such as RAF1, BCL2, CSPG4, TNNT2/CTNT, or activating signaling cascades involving MAPK1/3 (ERK1/2) and RAP1GAP [17].
Aberrantly high PRKCA expression has also been found in lung adenocarcinoma patients, especially those with Epidermal Growth Factor Receptor (EGFR) mutations [18].
process. Indeed, the algorithm that uses only the measurements of these two proteins can correctly predict more than 94% of the cases, with an area under the curve > 95% (Figure 3E).
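The two figures quoted above, the fraction of correctly predicted cases and the area under the ROC curve, measure different things: the first depends on a decision threshold, while the second only depends on how the scores rank cases against controls. The sketch below shows how each is computed from a classifier's output scores. It is not the authors' model, and the labels and scores are made-up illustrative values.

```python
def accuracy(labels, scores, threshold=0.5):
    """Fraction of samples whose thresholded score matches the true label."""
    predictions = [1 if s >= threshold else 0 for s in scores]
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen positive sample receives
    a higher score than a randomly chosen negative one (ties count as 0.5).
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Made-up example: 1 = lung cancer, 0 = control; scores are model outputs.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]

if __name__ == "__main__":
    print("accuracy:", accuracy(labels, scores))
    print("AUC:", roc_auc(labels, scores))
```

In this toy example the AUC is higher than the accuracy at a threshold of 0.5, which illustrates why both numbers are usually reported together.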
Of the putative markers found in one of the three predictive models, two (CD59 and PDCD6IP) were also found to be significantly decreased in the urine proteomes of lung cancer patients compared with normal subjects [5], which serves as an external confirmation of our results for these two proteins (Figure 4). Interestingly, these two markers were also decreased in lung cancer patients compared with patients with benign pulmonary diseases (pneumonia and chronic obstructive pulmonary disease), supporting their potential as specific tumor markers.

DISCUSSION
Lung cancer is a latent disease, which is often asymptomatic or associated with non-specific symptoms. As a consequence, it is often diagnosed at a late stage and is associated with poor prognosis.

The same two proteins were analyzed in [5] in a urine dataset composed of 33 healthy subjects, 40 patients with benign pulmonary conditions (pneumonia and chronic obstructive pulmonary disease) and 33 patients with lung cancer. ***: p < 0.001 by one-way ANOVA followed by Tukey's multiple comparisons test.

In this study, we profiled the protein composition of 87 human serum samples from healthy, high-risk donors, and lung cancer patients.
To overcome the challenges related to the analysis of serum proteomes, we chose to analyze circulating MVs, identifying a list of candidate markers for future validation. Of note, because no specific step to enrich for tumor-derived MVs was performed, potential biomarkers identified through our approach may derive from the tumor cells, or may originate from other cell populations as a reaction to the growing tumor.
Among the differentially expressed proteins, ARSA showed the highest fold change and discriminant power, and could represent a promising biomarker candidate for the diagnosis of early-stage lung cancer. In addition, to further increase the discriminating power, we developed a combinatorial model for early cancer detection, which allows the discrimination of lung cancer patients from control cases with high sensitivity and specificity, and highlights additional candidate markers to be validated. Biochemical assays such as ELISA (enzyme-linked immunosorbent assay) could be employed to validate these changes in a large cohort of patients, and to develop a diagnostic tool that is easily translatable to a clinical setting.
Furthermore, we envision that the current serum protein signature could be combined with other molecular profiles to achieve the highest possible diagnostic power. For instance, the analysis of volatile organic compounds from breath, by either an electronic nose device or gas chromatography-mass spectrometry, has suggested the existence of a specific fingerprint or "breathprint" able to distinguish lung cancer patients from both healthy donors and patients with other pulmonary diseases [19,20].
The analysis of cancer-related volatile organic compounds has also been applied to urine samples, generating promising results [21]. In addition, urine proteomics combined with the development of a machine learning model generated a short list of differentially expressed proteins in lung cancer patients compared with healthy subjects [5]. Interestingly, some of the markers found in our serum analysis (CD59 and PDCD6IP) were also found to be significantly decreased in the urine of lung cancer patients compared with normal subjects or patients with benign pulmonary diseases.