The lung cancer proteome


Lung cancer is the most prevalent cancer in the world and the leading cause of cancer-related deaths. Smoking is accepted as the major risk factor, responsible for 70-90% of all lung cancer cases, although the etiology of lung cancer appears multifactorial, with both environmental and genetic factors playing a role. Lung cancer patients have a poor outcome, with a 5-year survival rate of 13.6% among men and 19.4% among women across all stages. The poor prognosis is partly explained by late diagnosis, but also by a lack of effective treatments.

Based on histology, lung cancer is primarily divided into small cell lung cancer (SCLC) and non-small cell lung cancer (NSCLC). SCLC originates from neuroendocrine cells and accounts for approximately 15% of all primary lung cancers. This extremely rapidly proliferating cancer is generally treated with chemotherapy. The initial response is usually good, but in most cases it is followed by resistance to treatment and poor survival.

NSCLC is suggested to originate from bronchogenic or alveolar cells. It is the most common form of primary lung cancer and represents approximately 80-85% of all lung cancer cases. Based on histology, NSCLC can be further divided into different subtypes, with adenocarcinoma and squamous cell carcinoma being the most common. Treatment for NSCLC is mainly based on the tumor extent. In principle, limited-stage tumors are treated surgically, sometimes with the addition of chemotherapy and radiotherapy, whereas advanced-stage tumors are treated palliatively with a combination of cytotoxic drugs and recently developed targeted drugs. Unfortunately, the treatment effect is limited and the majority of patients experience only a modest prolongation of survival.

Here, we explore the lung cancer proteome using TCGA transcriptomics data and antibody-based protein data. Based on transcriptomics data from 994 patients, 650 genes are suggested as prognostic: 354 genes associated with unfavourable prognosis and 296 genes associated with favourable prognosis.

TCGA data analysis


In this metadata study we used data from TCGA, where transcriptomics data was available from 994 patients in total: 494 patients with squamous cell carcinoma (LUSC) and 500 patients with adenocarcinoma (LUAD). The total dataset included 398 females and 596 males. Most of the patients (600 patients) were still alive at the time of data collection. The stage distribution was stage I: 510 patients, stage II: 277 patients, stage III: 163 patients and stage IV: 32 patients, with stage information missing for 12 patients.
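As a quick consistency check, the cohort counts quoted above can be tallied to confirm that each breakdown adds up to the 994 patients. This is only an illustrative sketch: the dictionary and function names are made up for the example, and the numbers are copied from the text.

```python
# Cohort counts as stated in the text (994 TCGA patients in total).
TOTAL = 994
subtype = {"LUSC": 494, "LUAD": 500}
sex = {"female": 398, "male": 596}
stage = {"I": 510, "II": 277, "III": 163, "IV": 32, "missing": 12}
alive_at_collection = 600


def shares(counts):
    """Return each category's share of the cohort in percent (1 decimal)."""
    n = sum(counts.values())
    return {key: round(100 * value / n, 1) for key, value in counts.items()}


for name, counts in [("subtype", subtype), ("sex", sex), ("stage", stage)]:
    # Every breakdown should account for all 994 patients.
    assert sum(counts.values()) == TOTAL, name
    print(name, shares(counts))

print("alive at data collection (%):", round(100 * alive_at_collection / TOTAL, 1))
```

Each of the three breakdowns sums to 994, so the stated counts are internally consistent.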

Unfavourable prognostic genes in lung cancer


For unfavourable genes, higher relative expression levels at diagnosis give significantly lower overall survival for the patients. There are 354 genes associated with unfavourable prognosis in lung cancer. In Table 1, the top 20 most significant genes related to unfavourable prognosis are listed.

S100A16 is a gene associated with unfavourable prognosis in lung cancer. The best separation is achieved by an expression cutoff at 118.4 FPKM, which divides the patients into two groups with 43% 5-year survival for patients with high expression versus 46% for patients with low expression (p-value: 3.14e-5). A survival analysis in the different subtypes showed a significant association only in LUAD. Immunohistochemical staining using an antibody targeting S100A16 (HPA045841) shows a differential expression pattern in lung cancer samples.

[Figure panels: S100A16 survival analysis (p<0.001); S100A16 high expression; S100A16 low expression]

ANLN is another gene associated with unfavourable prognosis in lung cancer. The best separation is achieved by an expression cutoff at 5.8 FPKM, which divides the patients into two groups with 42% 5-year survival for patients with high expression versus 49% for patients with low expression (p-value: 6.99e-5). A survival analysis in the different subtypes showed a significant association only in LUAD. Immunohistochemical staining using an antibody targeting ANLN (CAB062547) shows a differential expression pattern in lung cancer samples.

[Figure panels: ANLN survival analysis (p<0.001); ANLN high expression; ANLN low expression]
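The "best separation" cutoffs quoted for S100A16 and ANLN can be illustrated with a small sketch that scans candidate expression cutoffs and keeps the one giving the smallest p-value. The text does not name the statistical test, so a two-group log-rank test is assumed here; the function names, the `min_group` parameter and the toy data are hypothetical and are not taken from the TCGA analysis.

```python
import math


def logrank_p(times, events, in_high):
    """Two-group log-rank test; returns the p-value (chi-square, 1 df)."""
    observed_minus_expected = 0.0
    variance = 0.0
    # Walk through the distinct time points at which a death occurred.
    for t in sorted({ti for ti, e in zip(times, events) if e}):
        at_risk = [i for i, ti in enumerate(times) if ti >= t]
        n = len(at_risk)
        n_high = sum(1 for i in at_risk if in_high[i])
        d = sum(1 for i in at_risk if times[i] == t and events[i])
        d_high = sum(1 for i in at_risk
                     if times[i] == t and events[i] and in_high[i])
        observed_minus_expected += d_high - d * n_high / n
        if n > 1:
            variance += d * (n_high / n) * (1 - n_high / n) * (n - d) / (n - 1)
    if variance == 0:
        return 1.0
    chi2 = observed_minus_expected ** 2 / variance
    # Survival function of a chi-square distribution with 1 degree of freedom.
    return math.erfc(math.sqrt(chi2 / 2))


def best_cutoff(expression, times, events, min_group=2):
    """Scan observed expression values as cutoffs; return (cutoff, p-value)."""
    best = None
    for cutoff in sorted(set(expression)):
        in_high = [x >= cutoff for x in expression]
        n_high = sum(in_high)
        # Skip cutoffs that leave one group too small to compare.
        if n_high < min_group or len(in_high) - n_high < min_group:
            continue
        p = logrank_p(times, events, in_high)
        if best is None or p < best[1]:
            best = (cutoff, p)
    return best


# Toy data (hypothetical): expression in FPKM, follow-up in years,
# event = 1 if the patient died, 0 if censored.
toy_expression = [1, 2, 3, 4, 10, 11, 12, 13]
toy_times = [9, 8, 7, 6, 1, 2, 3, 4]
toy_events = [0, 0, 0, 0, 1, 1, 1, 1]
cutoff, p_value = best_cutoff(toy_expression, toy_times, toy_events)
print(f"best cutoff: {cutoff} FPKM, p = {p_value:.4f}")
```

Because the cutoff is chosen to minimise the p-value, the resulting p-value is optimistic compared with one from a cutoff fixed in advance.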

Table 1. The 20 genes with highest significance associated with unfavourable prognosis in lung cancer.

Gene | Description | Predicted localization | mRNA (cancer) | p-value
FAM83A | family with sequence similarity 83, member A | Intracellular | 20.0 | 2.98e-9
GALNT2 | polypeptide N-acetylgalactosaminyltransferase 2 | Secreted | 22.2 | 8.48e-8
LOXL2 | lysyl oxidase-like 2 | Intracellular, Secreted | 11.4 | 1.46e-7
FSTL3 | follistatin-like 3 (secreted glycoprotein) | Intracellular, Secreted | 13.1 | 4.61e-7
BCAR3 | breast cancer anti-estrogen resistance 3 | Intracellular | 5.1 | 8.92e-7

Favourable prognostic genes in lung cancer


For favourable genes, higher relative expression levels at diagnosis give significantly higher overall survival for the patients. There are 296 genes associated with favourable prognosis in lung cancer. In Table 2, the top 20 most significant genes related to favourable prognosis are listed.

MPC1 is a gene associated with favourable prognosis in lung cancer. The best separation is achieved by an expression cutoff at 18.0 FPKM, which divides the patients into two groups with 53% 5-year survival for patients with high expression versus 40% for patients with low expression (p-value: 1.29e-4). A survival analysis in the different subtypes showed a significant association only in LUAD. Immunohistochemical staining using an antibody targeting MPC1 (HPA045119) shows a differential expression pattern in lung cancer samples.

[Figure panels: MPC1 survival analysis (p<0.001); MPC1 high expression; MPC1 low expression]

NFIX is another gene associated with favourable prognosis in lung cancer. The best separation is achieved by an expression cutoff at 11.1 FPKM, which divides the patients into two groups with 52% 5-year survival for patients with high expression versus 39% for patients with low expression (p-value: 2.35e-4). A survival analysis in the different subtypes showed a significant association only in LUAD. Immunohistochemical staining using an antibody targeting NFIX (HPA007533) shows a differential expression pattern in lung cancer samples.

[Figure panels: NFIX survival analysis (p<0.001); NFIX high expression; NFIX low expression]
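The 5-year survival percentages quoted for the high- and low-expression groups are the kind of figure that a Kaplan-Meier estimate yields. The sketch below assumes that estimator, which the text does not state explicitly; the function names and the toy data are hypothetical, and only the 18.0 FPKM cutoff is borrowed from the MPC1 example.

```python
def km_survival_at(times, events, horizon):
    """Kaplan-Meier estimate of the probability of surviving past `horizon`."""
    survival = 1.0
    # Only time points with at least one death change the estimate.
    death_times = sorted({t for t, e in zip(times, events) if e and t <= horizon})
    for t in death_times:
        at_risk = sum(1 for ti in times if ti >= t)
        deaths = sum(1 for ti, e in zip(times, events) if ti == t and e)
        survival *= 1 - deaths / at_risk
    return survival


def survival_by_group(expression, times, events, cutoff, horizon=5.0):
    """Split patients at `cutoff` and estimate survival at `horizon` per group."""
    groups = {"high": ([], []), "low": ([], [])}
    for x, t, e in zip(expression, times, events):
        label = "high" if x >= cutoff else "low"
        groups[label][0].append(t)
        groups[label][1].append(e)
    return {label: km_survival_at(t, e, horizon)
            for label, (t, e) in groups.items()}


# Toy data (hypothetical): expression in FPKM, follow-up in years,
# event = 1 if the patient died, 0 if censored.
toy_expression = [25, 30, 22, 19, 5, 8, 12, 3]
toy_times = [6, 7, 2, 8, 1, 3, 6, 4]
toy_events = [0, 0, 1, 0, 1, 1, 0, 1]
result = survival_by_group(toy_expression, toy_times, toy_events, cutoff=18.0)
print({label: f"{100 * s:.0f}%" for label, s in result.items()})
```

Censored patients (event = 0) contribute to the at-risk count until their last follow-up, which is why this estimate differs from a simple proportion of patients alive at five years.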

Table 2. The 20 genes with highest significance associated with favourable prognosis in lung cancer.

Gene | Description | Predicted localization | mRNA (cancer) | p-value
ZNF512 | zinc finger protein 512 | Intracellular | 6.2 | 5.08e-9
MOAP1 | modulator of apoptosis 1 | Intracellular | 12.2 | 7.37e-9
HLF | hepatic leukemia factor | Intracellular | 2.7 | 2.71e-7
FAM117A | family with sequence similarity 117, member A | Intracellular | 6.0 | 8.25e-7
SLC11A2 | solute carrier family 11 (proton-coupled divalent metal ion transporter), member 2 | Intracellular, Membrane | 8.5 | 3.07e-6

The lung cancer transcriptome


The transcriptome analysis shows that 74% (n=14433) of all human genes (n=19571) are expressed in lung cancer. All genes were classified by their lung cancer-specific expression into one of five categories, based on the ratio between mRNA levels in lung cancer and the mRNA levels in the other 16 analyzed cancer tissues. In total, 101 genes show some level of elevated expression in lung cancer compared to other cancers (Figure 1). The elevated category is further subdivided into three categories, as shown in Table 3.

Figure 1. The distribution of all genes across the five categories based on transcript abundance in lung cancer as well as in all other cancer tissues.

Table 3. Number of genes in the subdivided categories of elevated expression in lung cancer.

Category | Number of genes | Description
Tissue enriched | 13 | At least five-fold higher mRNA levels in a particular cancer as compared to all other cancers
Group enriched | 74 | At least five-fold higher mRNA levels in a group of 2-7 cancers
Tissue enhanced | 14 | At least five-fold higher mRNA levels in a particular cancer as compared to average levels in all cancers
Total | 101 | Total number of elevated genes in lung cancer
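The three elevated categories in Table 3 can be read as a small decision rule applied to one gene's mRNA levels across cancers. The sketch below is one possible interpretation and not the atlas implementation. It makes several assumptions: "group enriched" is taken to mean that every cancer in the group is at least five-fold above every cancer outside it, `min_level` is a hypothetical detection threshold, and the function name, cancer labels and values are invented for the example.

```python
def classify_elevation(levels, target, fold=5.0, max_group=7, min_level=1.0):
    """Classify one gene's expression in `target` relative to other cancers.

    levels: dict mapping cancer name -> mRNA level for a single gene.
    Returns "tissue enriched", "group enriched", "tissue enhanced"
    or "not elevated".
    """
    x = levels[target]
    if x < min_level:  # assumed detection threshold (hypothetical)
        return "not elevated"

    # Tissue enriched: five-fold above every other cancer.
    others = [value for cancer, value in levels.items() if cancer != target]
    if others and x >= fold * max(others):
        return "tissue enriched"

    # Group enriched: target sits in a top group of 2-7 cancers whose lowest
    # member is still five-fold above the highest cancer outside the group.
    ranked = sorted(levels.values(), reverse=True)
    for size in range(2, max_group + 1):
        if size >= len(ranked):
            break
        group_min, rest_max = ranked[size - 1], ranked[size]
        if group_min > 0 and x >= group_min and group_min >= fold * rest_max:
            return "group enriched"

    # Tissue enhanced: five-fold above the average across all cancers.
    if x >= fold * sum(levels.values()) / len(levels):
        return "tissue enhanced"
    return "not elevated"


# Toy profiles (hypothetical values).
enriched = {"lung": 50, "breast": 5, "liver": 2}
grouped = {"lung": 50, "breast": 40, "liver": 2, "colon": 1}
enhanced = {"lung": 60, "a": 15, "b": 6, "c": 3, "d": 1, "e": 1,
            "f": 1, "g": 1, "h": 1, "i": 1, "j": 1, "k": 1}
flat = {"lung": 10, "breast": 9, "liver": 8}
for profile in (enriched, grouped, enhanced, flat):
    print(classify_elevation(profile, "lung"))
```

The rules are checked from the strictest to the most permissive, so a gene that satisfies several definitions is assigned to the strictest one.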

Additional information


The histological classification of NSCLC is important for treatment options. The most common subtype of NSCLC is adenocarcinoma, comprising around 40% of all lung cancers. Adenocarcinoma is characterized by glandular formation, production of mucin and expression of thyroid transcription factor-1. It is the predominant histological type among younger men, women of all ages, and former and never smokers.

Squamous cell carcinoma, the second most common subtype of NSCLC, is suggested to originate from metaplastic squamous epithelia in the bronchial tree. It is defined by a variable degree of squamous differentiation, such as keratinization or intercellular bridging. This subtype is strongly associated with cigarette smoking. Large cell carcinoma accounts for 5-10% of all lung cancers and is a heterogeneous group with no evidence of squamous or adenocarcinoma differentiation.

In addition to these three main subtypes of NSCLC, other less common subtypes, e.g. adenosquamous carcinoma and sarcomatoid carcinoma, comprise the remaining NSCLC cases.

Relevant links and publications


Uhlén M et al, 2017. A pathology atlas of the human cancer transcriptome. Science.
PubMed: 28818916 DOI: 10.1126/science.aan2507

Cancer Genome Atlas Research Network et al, 2013. The Cancer Genome Atlas Pan-Cancer analysis project. Nat Genet.
PubMed: 24071849 DOI: 10.1038/ng.2764

Uhlén M et al, 2015. Tissue-based map of the human proteome. Science.
PubMed: 25613900 DOI: 10.1126/science.1260419

Lindskog C et al, 2014. The lung-specific proteome defined by integration of transcriptomics and antibody-based profiling. FASEB J.
PubMed: 25169055 DOI: 10.1096/fj.14-254862

Histology dictionary - Lung cancer