Secreted proteins


A secreted protein can be defined as a protein that is actively transported out of the cell. In humans, cells such as endocrine cells and B-lymphocytes are specialized for protein secretion, but all cells secrete proteins to some extent. Secreted proteins play a crucial role in many physiological, developmental and pathological processes and are important for both intercellular and intracellular communication. In addition to being a rich source of new therapeutics and drug targets, secreted proteins are the target of a large fraction of the blood diagnostic tests used in the clinic, which emphasizes the importance of this class of proteins for medicine and biology. Medically important secreted proteins include cytokines, coagulation factors, growth factors and other signaling molecules. Based on results from multiple prediction methods, we predict 2918 proteins, or about 15% of the human proteome, to be secreted.

Function of the secretory pathway


The most common route for protein secretion is the classical secretory pathway (Figure 1). Newly synthesized proteins are transported from the endoplasmic reticulum (ER) through the Golgi apparatus and packed into vesicles, which are then transported to the plasma membrane. The vesicles fuse with the plasma membrane, thereby releasing the proteins into the extracellular space (exocytosis). The signal sequence that targets proteins for secretion to the ER is called a signal peptide (SP) and consists of a short, hydrophobic N-terminal sequence, which is inserted into the ER membrane and subsequently cleaved off from the protein (von Heijne G. 1985). Membrane proteins may also contain an SP, but most often the N-terminal transmembrane (TM) region functions as the signal sequence. The signal sequences are recognized by chaperone proteins that guide the translating ribosomes to the rough ER, where co-translational translocation of the protein occurs through a protein complex named the translocon (Johnson AE et al, 1999). Membrane proteins are transferred to the lipid bilayer of the ER membrane via the translocon, whereas secretory proteins are transported into the ER lumen. The proteins that pass the ER quality control are transported via vesicles to the Golgi apparatus, where they are further modified and sorted for transport to their final destination, which most often is the plasma membrane, the lysosomes or the extracellular space.

Figure 1. Overview of the secretory pathway.

The functions of secreted proteins are diverse and range from exocrine secretion, including enzymes in the digestive tract, to endocrine secretion, including insulin and other hormones released into the bloodstream. Signalling between or within cells via secreted signalling molecules can be paracrine, autocrine, endocrine or neuroendocrine (Nussey S et al, 2001). Among the most important signalling proteins are cytokines, kinases, hormones and growth factors that bind to receptors on the surface of target cells (Farhan H et al, 2011). A large fraction of the clinically approved treatment regimens today use drugs directed towards (or consisting of) secreted proteins or cell surface-associated membrane proteins. Out of the 646 protein targets with known pharmacological action for approved drugs currently on the market (Wishart DS et al, 2006), 157 (about 24%) are predicted to be secreted.

Secreted proteins are often enriched in the organelles of the secretory pathway (ER, Golgi apparatus, vesicles) before they are released into the extracellular space. This makes it possible to detect these proteins by immunofluorescence (IF), although their final destination lies outside the cell. Figure 2 shows IF images of three predicted secreted proteins.

CHGB - SH-SY5Y
SCG3 - SH-SY5Y
NPY - SH-SY5Y

Figure 2. Three predicted secreted proteins in the neuron-like SH-SY5Y cell line: CHGB and SCG3 are found in secretory vesicles, while NPY is enriched in the Golgi apparatus.

Prediction of secreted proteins


Secreted proteins can often be identified based on their SPs, which have a number of features suitable for computational prediction models. The SP is typically 15-30 amino acids long and is characterized by a short, mostly positively charged N-terminal region (n-region), followed by a hydrophobic h-region and a polar, uncharged C-terminal c-region (Emanuelsson O et al, 2007). Many algorithms use these features to predict the location of an SP in a protein. A number of methods also incorporate an SP prediction model into their transmembrane (TM) topology prediction algorithm, which makes it more reliable to distinguish an SP from a TM segment.
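
To make the role of the h-region concrete, the following minimal Python sketch scans the N-terminal part of a sequence for its most hydrophobic window, using the Kyte-Doolittle hydropathy scale. It is a toy heuristic for illustration only and is not how SignalP, Phobius or SPOCTOPUS work. The function names, the window size of 8, the 30-residue scan length, the threshold of 2.0 and both example sequences are assumptions made for this sketch.

```python
# Kyte-Doolittle hydropathy values per amino acid (higher = more hydrophobic).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def max_hydrophobic_window(seq, window=8, n_term=30):
    """Return (start, mean hydropathy) of the most hydrophobic window
    within the first n_term residues of seq."""
    region = seq[:n_term].upper()
    if len(region) < window:
        raise ValueError("sequence is shorter than the window")
    best_start, best_score = 0, float("-inf")
    for i in range(len(region) - window + 1):
        # Unknown residue codes contribute 0.0 (an arbitrary choice).
        score = sum(KD.get(aa, 0.0) for aa in region[i:i + window]) / window
        if score > best_score:
            best_start, best_score = i, score
    return best_start, best_score


def has_sp_like_n_terminus(seq, window=8, n_term=30, threshold=2.0):
    """Toy check: a hydrophobic stretch (candidate h-region) preceded by
    at least one positively charged residue (candidate n-region)."""
    start, score = max_hydrophobic_window(seq, window, n_term)
    n_region = seq[:start].upper()
    has_positive = any(aa in "KR" for aa in n_region)
    return score >= threshold and has_positive


# Hypothetical sequences, invented for this example.
sp_like = "MKRLLLLLALLAVVASAQADEEKKDDEE"
hydrophilic = "MSDEEKKRDDSSEEKKNNQQDDEEKKSSTT"
print(has_sp_like_n_terminus(sp_like))
print(has_sp_like_n_terminus(hydrophilic))
```

A heuristic like this cannot tell an SP from an N-terminal TM segment, since both are hydrophobic. This is the ambiguity that the combined SP and TM topology predictors mentioned above are designed to resolve.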

The human 'secretome' can be defined as all genes encoding at least one secreted protein. It has been analyzed here by performing a whole-proteome scan using three methods for signal peptide prediction: SignalP 4.0 (Petersen TN et al, 2011), Phobius (Käll L et al, 2004) and SPOCTOPUS (Viklund H et al, 2008), all of which have been shown to give reliable prediction results in comparative analyses. A majority decision-based method (MDSEC) has been constructed using the results from the three SP prediction methods to obtain a list of predicted secreted proteins (Uhlén M et al, 2015). All proteins with an SP predicted by at least two of the three methods are considered candidates for secretion. Since signal peptides are found both in secreted proteins and in certain types of membrane proteins, the results were filtered using the majority decision-based method (MDM) for membrane protein topology prediction (Fagerberg L et al, 2010). All proteins with a predicted SP in combination with a predicted TM region according to the MDM are considered membrane-spanning and therefore not secreted. The resulting numbers of genes encoding a predicted secreted protein, based on each of the three methods as well as on the new majority decision-based method, are shown in Table 1. The resulting lists of predicted secreted proteins and predicted membrane proteins were used as a classification of the human proteome.

Method                                   Number of genes   Source
Secreted proteins predicted by MDSEC     2918              HPA
SignalP predicted secreted proteins      2504              SignalP
Phobius predicted secreted proteins      3304              Phobius
SPOCTOPUS predicted secreted proteins    3705              SPOCTOPUS

Table 1. Prediction of the human secretome by three different prediction methods for signal peptides as well as the MDSEC.
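
The majority-decision rule described above can be sketched in a few lines of Python. This is an illustration of the rule as stated in the text, not the actual MDSEC implementation. The class and field names are hypothetical, and the per-method results are assumed to be already available as booleans.

```python
from dataclasses import dataclass


@dataclass
class ProteinPrediction:
    """Per-protein prediction results (hypothetical field names)."""
    signalp_sp: bool     # SignalP predicts a signal peptide
    phobius_sp: bool     # Phobius predicts a signal peptide
    spoctopus_sp: bool   # SPOCTOPUS predicts a signal peptide
    mdm_tm: bool         # MDM predicts at least one TM region


def is_predicted_secreted(p: ProteinPrediction) -> bool:
    """Secreted if at least two of three methods predict an SP
    and the MDM does not predict a TM region."""
    sp_votes = sum([p.signalp_sp, p.phobius_sp, p.spoctopus_sp])
    return sp_votes >= 2 and not p.mdm_tm


# Two of three SP votes and no TM region: predicted secreted.
print(is_predicted_secreted(ProteinPrediction(True, True, False, False)))
# Three SP votes but a predicted TM region: membrane-spanning, not secreted.
print(is_predicted_secreted(ProteinPrediction(True, True, True, True)))
```

Requiring agreement between at least two methods trades some sensitivity for specificity, which is consistent with the MDSEC count in Table 1 being lower than the Phobius and SPOCTOPUS counts.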

Expression levels of secreted proteins in tissue


An analysis of the tissue categories based on RNA-seq data shows that a large fraction of the genes encoding secreted proteins are tissue-enriched or group-enriched, meaning that they are expressed at a higher level in a single tissue or in a small group of tissues (Uhlén M et al, 2015) (Figure 3). The secreted class contains many of the most abundantly expressed genes, and the highest expression levels of secreted proteins are found in the pancreas and the salivary gland.

Figure 3. Bar plot showing the distribution of expression categories, based on gene expression in tissues, for genes coding for predicted secreted proteins compared to all genes in the Cell Atlas. An asterisk marks a statistically significant deviation (p≤0.05) from all other genes, based on a binomial test.
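
As an illustration of the kind of binomial test mentioned in the caption, the sketch below computes an exact two-sided p-value for the number of genes from one class that fall into an expression category, given the fraction of all other genes in that category. The exact test configuration used for Figure 3 is not described here, so the two-sided definition (summing all outcomes that are no more likely than the observed one) and the example counts are assumptions.

```python
import math


def binom_pmf(k, n, p):
    """Binomial probability of k successes in n trials, computed in log space."""
    log_pmf = (
        math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
        + k * math.log(p) + (n - k) * math.log(1.0 - p)
    )
    return math.exp(log_pmf)


def binom_test_two_sided(k, n, p):
    """Exact two-sided binomial p-value: total probability of all outcomes
    that are at most as likely as the observed count k."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    if not 0 <= k <= n:
        raise ValueError("k must lie between 0 and n")
    # Small relative tolerance so that ties are not lost to rounding.
    cutoff = binom_pmf(k, n, p) * (1.0 + 1e-7)
    total = sum(
        pmf for pmf in (binom_pmf(i, n, p) for i in range(n + 1))
        if pmf <= cutoff
    )
    return min(1.0, total)


# Hypothetical counts: 30 of 100 genes in a category, where 10% of all
# other genes fall into that category.
p_value = binom_test_two_sided(k=30, n=100, p=0.10)
print(p_value <= 0.05)
```

The probabilities are computed in log space so that the calculation stays numerically stable for gene sets of several thousand genes.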

Relevant links and publications


Emanuelsson O et al, 2007. Locating proteins in the cell using TargetP, SignalP and related tools. Nat Protoc.
PubMed: 17446895 DOI: 10.1038/nprot.2007.131

Fagerberg L et al, 2010. Prediction of the human membrane proteome. Proteomics.
PubMed: 20175080 DOI: 10.1002/pmic.200900258

Farhan H et al, 2011. Signalling to and from the secretory pathway. J Cell Sci.
PubMed: 21187344 DOI: 10.1242/jcs.076455

Johnson AE et al, 1999. The translocon: a dynamic gateway at the ER membrane. Annu Rev Cell Dev Biol.
PubMed: 10611978 DOI: 10.1146/annurev.cellbio.15.1.799

Käll L et al, 2004. A combined transmembrane topology and signal peptide prediction method. J Mol Biol.
PubMed: 15111065 DOI: 10.1016/j.jmb.2004.03.016

Nussey S et al, 2001. Endocrinology: An Integrated Approach. Oxford: BIOS Scientific Publishers.

Petersen TN et al, 2011. SignalP 4.0: discriminating signal peptides from transmembrane regions. Nat Methods.
PubMed: 21959131 DOI: 10.1038/nmeth.1701

Uhlén M et al, 2015. Tissue-based map of the human proteome. Science.
PubMed: 25613900 DOI: 10.1126/science.1260419

Viklund H et al, 2008. SPOCTOPUS: a combined predictor of signal peptides and membrane protein topology. Bioinformatics.
PubMed: 18945683 DOI: 10.1093/bioinformatics/btn550

von Heijne G. 1985. Signal sequences. The limits of variation. J Mol Biol.
PubMed: 4032478 

Wishart DS et al, 2006. DrugBank: a comprehensive resource for in silico drug discovery and exploration. Nucleic Acids Res.
PubMed: 16381955 DOI: 10.1093/nar/gkj067