About 20-30% of all proteins in a cell are membrane proteins, e.g. receptors, channels, pores and transporters. Integral membrane proteins span the lipid bilayer at least once. Identifying potential membrane-spanning segments in a newly identified protein can therefore be used to predict whether the protein might be a membrane protein. Predicting transmembrane segments is easier than predicting the structure of globular proteins, because:
- Amino acids within the transmembrane segment are likely to be hydrophobic (A, V, L, I, F, W, M).
- An α-helical arrangement is favored in a non-polar medium.
- Given the average membrane thickness of 30 Å, α-helical transmembrane segments contain ~18-24 residues.
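The residue count in the last point follows from helix geometry. As a rough check, assuming the standard α-helix rise of about 1.5 Å per residue (a textbook value that is not stated above) and a helix running perpendicular to the membrane:

```latex
n \approx \frac{30\ \text{\AA}}{1.5\ \text{\AA\ per residue}} = 20\ \text{residues}
```

Helices that cross the membrane at an angle, or bilayers that are somewhat thicker, need a few more residues, which fits the range of ~18-24.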
Originally, hydropathy plots were used to predict membrane proteins. The algorithm calculates an average hydropathy value for each position in the given sequence. For each position, it takes the hydropathy indexes of the surrounding amino acids (within the given window size, centered on that position) and calculates their average. This average is the value plotted on the graph for that position.
Hydropathy
Hydropathy indexes (Kyte and Doolittle, 1982):
| One-letter code | Amino acid | Three-letter code | Hydropathy index |
|---|---|---|---|
| A | Alanine | Ala | 1.8 |
| R | Arginine | Arg | -4.5 |
| N | Asparagine | Asn | -3.5 |
| D | Aspartic Acid | Asp | -3.5 |
| C | Cysteine | Cys | 2.5 |
| Q | Glutamine | Gln | -3.5 |
| E | Glutamic Acid | Glu | -3.5 |
| G | Glycine | Gly | -0.4 |
| H | Histidine | His | -3.2 |
| I | Isoleucine | Ile | 4.5 |
| L | Leucine | Leu | 3.8 |
| K | Lysine | Lys | -3.9 |
| M | Methionine | Met | 1.9 |
| F | Phenylalanine | Phe | 2.8 |
| P | Proline | Pro | -1.6 |
| S | Serine | Ser | -0.8 |
| T | Threonine | Thr | -0.7 |
| W | Tryptophan | Trp | -0.9 |
| Y | Tyrosine | Tyr | -1.3 |
| V | Valine | Val | 4.2 |
| B | Asparagine or Aspartic Acid | Asn or Asp | -3.5 |
| Z | Glutamine or Glutamic Acid | Gln or Glu | -3.5 |
A commonly used hydropathy prediction method was developed by Kyte and Doolittle (Reference: Kyte, Jack, and Russell F. Doolittle. "A Simple Method for Displaying the Hydropathic Character of a Protein." Journal of Molecular Biology 1982; 157: 105-132).
Example: Obtain the protein sequence of the human serotonin transporter (accession number P31645) in FASTA format from the NCBI database (http://www.ncbi.nlm.nih.gov). Open the web site http://fasta.bioch.virginia.edu/fasta/grease.htm, paste in the sequence and start the plot using the default window size of 9.
The plot produced by this program represents the average hydropathy along the amino acid sequence. The plot may help predict whether or not the protein segment has enough hydrophobicity to either interact with or reside in a membrane.
Window size. Recommended window sizes are odd integers between 5 and 29. The default window is 9. The window size sets the number of positions that are averaged for each point on the plot. That means that larger window sizes make smoother plots. If your plot doesn't have any distinct peaks, you might try a smaller window size. If your plot has many peaks very close to each other, you might try a larger window size. The window size should be odd so that the window is not centered between two sequence positions.
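The window averaging described above can be sketched in a few lines of Python. This is a minimal illustration, not the code of the web tool: the function name `hydropathy_profile` is made up for this example, the values are the Kyte-Doolittle indexes listed in this section, and positions too close to either end for a full window are simply skipped (an assumption; the web tool may treat the ends differently).

```python
# Kyte-Doolittle hydropathy indexes, as listed in this section.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
    "B": -3.5, "Z": -3.5,
}


def hydropathy_profile(sequence, window=9):
    """Return (position, average hydropathy) pairs.

    Positions are 1-based and refer to the residue at the window centre.
    Assumption: positions without a full window around them are skipped.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    seq = sequence.upper()
    half = window // 2
    profile = []
    for centre in range(half, len(seq) - half):
        residues = seq[centre - half:centre + half + 1]
        average = sum(KD[r] for r in residues) / window
        profile.append((centre + 1, average))
    return profile


if __name__ == "__main__":
    # Toy sequence (not a real protein): hydrophobic stretch, then lysines.
    for position, value in hydropathy_profile("LLLLLKKKKK", window=5):
        print(position, round(value, 2))
```

Plotting the second value of each pair against the first gives the hydropathy plot; rerunning with a larger `window` shows the smoothing effect described above.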
Can you predict how many transmembrane segments the serotonin transporter might have? Try increasing the window size to obtain a smoother plot and make it easier to identify true transmembrane regions.
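One simple way to turn a hydropathy plot into a count of candidate segments is to look for runs of consecutive positions whose average lies above a cutoff. The sketch below assumes the averages are already available as a plain list; the function name `candidate_segments` and both parameters are hypothetical, and a sensible cutoff depends on the window size, so it has to be chosen by the user.

```python
def candidate_segments(values, threshold, min_run=1):
    """Return (start, end) index pairs of runs where values exceed threshold.

    Indexes are 0-based and inclusive and refer to positions in `values`.
    Runs shorter than `min_run` positions are discarded.
    """
    segments = []
    start = None
    for i, value in enumerate(values):
        if value > threshold:
            if start is None:
                start = i          # a new run begins here
        elif start is not None:
            if i - start >= min_run:
                segments.append((start, i - 1))
            start = None           # the run has ended
    # Close a run that extends to the end of the list.
    if start is not None and len(values) - start >= min_run:
        segments.append((start, len(values) - 1))
    return segments


if __name__ == "__main__":
    toy_profile = [0.1, 2.0, 2.1, 0.3, 1.9, 1.9]   # made-up averages
    print(candidate_segments(toy_profile, threshold=1.6))
```

Peaks that are much shorter than a membrane-spanning helix can be filtered out with `min_run`.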
Newer methods for the prediction of transmembrane helices
Several other methods have been developed to more accurately predict transmembrane helices using algorithms that do not just take hydrophobicity of protein segments into account. Analysis of transmembrane segments in several membrane proteins revealed that these segments have a more complex structure. There are three major regions in a transmembrane helix:
- the helix core (the central part of the helix), which contains mainly hydrophobic residues and interacts with the fatty acid side chains of the phospholipids in the membrane,
- the cap region (on either side of the core), which often contains polar aromatic residues and interacts with the phosphate head groups of the membrane phospholipids,
- the loops on either side outside the membrane, which often contain charged residues.
A transmembrane helix can be oriented either way, meaning that the N-terminal loop of a transmembrane helix can be located in the cytoplasm (inside-out orientation) or in the extracellular space (outside-in orientation). The actual arrangement in the native protein is referred to as the topology of the transmembrane helix. The topology is determined by the distribution of charged residues in the loops: cytoplasmic loops contain more positively charged residues than extracellular loops. This was recognized by Gunnar von Heijne and his co-workers and is known as the “positive-inside rule”.
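The positive-inside rule can be illustrated with a deliberately simplified sketch: given the positions of the transmembrane segments, count lysine (K) and arginine (R) residues in the loops on each side and place the side with more positive charges in the cytoplasm. The function name `predict_n_terminus` is hypothetical, and counting K and R over the full length of every loop is an assumption made for brevity; the prediction programs described in this section use more elaborate statistics.

```python
def predict_n_terminus(sequence, tm_segments):
    """Guess whether the N-terminus is cytoplasmic ('in') or extracellular ('out').

    tm_segments: list of (start, end) positions, 1-based and inclusive,
    sorted and non-overlapping. Returns 'undetermined' on a tie.
    """
    seq = sequence.upper()
    # Cut the sequence into the loops before, between and after the segments.
    loops = []
    previous_end = 0
    for start, end in tm_segments:
        loops.append(seq[previous_end:start - 1])
        previous_end = end
    loops.append(seq[previous_end:])

    def positives(loop):
        return loop.count("K") + loop.count("R")

    # Each helix crosses the membrane once, so loops alternate sides:
    # loops 0, 2, 4, ... lie on the same side as the N-terminus.
    n_side = sum(positives(loop) for loop in loops[0::2])
    other_side = sum(positives(loop) for loop in loops[1::2])
    if n_side > other_side:
        return "in"
    if other_side > n_side:
        return "out"
    return "undetermined"


if __name__ == "__main__":
    toy = "MKRK" + "L" * 20 + "DSE"   # made-up sequence with one helix
    print(predict_n_terminus(toy, [(5, 24)]))
```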
The knowledge of the structure of transmembrane segments, as well as the statistical evaluation of known transmembrane helices and the conservation of such helices within a protein family have been used to develop programs for membrane protein prediction.
A selection of these programs is described below.
Try to predict the number, position and orientation (topology) of the transmembrane segments in the serotonin transporter with each of the programs below. Where are the N-terminus and the C-terminus located, i.e. cytoplasmic (inside) or extracellular (outside)? Which prediction(s) do you think is/are more likely to be correct, i.e. might reflect the actual number and arrangement of transmembrane segments in this protein? Give reasons for your choice.
Notes:
1. For your practical write-up, it is best to copy/paste the serotonin transporter sequence into a Word file, indicate for each prediction the exact position of each transmembrane segment (electronically by using lines etc., or by hand after printing out your document), and then compare.
2. The structure of the serotonin transporter has not been solved yet, so we actually don’t know which prediction is correct. We will have to wait and see…
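If you prefer to compare the predictions electronically, a small script can count, for every residue, how many programs place it inside a transmembrane segment. This is only a sketch: the function names are hypothetical, the segment positions in the usage example are invented placeholders rather than real program output, and each program's segments are assumed not to overlap one another.

```python
def residue_votes(length, predictions):
    """Count, per residue, how many predictors call it transmembrane.

    predictions: dict mapping a program name to a list of (start, end)
    positions, 1-based and inclusive.
    """
    votes = [0] * length
    for segments in predictions.values():
        for start, end in segments:
            for position in range(start, end + 1):
                votes[position - 1] += 1
    return votes


def consensus_segments(votes, min_votes):
    """Return 1-based inclusive (start, end) runs with at least min_votes."""
    segments = []
    start = None
    for i, count in enumerate(votes):
        if count >= min_votes:
            if start is None:
                start = i
        elif start is not None:
            segments.append((start + 1, i))
            start = None
    if start is not None:
        segments.append((start + 1, len(votes)))
    return segments


if __name__ == "__main__":
    # Invented placeholder positions, NOT real predictions.
    toy = {"program_a": [(3, 6)], "program_b": [(5, 8)]}
    votes = residue_votes(10, toy)
    print(votes)
    print(consensus_segments(votes, min_votes=2))
```

Regions supported by all programs are the most convincing candidates; regions called by only one program deserve a closer look.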
A) The TMpred program (http://www.ch.embnet.org/software/TMPRED_form.html) makes a prediction of membrane-spanning regions and their orientation. The algorithm is based on the statistical analysis of TMbase, a database of naturally occurring transmembrane proteins.
Copy/paste your sequence (plain text, no spaces, no headers etc.) into the form provided, do not change any settings, and run the prediction. The output file contains results for the different stages of the prediction. Under 3.) the final models for your protein are given, with the most likely one listed first. For each potential transmembrane segment the amino acid positions are given, as well as the predicted orientation of the helix, i.e. inside-out (i-o) or outside-in (o-i).
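If your sequence is in FASTA format, the header line and all white space have to be removed before pasting it as plain text. A minimal sketch of that clean-up, assuming a single FASTA record (the function name `fasta_to_plain` and the toy record are made up for illustration):

```python
def fasta_to_plain(text):
    """Strip the '>' header line(s) and all white space from a FASTA record."""
    lines = [line.strip() for line in text.splitlines()]
    residue_lines = [line for line in lines if line and not line.startswith(">")]
    # Join the lines, then drop any spaces or tabs left inside them.
    return "".join("".join(residue_lines).split())


if __name__ == "__main__":
    toy_record = ">toy header, not a real entry\nMETT PL\nNSQ\n"
    print(fasta_to_plain(toy_record))
```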
B) The TMHMM program (http://www.cbs.dtu.dk/services/TMHMM-2.0/) uses a pattern recognition method for transmembrane segment and topology prediction, i.e. it takes into account the amino acid distribution pattern in the different regions of a transmembrane helix (see above).
Copy/Paste your sequence (in FASTA format!) into the form provided, do not change any settings, run the prediction. The output will give you a list of all transmembrane segments found and the most likely location of termini and loops as well as a graphical plot of the result. Use the HELP link for more details on the interpretation of the results.
[Note: submission to this server is limited, so some students may not be able to use this prediction programme during the practical. If possible, try again on another day.]
C) The TMAP program (http://www.mbb.ki.se/tmap/) is optimized for prediction of transmembrane helices from multiple sequence alignments, based on the assumption that a transmembrane segment can be predicted more reliably when it is conserved within a protein family.
To use this program you need to perform a multiple sequence alignment (see previous practical) of the protein in question, i.e. the serotonin transporter, with related proteins, i.e. other neurotransmitter transporters. Generate the multiple sequence alignment using the following steps:
- Copy/paste the neurotransmitter transporter sequences here into the submission form of the multiple sequence alignment tool ClustalW (http://www.ebi.ac.uk/clustalw/index.html). Change the following settings:
Select output format “gcg MSF”
Select output order “input”
Run the sequence alignment.
- Copy/paste the multiple sequence alignment output from ClustalW (everything from “PileUp” to end of the sequence alignment) into the TMAP (http://www.mbb.ki.se/tmap/) submission form. Run prediction.
- The output file will give you the topology, either “Nin” (means N-terminus in cytoplasm) or “Nout” (means N-terminus extracellularly), then the position of transmembrane helices for the entire alignment (i.e. PREDICTED TRANSMEMBRANE SEGMENTS FOR ALIGNMENT), and below that the amino acid position for each protein in the alignment (i.e. PREDICTED TRANSMEMBRANE SEGMENTS FOR PROTEIN, followed by a short description and the accession number). The serotonin transporter sequence should be the first one (check the accession number).
- Try to run the same prediction program in the “single sequence mode” (http://130.237.130.31/tmap/single.html) with the serotonin transporter sequence only as the input. Compare the results with the “multiple sequence alignment mode”.
Exercise:
Which of the following proteins are integral membrane proteins? For those that are, how many transmembrane helices can you identify? Give the amino acid residue positions. Use the TMpred programme: http://www.ch.embnet.org/software/TMPRED_form.html. (Note: Make sure you paste the sequence in plain text, i.e. no spaces, no headers etc.)
5HT1A receptor P08908
GAPDH P04797
Aquaporin 2 P41181
Tryptophan hydroxylase 1 P17752
In order to learn something about the possible function of a protein, in particular of a newly identified protein, it is useful to predict its subcellular localization. The prediction of cellular localization relies largely on the presence of cleavable signal sequences, for example in secreted proteins or proteins targeted to the mitochondrion, and on the identification of so-called “sorting signals”, i.e. sequence motifs that determine the targeting to or retention in particular organelles of a cell. Using the PSORT program (http://psort.ims.u-tokyo.ac.jp/form2.html, follow the link for PSORT II and then click on PSORT II prediction), try to predict the cellular localization of the serotonin transporter and the other proteins listed below (for sequences click here). To learn more about the individual subprograms, simply click on the links to obtain background information, such as the prediction algorithm, particular motifs etc.
Note: Make sure you paste in the protein sequences as plain text (no spaces, headers etc.).
Serotonin transporter P31645
DNA topoisomerase Q02880
Calreticulin P42918
TIM44 O35094
Where in the cell is each of the proteins above most likely to be located? Which motifs can you identify? Does the predicted localization seem logical when considering the function of each protein (look up the documentation for each protein in the NCBI database, searching with the given accession numbers)?