Protein Homology Modelling and Structural Genomics

Explore the world of protein homology modelling, ab initio structure prediction, and structural genomics to understand the intricacies of protein structure determination and its significance in biology. Learn about the steps involved, pitfalls to avoid, and the latest advancements in the field.

  • Protein Modelling
  • Structural Genomics
  • Protein Structure
  • Homology Modelling
  • Ab Initio Prediction

Presentation Transcript


  1. PROTEIN HOMOLOGY MODELLING

  2. After this lesson you should be able to: Explain the individual steps involved in calculating a protein homology model. Identify suitable templates for modelling. Outline the principles behind ab initio protein structure prediction. Describe the differences between homology modelling and ab initio structure prediction. Describe the major pitfalls in protein modelling. LEARNING OBJECTIVES

  3. Protein homology modelling: individual steps, caveats, pitfalls. Ab initio protein structure prediction: threading, true ab initio methods. OUTLINE

  4. Why Do We Need Homology Modelling? Ab initio protein folding (random sampling): 100 aa with 3 conformations per residue gives 3^100, approximately 10^48, different overall conformations! Random sampling is NOT feasible, even if conformations can be sampled at picosecond (10^-12 s) rates. This is Levinthal's paradox. Do homology modelling instead.
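
The arithmetic behind the estimate above can be checked with a few lines of Python. This is only a sketch: the chain length, the three conformations per residue and the picosecond sampling rate are the slide's numbers, and the function name is made up for illustration.

```python
# Levinthal-style estimate: how long would exhaustive random sampling take?
# Chain length, states per residue and sampling rate follow the slide.

SECONDS_PER_YEAR = 365.25 * 24 * 3600  # Julian year


def levinthal_estimate(n_residues, states_per_residue, rate_per_second):
    """Return (number of conformations, years needed to visit each one once)."""
    conformations = states_per_residue ** n_residues  # exact integer
    years = conformations / rate_per_second / SECONDS_PER_YEAR
    return conformations, years


conformations, years = levinthal_estimate(100, 3, 1e12)
print(f"{conformations:.2e} conformations, about {years:.1e} years to sample them all")
```

Even at one conformation per picosecond, the search would take many orders of magnitude longer than the age of the universe, which is why exhaustive sampling is ruled out.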

  5. The structure of a protein is uniquely determined by its amino acid sequence (but sequence is sometimes not enough: prions; pH, ions, cofactors, chaperones). Structure is conserved much longer than sequence in evolution. Conservation: Structure > Function >> Sequence. HOW IS IT POSSIBLE?

  6. There are currently ~47000 structures in the PDB (but only ~4000 if you include only those that are no more than 30% identical to one another and have a resolution better than 3.0 Å). An estimated 25% of all sequences can be modeled, and structural information can be obtained for ~50%. HOW OFTEN CAN WE DO IT?

  7. Complete genomes; signaling proteins; disease-causing organisms; model organisms; membrane proteins; protein-ligand interactions; fold space coverage. WORLDWIDE STRUCTURAL GENOMICS

  8. 10-year, $600 million project initiated in 2000, funded largely by NIH. AIM: structural information on 10,000 unique proteins (now 4,000-6,000); so far 1,000 have been determined. Improve current techniques to reduce time (from months to days) and cost (from $100,000 to $20,000 per structure). 9 research centers currently funded (2005); targets are from model and disease-causing organisms (a separate project on TB proteins). STRUCTURAL GENOMICS IN NORTH AMERICA

  9. HOMOLOGY MODELING FOR STRUCTURAL GENOMICS. Roberto Sánchez et al., Nature Structural Biology 7, 986-990 (2000)

  10. Sali, A. & Kuriyan, J. Trends Biochem. Sci. 22, M20-M24 (1999). HOW WELL CAN WE DO IT?

  11. 1) Identify template(s); 2) initial alignment; 3) improve alignment; 4) backbone generation; 5) loop modelling; 6) side chains; 7) refinement; 8) validation. HOW IS IT DONE?

  12. Search with sequence: BLAST, PSI-BLAST, fold recognition methods. Use biological information: functional annotation in databases, active site/motifs. TEMPLATE IDENTIFICATION

  13. ALIGNMENT

  14. Alignment example:
      Position:      1   2   3   4   5   6   7   8   9  10  11  12  13  14
      Template:    PHE ASP ILE CYS ARG LEU PRO GLY SER ALA GLU ALA VAL CYS
      Alignment 1: PHE ASN VAL CYS ARG THR PRO --- --- --- GLU ALA ILE CYS
      Alignment 2: PHE ASN VAL CYS ARG --- --- --- THR PRO GLU ALA ILE CYS
      [Figure: matrix of substitution scores, template residues (F D I C R L P G S A E A V C) against query residues (F N V C R T P E A I C)]

  15. Alignment example (continued):
      Position:      1   2   3   4   5   6   7   8   9  10  11  12  13  14
      Template:    PHE ASP ILE CYS ARG LEU PRO GLY SER ALA GLU ALA VAL CYS
      Alignment 1: PHE ASN VAL CYS ARG THR PRO --- --- --- GLU ALA ILE CYS
      Alignment 2: PHE ASN VAL CYS ARG --- --- --- THR PRO GLU ALA ILE CYS
      [Figure: matrix of substitution scores, template residues (F D I C R L P G S A E A V C) against query residues (F N V C R T P E A I C)]

  16. IMPROVING THE ALIGNMENT
      Position:      1   2   3   4   5   6   7   8   9  10  11  12  13  14
      Template:    PHE ASP ILE CYS ARG LEU PRO GLY SER ALA GLU ALA VAL CYS
      Alignment 1: PHE ASN VAL CYS ARG THR PRO --- --- --- GLU ALA ILE CYS
      Alignment 2: PHE ASN VAL CYS ARG --- --- --- THR PRO GLU ALA ILE CYS
      From "Professional Gambling" by Gert Vriend: http://www.cmbi.kun.nl/gv/articles/text/gambling.html
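
To make the two alternative gap placements in this example concrete, the sketch below counts identical residues and lists the gap positions for each. The sequences are copied from the slide; treating the first row as the template and the other two as alternative alignments of the query is an assumption, as are the function names.

```python
# Compare two alternative alignments of the same query against one template.
# Sequences are taken from the slide; the row labels are an assumption.

GAP = "---"

TEMPLATE = "PHE ASP ILE CYS ARG LEU PRO GLY SER ALA GLU ALA VAL CYS".split()
ALIGNMENT_1 = "PHE ASN VAL CYS ARG THR PRO --- --- --- GLU ALA ILE CYS".split()
ALIGNMENT_2 = "PHE ASN VAL CYS ARG --- --- --- THR PRO GLU ALA ILE CYS".split()


def identity(template, query):
    """Return (identical residues, aligned non-gap positions)."""
    if len(template) != len(query):
        raise ValueError("aligned sequences must have the same length")
    aligned = [(t, q) for t, q in zip(template, query) if GAP not in (t, q)]
    identical = sum(1 for t, q in aligned if t == q)
    return identical, len(aligned)


def gap_positions(query):
    """Return the 1-based positions at which the query has a gap."""
    return [i for i, residue in enumerate(query, start=1) if residue == GAP]


for label, query in (("Alignment 1", ALIGNMENT_1), ("Alignment 2", ALIGNMENT_2)):
    same, total = identity(TEMPLATE, query)
    print(f"{label}: {same}/{total} identical, gaps at {gap_positions(query)}")
```

A purely sequence-based count like this says nothing about whether the template structure can actually accommodate a three-residue deletion at the chosen position, which is the kind of information used when improving an alignment by hand.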

  17. Selecting the best template is crucial! The best template may not be the one with the highest % identity (best p-value). Template 1: 93% id, 3.5 Å resolution. Template 2: 90% id, 1.5 Å resolution. TEMPLATE QUALITY
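
One way to express this trade-off in code is to shortlist templates whose identity is close to the best one and then prefer the best resolution. The sketch below is a hypothetical heuristic, not a published rule: the 5-percentage-point tolerance, the class and the function name are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Template:
    name: str
    identity_pct: float         # sequence identity to the query, in percent
    resolution_angstrom: float  # lower value means better resolution


def pick_template(candidates, identity_tolerance_pct=5.0):
    """Among templates close to the best identity, return the best-resolved one.

    The tolerance is an illustrative assumption, not an established cutoff.
    """
    best_identity = max(c.identity_pct for c in candidates)
    shortlist = [
        c for c in candidates
        if best_identity - c.identity_pct <= identity_tolerance_pct
    ]
    return min(shortlist, key=lambda c: c.resolution_angstrom)


candidates = [
    Template("Template 1", identity_pct=93.0, resolution_angstrom=3.5),
    Template("Template 2", identity_pct=90.0, resolution_angstrom=1.5),
]
chosen = pick_template(candidates)
print(chosen.name)
```

With the two templates from the slide, the 3% loss in identity is within the tolerance, so the better-resolved structure is selected.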

  18. The Importance of Resolution. [Figure: resolution scale from 4 Å (low resolution) to 1 Å (high resolution)]

  19. Which regions of the structure are best defined? Look at the PDB ensembles to see which regions are well-defined. Example: 1RJH (Nielbo et al., Biochemistry, 2003). EVALUATION OF NMR STRUCTURES

  20. Ramachandran Plot: allowed backbone torsion angles in proteins. [Figure: amino acid residue with its backbone torsion angles]
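
A Ramachandran plot is built from the backbone torsion angles phi (atoms C of the previous residue, N, CA, C) and psi (N, CA, C, N of the next residue). The sketch below shows one common way to compute a torsion angle from four atom positions; the helper names and the toy coordinates are made up for illustration and are not taken from a real structure.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral_degrees(p0, p1, p2, p3):
    """Torsion angle (degrees, range -180..180) about the p1-p2 bond."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    length = math.sqrt(_dot(b1, b1))
    b1 = tuple(x / length for x in b1)  # unit vector along the central bond
    # Components of the outer bonds perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * x for x in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * x for x in b1))
    return math.degrees(math.atan2(_dot(_cross(b1, v), w), _dot(v, w)))


# Toy coordinates: the last atom is rotated a quarter turn about the z axis.
angle = dihedral_degrees((1.0, 0.0, 0.0), (0.0, 0.0, 0.0),
                         (0.0, 0.0, 1.0), (0.0, 1.0, 1.0))
print(round(angle, 1))
```

Applying the function to every residue of a structure and plotting psi against phi gives the Ramachandran plot.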

  21. TEMPLATE QUALITY: RAMACHANDRAN PLOT. X-ray structure: good data. NMR structure: low-quality data.

  22. Generate the backbone coordinates from the template for the aligned regions. Several programs can do this; most of the groups at CASP6 used Modeller: http://salilab.org/modeller/modeller.html BACKBONE GENERATION

  23. Knowledge based: searches the PDB for fragments that match the sequence to be modelled (Levitt, Holm, Baker etc.). Energy based: uses an energy function to evaluate the quality of the loop and minimizes this function by Monte Carlo (sampling) or molecular dynamics (MD) techniques. Combination of the two. LOOP MODELLING
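
The energy-based route can be illustrated with a minimal Monte Carlo search. The sketch below uses the Metropolis acceptance rule, which is one common choice and is an assumption here, since the slide does not name a specific scheme. The quadratic "energy" over two torsion angles is a toy stand-in, not a real loop energy function, and all names are hypothetical.

```python
import math
import random


def metropolis_minimize(energy, start, propose, steps=2000, temperature=1.0, seed=0):
    """Metropolis Monte Carlo search; returns the lowest-energy state visited."""
    rng = random.Random(seed)
    current, current_e = start, energy(start)
    best, best_e = current, current_e
    for _ in range(steps):
        candidate = propose(current, rng)
        delta = energy(candidate) - current_e
        # Always accept downhill moves; accept uphill moves with Boltzmann probability.
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current, current_e = candidate, current_e + delta
            if current_e < best_e:
                best, best_e = current, current_e
    return best, best_e


def toy_energy(angles):
    """Toy energy with a single minimum at (phi, psi) = (-60, -45) degrees."""
    phi, psi = angles
    return ((phi + 60.0) / 30.0) ** 2 + ((psi + 45.0) / 30.0) ** 2


def perturb(angles, rng):
    """Propose a new state by shifting each angle by up to 10 degrees."""
    return tuple(a + rng.uniform(-10.0, 10.0) for a in angles)


start = (180.0, 180.0)
best_angles, best_energy = metropolis_minimize(toy_energy, start, perturb)
print(best_angles, best_energy)
```

Accepting some uphill moves is what lets the search escape local minima, which a pure downhill minimizer cannot do.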

  24. Find fragments (10 per amino acid) with the same sequence and secondary structure profile as the query sequence. Combine them using a Monte Carlo scheme to build the loop. David Baker et al. LOOPS: THE ROSETTA METHOD

  25. If the seq. ID is high, the networks of side chain contacts may be conserved, and keeping the side chain rotamers from the template may be better than predicting new ones. SIDE CHAINS

  26. Side chain rotamers are dependent on backbone conformation. Most successful method in CASP6 was SCWRL by Dunbrack et al.: Graph-theory knowledge based method to solve the combinatorial problem of side chain modelling. http://dunbrack.fccc.edu/SCWRL3.php PREDICTING SIDE CHAIN CONFORMATIONS

  27. Prediction accuracy is high for buried residues, but much lower for surface residues. Experimental reasons: side chains at the surface are more flexible. Theoretical reasons: it is much easier to handle hydrophobic packing in the core than the electrostatic interactions, including H-bonds to waters. SIDE CHAINS: ACCURACY

  28. Energy minimization; molecular dynamics. Big errors like atom clashes can be removed, but force fields are not perfect and small errors will also be introduced, so keep minimization to a minimum or matters will only get worse. REFINEMENT

  29. If errors are introduced in the model, they normally can NOT be recovered at a later step. The alignment cannot make up for a bad choice of template. Loop modeling cannot make up for a poor alignment. If errors are discovered, the step where they were introduced should be redone. ERROR RECOVERY

  30. Most programs will get the bond lengths and angles right. The Ramachandran plot of the model usually looks pretty much like the Ramachandran plot of the template (so select a high quality template). Inside/outside distributions of polar and apolar residues can be useful. Biological/biochemical data: active site residues, modification sites, interaction sites. VALIDATION

  31. ProQ is a neural-network-based predictor that uses a number of structural features to predict the quality of a protein model. ProQ is optimized to find correct models, in contrast to other methods, which are optimized to find native structures. VALIDATION: PROQ SERVER. Arne Elofsson's group: http://www.sbc.su.se/~bjorn/ProQ/

  32. ProCheck: http://www.biochem.ucl.ac.uk/~roman/procheck/procheck.html WhatIf server: http://swift.cmbi.kun.nl/WIWWWI/ STRUCTURE VALIDATION

  33. Eva-CM performs continuous and automated analysis of comparative protein structure modeling servers. A current list of the best performing servers can be found at: http://cubic.bioc.columbia.edu/eva/doc/intro_cm.html HOMOLOGY MODELLING SERVERS

  34. Successful homology modelling depends on the following: template quality; alignment (add biological information); modelling program/procedure (use more than one). Always validate your final model! SUMMARY: HOMOLOGY MODELLING

  35. FOLD RECOGNITION AND AB INITIO PROTEIN STRUCTURE PREDICTION

  36. Threading and pair potentials; ab initio methods; human intervention (what kind of knowledge can be used for alignment and selection of templates?); meta-servers (the principle, 3D-Jury); summary of take-home messages. OUTLINE

  37. Threading compares a given sequence against known structures (folds) by using potentials that describe tendencies observed in known protein structures. Example: pair potentials. How normal is it to observe a pair of an alanine and a valine separated by 20 residues in the sequence and 3 Å in space? (X) How normal is it to observe any pair of residues separated by 20 residues and 3 Å in space? (Y) Potential: log(X/Y). THREADING AND PAIR POTENTIALS
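
The log-ratio on this slide can be written out as a small function. In the sketch below, X and Y are taken to be observed frequencies (counts divided by totals) in a set of known structures; that reading, the function name and all the counts are hypothetical and chosen only to show the arithmetic.

```python
import math


def pair_potential(specific_count, specific_total, any_count, any_total):
    """Return log(X / Y) for one residue pair type, sequence separation and distance.

    X: fraction of pairs of this specific type (e.g. Ala-Val) found at the distance.
    Y: fraction of pairs of any type found at the distance (reference state).
    """
    x = specific_count / specific_total
    y = any_count / any_total
    return math.log(x / y)


# Hypothetical counts: 30 of 1000 Ala-Val pairs versus 200 of 10000 pairs overall.
score = pair_potential(30, 1000, 200, 10000)
print(round(score, 3))
```

A positive value means the specific pair is seen at that distance more often than an average pair. Many statistical potentials flip the sign and scale by kT so that favourable contacts get a negative energy; the form above follows the slide as written.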

  38. Alignment score from structural fitness (pair potential). Query sequence: ..ATNLYKETL.. How well does K fit the environment at P6? If P8 is acidic then fine; if P8 is basic then poor. [Figure: sequence threaded onto structure positions 1-10, with deletions] POTENTIALS OF MEAN FORCE

  39. Problem: no protein is average. Interactions in proteins cannot be described only by pairs of amino acids. The information in the potentials is partly captured with sequence profiles. Today mostly used in HYBRID approaches in combination with profile-profile based methods. Potentials can be used to score models based on different templates or alignments. THREADING METHODS TODAY

  40. Aim is to find the fold of the native protein by simulating the biological process of protein folding. A VERY DIFFICULT task because a protein chain can fold into millions of different conformations. Use it only when no detectable homologues are available. Methods can also be useful for fold recognition in cases of extremely low homology (e.g. convergent evolution). AB INITIO METHODS

  41. Rosetta method of the Baker group: Submit sequence to a number of secondary structure predictors. Compare fragments of 3 and 9 residues to a library from known structures. Link fragments together. Use energy minimization techniques (Monte Carlo optimization) to calculate tertiary structure. FRAGMENT-BASED AB INITIO MODELLING

  42. Use of energy potentials for scoring and computing models. Potentials should make models more native-like. These can be based on contact potentials, solvation potentials, van der Waals repulsion and attractive forces, hydrogen bond potentials. Globularity/radius of gyration (ab initio). POTENTIALS FOR FINDING GOOD MODELS

  43. Fragments with correct local structure PROBLEMS WITH EMPIRICAL POTENTIALS

  44. The best methods use maximum knowledge of query proteins. Specialists can help to find a correct template and correct alignments. Knowledge of function: cysteines forming disulfide bridges or binding e.g. zinc; proteolytic cleavage sites; other metal-binding residues; antibody epitopes or escape mutants; ligand binding. Knowledge of secondary structure: results from CD or fluorescence experiments. HUMAN INTERVENTION

  45. Democratic modeling: the highest-scoring hit is often wrong. Many prediction methods have the correct fold among the top 10-20 hits. If many different prediction methods all have some fold among the top hits, this fold is probably correct. META-SERVERS
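
The voting idea can be sketched as a simple count of how many methods list each fold among their top hits. The method names, fold labels and the function below are invented for illustration; real meta-servers compare 3D models rather than fold labels.

```python
from collections import Counter


def consensus_folds(top_hits_by_method, top_n=10):
    """Rank folds by the number of methods that list them among their top_n hits."""
    votes = Counter()
    for hits in top_hits_by_method.values():
        votes.update(set(hits[:top_n]))  # each method votes at most once per fold
    return votes.most_common()


# Hypothetical ranked hit lists from three prediction methods.
predictions = {
    "method_1": ["fold_A", "fold_B", "fold_C"],
    "method_2": ["fold_D", "fold_B", "fold_E"],
    "method_3": ["fold_B", "fold_F", "fold_A"],
}
ranking = consensus_folds(predictions)
print(ranking[0])
```

Here fold_B is the top hit of only one method, yet it is the only fold that every method has in its list, so it wins the vote.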

  46. 3D-Jury: http://bioinfo.pl/meta/ Inspired by ab initio modeling methods: the average of frequently obtained low-energy structures is often closer to the native structure than the lowest-energy structure. Find the most abundant high-scoring model in a list of predictions from several predictors: 1. Use output from a set of servers. 2. Superimpose all pairs of structures. 3. Similarity score based on the number of Cα pairs within 3.5 Å. Similar methods developed by A. Elofsson (Pcons) and D. Fischer (3D-SHOTGUN). EXAMPLE OF A META-SERVER
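
The scoring step can be sketched as follows. This is a simplified illustration, not the 3D-Jury implementation: it assumes the models have already been superimposed and have equivalent residues in the same order, it averages the pair counts over the other models (an assumption), and the server names and coordinates are toy values.

```python
import math


def similarity(model_a, model_b, cutoff=3.5):
    """Count equivalent C-alpha pairs within `cutoff` angstroms of each other."""
    return sum(1 for a, b in zip(model_a, model_b) if math.dist(a, b) <= cutoff)


def jury_scores(models, cutoff=3.5):
    """Score each model by its average similarity to all other models."""
    scores = {}
    for name, coords in models.items():
        others = [
            similarity(coords, other_coords, cutoff)
            for other_name, other_coords in models.items()
            if other_name != name
        ]
        scores[name] = sum(others) / len(others)
    return scores


# Toy C-alpha traces of four residues each, assumed to be pre-superimposed.
models = {
    "server_1": [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)],
    "server_2": [(0.0, 1.0, 0.0), (3.8, 1.0, 0.0), (7.6, 1.0, 0.0), (11.4, 1.0, 0.0)],
    "server_3": [(0.0, -3.0, 0.0), (3.8, 0.0, 0.0), (7.6, 8.0, 0.0), (11.4, 12.0, 0.0)],
}
scores = jury_scores(models)
best_model = max(scores, key=scores.get)
print(best_model, scores)
```

The model that agrees best with the rest of the set is selected, which is the "most abundant" model in the sense used on the slide.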

  47. Because it is a meta-server it can be slow. If the queue is too long some servers are skipped. Output is only Cα coordinates. What to do with the rest of the structure? Use e.g. the MaxSprout server to build side chains and backbone atoms: http://www.ebi.ac.uk/maxsprout/ 3D-JURY

  48. Hybrid methods using both threading methods and profile-profile alignments are the best. Use ab initio methods only if necessary, and know that the quality is really low! Try to use as much knowledge as possible for alignment and template selection in difficult cases. Use meta-servers when you can. SUMMARY: AB INITIO METHODS
