Protein Function Prediction and Analysis in Biological Systems


Explore the world of protein function prediction through interactions and data analysis. From gene ontology to guilt by association, delve into the intricate network of biological processes and molecular functions. Discover how conditional probabilities enhance functional flow in protein networks.

  • Protein Function
  • Biological Systems
  • Data Analysis
  • Gene Ontology
  • Functional Flow


Presentation Transcript


  1. Farewell Talk, Wyatt T. Clark

  2. Overview: predicting function from protein interaction data; assigning prior probabilities to regions of the genome using recombination rates; somatic disease mutations; germline disease mutations.

  3. The Gene Ontology standardizes the function vocabulary into three ontologies: Biological Process (BPO), Molecular Function (MFO), and Cellular Component (CCO).

  4. Protein-protein interactions (PPI): a set of vertices and edges plus function annotations. Vertices are proteins (p1 to p4 in the figure), edges are unordered, and functions are denoted as set membership.
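This data model maps directly onto a small data structure. The sketch below is a minimal illustration, not code from the talk; the class name `PPINetwork` and the example edges are hypothetical, since the figure's exact topology is not recoverable. Unordered edges are stored as `frozenset`s, so (p1, p2) and (p2, p1) are the same edge, and each protein's functions are a plain set.

```python
from collections import defaultdict


class PPINetwork:
    """Hypothetical PPI container: proteins are vertices, edges are unordered,
    and a protein 'has' a function if the function is in its annotation set."""

    def __init__(self, edges, annotations):
        # frozenset makes {p1, p2} identical to {p2, p1}
        self.edges = {frozenset(e) for e in edges}
        self.neighbors = defaultdict(set)
        for u, v in edges:
            self.neighbors[u].add(v)
            self.neighbors[v].add(u)
        self.annotations = {p: set(fs) for p, fs in annotations.items()}
        # Vertices: every protein that appears in an edge or has annotations
        self.vertices = set(self.neighbors) | set(self.annotations)

    def has_edge(self, u, v):
        return frozenset((u, v)) in self.edges

    def proteins_with(self, function):
        """All proteins whose annotation set contains `function`."""
        return {p for p, fs in self.annotations.items() if function in fs}


net = PPINetwork(
    edges=[("p1", "p2"), ("p1", "p3"), ("p3", "p4")],
    annotations={"p1": {"F1"}, "p2": {"F1", "F2"}, "p4": {"F2"}},
)
print(net.has_edge("p2", "p1"))  # True: edge order does not matter
```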

  5. Guilt by association (Schwikowski et al., 2000). [Figure: interaction network whose proteins carry the annotation sets {F7, F8, F10}, {F7, F10}, {F7, F10}, {F7, F11}, {F7}, {F7, F9, F10}, {F7, F10}.]
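Guilt by association predicts a protein's functions from those of its interaction partners. Below is a minimal neighbor-counting (majority-vote) sketch, assuming annotations are sets of function labels and that the `k` most frequent neighbor functions are predicted; the function name `guilt_by_association` and the default `k=3` are illustrative choices, not taken from the slide. The usage example treats six of the slide's annotation sets as the neighbors of one query protein, a hypothetical arrangement since the figure's topology is not recoverable.

```python
from collections import Counter


def guilt_by_association(neighbors, annotations, protein, k=3):
    """Predict the k functions most common among `protein`'s interaction partners.

    neighbors: protein -> set of interacting proteins
    annotations: protein -> set of function labels (unannotated proteins may be absent)
    """
    counts = Counter()
    for partner in neighbors.get(protein, ()):
        counts.update(annotations.get(partner, ()))  # one vote per partner per function
    return [function for function, _ in counts.most_common(k)]


neighbors = {"q": {"n1", "n2", "n3", "n4", "n5", "n6"}}
annotations = {
    "n1": {"F7", "F8", "F10"},
    "n2": {"F7", "F10"},
    "n3": {"F7", "F10"},
    "n4": {"F7", "F11"},
    "n5": {"F7"},
    "n6": {"F7", "F9", "F10"},
}
print(guilt_by_association(neighbors, annotations, "q", k=2))  # ['F7', 'F10']
```

Here F7 receives six votes and F10 four, so they are the top two predictions; asking for a third would hit a tie among F8, F9, and F11, which a real implementation would need a rule to break.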

  6. Functional Flow: "Whole-proteome prediction of protein function via graph-theoretic analysis of interaction maps", Nabieva et al. (2005). Diffuses function annotations through the protein-protein interaction network.

  7. Functional Flow, iteration 0 (Nabieva et al., 2005). [Figure: network with two proteins annotated F1, unannotated proteins marked "?", and node values of 0.]

  8. Functional Flow, iteration 1 (Nabieva et al., 2005). [Figure: node values 0, 0, 0, 0, 0, 0, 0.]

  9. Functional Flow, iteration 1 (Nabieva et al., 2005). [Figure: node values 1, 1, 0, 1, 0, 1, 0.]

  10. Functional Flow, iteration 2 (Nabieva et al., 2005). [Figure: node values 2, 2, 0.33, 2, 0.5, 2, 0.33.]
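Slides 6 to 10 step through Functional Flow one iteration at a time. The sketch below follows our reading of the FunctionalFlow description in Nabieva et al. (2005) for a single function: annotated proteins are infinite reservoirs, flow only runs "downhill" from fuller to emptier reservoirs, each edge's flow per step is capped by its weight, and a protein's score is the total flow it receives. It is an illustration under those assumptions, not the authors' code; the dictionary graph format, the default of six iterations, and the toy network are assumptions and do not reproduce the network on the slides.

```python
import math


def functional_flow(adj, sources, iterations=6):
    """Single-function Functional Flow sketch.

    adj: node -> {neighbor: edge weight}, assumed symmetric with positive weights.
    sources: nodes annotated with the function (infinite reservoirs).
    Returns node -> total inflow received, used as the functional score.
    """
    reservoir = {u: math.inf if u in sources else 0.0 for u in adj}
    score = {u: 0.0 for u in adj}
    for _ in range(iterations):
        flows = []  # all flows use reservoirs from the previous iteration
        for u, nbrs in adj.items():
            total_w = sum(nbrs.values())
            for v, w in nbrs.items():
                if reservoir[u] < reservoir[v]:
                    continue  # no uphill flow
                # u splits its reservoir in proportion to edge weight,
                # but never sends more than the edge capacity w
                flows.append((u, v, min(w, w / total_w * reservoir[u])))
        for u, v, f in flows:
            reservoir[u] -= f  # inf - f stays inf for sources
            reservoir[v] += f
            score[v] += f
    return score


path = {"a": {"b": 1.0}, "b": {"a": 1.0, "c": 1.0}, "c": {"b": 1.0}}
print(functional_flow(path, sources={"a"}, iterations=2))
```

On the three-protein path a-b-c with unit weights and a as the only source, two iterations give b a score of 2.0 and c a score of 0.5: b receives 1 unit per step from a, and in the second step passes half of its reservoir on to c.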

  11. Conditional Functional Flow (CFF) extends Functional Flow by allowing flow between different functions, weighted by conditional probabilities.

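  12. Conditional probabilities, diagonal entries only (the entry in row $F_i$, column $F_j$ is $P(F_i \mid F_j)$): $$\begin{array}{c|cccc} & F_1 & F_2 & F_3 & F_4 \\ \hline F_1 & P(F_1 \mid F_1) & & & \\ F_2 & & P(F_2 \mid F_2) & & \\ F_3 & & & P(F_3 \mid F_3) & \\ F_4 & & & & P(F_4 \mid F_4) \end{array}$$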

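  13. Conditional probabilities, full matrix (the entry in row $F_i$, column $F_j$ is $P(F_i \mid F_j)$): $$\begin{array}{c|cccc} & F_1 & F_2 & F_3 & F_4 \\ \hline F_1 & P(F_1 \mid F_1) & P(F_1 \mid F_2) & P(F_1 \mid F_3) & P(F_1 \mid F_4) \\ F_2 & P(F_2 \mid F_1) & P(F_2 \mid F_2) & P(F_2 \mid F_3) & P(F_2 \mid F_4) \\ F_3 & P(F_3 \mid F_1) & P(F_3 \mid F_2) & P(F_3 \mid F_3) & P(F_3 \mid F_4) \\ F_4 & P(F_4 \mid F_1) & P(F_4 \mid F_2) & P(F_4 \mid F_3) & P(F_4 \mid F_4) \end{array}$$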
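Slides 12 and 13 arrange $P(F_i \mid F_j)$ as a matrix. One plausible way to fill it, assumed here rather than taken from the talk, is to count co-annotations: $P(F_i \mid F_j)$ is the fraction of proteins annotated with $F_j$ that are also annotated with $F_i$. With this estimator every diagonal entry is 1, so keeping only the diagonal (as on slide 12) leaves each function's flow unchanged. The `mix_scores` step, which redistributes a protein's per-function scores through the matrix, is a hypothetical illustration of "flow between functions", not necessarily the update rule used in CFF.

```python
from itertools import product


def conditional_matrix(annotations, functions):
    """Estimate P(fi | fj) from co-annotation counts (an assumed estimator).

    annotations: protein -> set of function labels
    Returns {(fi, fj): P(fi | fj)}; the entry is 0.0 when no protein has fj.
    """
    count = {f: 0 for f in functions}
    joint = {pair: 0 for pair in product(functions, repeat=2)}
    for terms in annotations.values():
        present = [f for f in functions if f in terms]
        for f in present:
            count[f] += 1
        for pair in product(present, repeat=2):
            joint[pair] += 1
    return {(fi, fj): (n / count[fj] if count[fj] else 0.0)
            for (fi, fj), n in joint.items()}


def mix_scores(scores, matrix, functions):
    """Hypothetical cross-function step: s'(fi) = sum over fj of P(fi | fj) * s(fj)."""
    return {fi: sum(matrix[(fi, fj)] * scores.get(fj, 0.0) for fj in functions)
            for fi in functions}


annotations = {"p1": {"F1", "F2"}, "p2": {"F1"}, "p3": {"F2", "F3"}, "p4": {"F1", "F2"}}
functions = ["F1", "F2", "F3"]
M = conditional_matrix(annotations, functions)
print(M[("F1", "F2")])  # 2 of the 3 F2 proteins also carry F1
print(mix_scores({"F3": 1.0}, M, functions))
```

Because every protein with F3 also has F2 in this toy data, a unit of F3 score also becomes a unit of F2 score after mixing, while F1 receives nothing.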

  14. MFO. [Figure: precision (y-axis) versus recall (x-axis), both from 0 to 1.] Fmax: CFF 0.50; Freq Transfer 0.40; Priors 0.37; Transfer 0.40; WeightFF 0.38.

  15. BPO. [Figure: precision (y-axis) versus recall (x-axis), both from 0 to 1.] Fmax: CFF 0.44; Freq Transfer 0.42; Priors 0.36; Transfer 0.42; WeightFF 0.42.

  16. CCO. [Figure: precision (y-axis) versus recall (x-axis), both from 0 to 1.] Fmax: CFF 0.57; Freq Transfer 0.56; Priors 0.57; Transfer 0.55; WeightFF 0.58.
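Each Fmax value summarizes a precision-recall curve as its best F-measure over decision thresholds. The slides do not state which variant is used; the sketch below assumes the protein-centric definition used in CAFA-style evaluations, where precision at threshold τ is averaged over proteins with at least one prediction at or above τ, and recall is averaged over all benchmark proteins. The function name `fmax`, its threshold grid, and the toy data are illustrative.

```python
def fmax(predictions, truth, thresholds=None):
    """Protein-centric Fmax (an assumed CAFA-style definition).

    predictions: protein -> {term: score in [0, 1]}
    truth: protein -> non-empty set of true terms (the benchmark)
    """
    if thresholds is None:
        thresholds = [i / 100 for i in range(1, 101)]
    best = 0.0
    for tau in thresholds:
        precisions, recalls = [], []
        for protein, true_terms in truth.items():
            predicted = {t for t, s in predictions.get(protein, {}).items() if s >= tau}
            tp = len(predicted & true_terms)
            recalls.append(tp / len(true_terms))
            if predicted:  # precision only over proteins with a prediction
                precisions.append(tp / len(predicted))
        if not precisions:
            continue  # no predictions at this threshold
        pr = sum(precisions) / len(precisions)
        rc = sum(recalls) / len(recalls)
        if pr + rc > 0:
            best = max(best, 2 * pr * rc / (pr + rc))
    return best


truth = {"a": {"F1", "F2"}, "b": {"F3"}}
preds = {"a": {"F1": 0.9, "F4": 0.6}, "b": {"F3": 0.8}}
print(round(fmax(preds, truth), 3))
```

In this toy example the best threshold lies between 0.6 and 0.8, where precision is 1.0 and recall is 0.75, giving Fmax = 6/7.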

  17. Biologically relevant term pairs. [Table with columns Term 1, Term 2, Observed Pair Frequency, Expected Frequency, P-value. Terms shown: cytoskeletal protein binding, transition metal ion binding, microtubule binding, small conjugating protein ligase activity, transcription cofactor activity, protein binding, transcription factor activity, receptor signaling protein activity, DNA binding, heterocyclic compound binding, molecular_function. Rows of (observed, expected, p-value): (0.008, 0.001, 6.96E-04); (0.015, 0.003, 3.11E-03); (0.058, 0.016, 1.56E-02); (0.085, 0.034, 3.41E-02); (0.101, 0.017, 1.68E-02).]
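The table compares how often two terms are observed together with how often they would co-occur by chance. The slide does not describe the procedure, so the sketch below makes explicit assumptions: a "pair" is a protein annotated with both terms, the expected frequency is the product of the two terms' marginal frequencies (independence), and the p-value is a one-sided binomial test. All names and the toy annotations are hypothetical.

```python
from math import comb


def binomial_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def pair_enrichment(annotations, term1, term2):
    """Observed vs. expected co-annotation frequency of two terms (assumed method).

    annotations: protein -> set of terms
    Returns (observed frequency, expected frequency, one-sided p-value).
    """
    n = len(annotations)
    has1 = sum(term1 in terms for terms in annotations.values())
    has2 = sum(term2 in terms for terms in annotations.values())
    both = sum(term1 in terms and term2 in terms for terms in annotations.values())
    observed = both / n
    expected = (has1 / n) * (has2 / n)  # independence assumption
    return observed, expected, binomial_upper_tail(both, n, expected)


annotations = {f"p{i}": set() for i in range(10)}
for p in ("p0", "p1", "p2"):
    annotations[p] = {"A", "B"}
annotations["p3"] = {"A"}
annotations["p4"] = {"A"}
annotations["p5"] = {"B"}
print(pair_enrichment(annotations, "A", "B"))
```

With 10 proteins, term A on 5 and term B on 4, independence predicts a co-annotation frequency of 0.2; observing 3 co-annotated proteins (0.3) gives a one-sided p-value of about 0.322, so this toy pair would not count as enriched.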

  18. Conclusion: Conditional Functional Flow works for molecular function but not for higher-level definitions of function. It captures ontological bias, and it can be applied to other graph-based data where guilt by association might not be the rule.

  19. Overview: predicting function from primary sequence / protein interaction data; assigning prior probabilities to regions of the genome using recombination rates; somatic disease mutations; germline disease mutations.

  20. Linked selection and human disease. How does recombination affect the distribution of SNPs throughout the genome? Where are disease mutations more likely to occur? Compare mutations to background population mutations. Do somatic and germline mutations occur in the same areas of the genome?

  21. Why recombination? Muller's ratchet: asexual populations are doomed by the irreversible accumulation of deleterious mutations, and sexual reproduction and recombination counteract this force.
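Muller's ratchet can be illustrated with a small Wright-Fisher-style simulation. This is a toy model with assumed parameters (population size, number of sites, per-genome mutation rate `mu`, multiplicative selection coefficient `s`), not an analysis from the talk. Without recombination, the least-loaded class can only be lost, so the minimum number of deleterious mutations in the population never decreases. With free recombination, offspring can be assembled with fewer mutations than either parent.

```python
import math
import random


def poisson(rng, lam):
    """Knuth's Poisson sampler; adequate for small lam."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def simulate(pop_size=100, n_sites=200, mu=0.5, s=0.02,
             generations=100, recombine=False, seed=1):
    """Toy model: return the minimum mutation load per generation."""
    rng = random.Random(seed)
    pop = [0] * pop_size  # each genome is a bitmask; a set bit is a deleterious mutation
    history = []
    for _ in range(generations):
        # Multiplicative fitness: each mutation reduces fitness by a factor (1 - s)
        weights = [(1 - s) ** bin(g).count("1") for g in pop]
        parents = rng.choices(pop, weights=weights, k=2 * pop_size)
        new_pop = []
        for i in range(pop_size):
            child = parents[2 * i]
            if recombine:
                # Free recombination: each site comes from either parent at random
                mask = rng.getrandbits(n_sites)
                child = (parents[2 * i] & mask) | (parents[2 * i + 1] & ~mask)
            for _ in range(poisson(rng, mu)):
                child |= 1 << rng.randrange(n_sites)  # no back mutation
            new_pop.append(child)
        pop = new_pop
        history.append(min(bin(g).count("1") for g in pop))
    return history


asexual = simulate()
sexual = simulate(recombine=True)
print(asexual[-1], sexual[-1])
```

Comparing the final minimum loads typically shows a lower value with recombination, although any single run depends on the seed and parameters.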

  22. Linked selection: background selection and genetic hitchhiking. [Figure: polymorphic-site counts of 13, 7, and 13.]

  23. Linked selection: background selection and genetic hitchhiking. [Figure: 13 polymorphisms, 7 polymorphic sites, and 0 polymorphic sites.]

  24. Linked selection: background selection and genetic hitchhiking (repeat of slide 23).

  25. Recombination. [Figure labels: Parent, Gametes.]

  26. Previous work. [Bar chart: recombination rate (cM/Mb, axis from 0 to 2.5) for Pgenes, genes, transposable elements, and background in human, worm, fly, and zebrafish.]

  27. Recombination and genetic diversity. Recombination results in higher rates of nucleotide heterozygosity (π), the average pairwise nucleotide difference between all individuals. Recombination varies across a chromosome: it is high in the chromosome arms and low in centromeres.

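  28. π: average pairwise differences. [Figure: alignment of six sequences (S1 to S6) and an outgroup (Out) over nine positions; five segregating sites are marked with *.] Method 1, averaging over all pairs: 32 pairwise differences over the $\binom{6}{2} = 15$ pairs give $\pi = 32/15 = 2.133$. Method 2, from allele frequencies: $\pi = \frac{n}{n-1}\sum_{i=1}^{S_n} 2p_i(1-p_i) = 1.2 \times 1.7778 = 2.133$, where $n = 6$ is the number of sequences, $S_n$ the number of segregating sites, and $p_i$ the frequency of one allele at site $i$.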

  29. π: average pairwise differences, measured as the average number of pairwise differences between individuals: $\pi = \frac{n}{n-1}\sum_{i=1}^{S_n} 2p_i(1-p_i)$

  30. π: average pairwise differences per window: $\pi = \frac{1}{W}\sum_{i=1}^{S_n} 2p_i(1-p_i)$, where $W$ is the window size.
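The two methods on slide 28 and the windowed form on slide 30 can be checked against each other in a few lines. This sketch assumes aligned, gap-free sequences with biallelic sites; the function names and the four-sequence toy alignment are hypothetical. Following slide 30, the windowed estimate divides the summed $2p_i(1-p_i)$ by the window size $W$ and omits the $n/(n-1)$ factor.

```python
from itertools import combinations


def site_diversity(column):
    """2p(1-p) at one biallelic site; p is the frequency of either allele."""
    alleles = sorted(set(column))
    if len(alleles) == 1:
        return 0.0  # monomorphic site contributes nothing
    if len(alleles) > 2:
        raise ValueError("this sketch assumes biallelic sites")
    p = column.count(alleles[0]) / len(column)
    return 2 * p * (1 - p)


def pi_pairwise(seqs):
    """Method 1: total pairwise differences divided by the number of pairs."""
    pairs = list(combinations(seqs, 2))
    diffs = sum(sum(a != b for a, b in zip(s, t)) for s, t in pairs)
    return diffs / len(pairs)


def pi_frequencies(seqs):
    """Method 2: n/(n-1) times the sum of 2 p_i (1 - p_i) over sites."""
    n = len(seqs)
    total = sum(site_diversity(col) for col in zip(*seqs))
    return n * total / (n - 1)


def pi_window(seqs, start, width):
    """Slide-30 form: (1/W) times the sum of 2 p_i (1 - p_i) inside the window."""
    cols = list(zip(*seqs))[start:start + width]
    return sum(site_diversity(col) for col in cols) / width


seqs = ["ACGTACGTA", "ACGTTCGTA", "ACCTACGTA", "ACGTACGAA"]
print(pi_pairwise(seqs), pi_frequencies(seqs))
```

For this toy alignment both methods return 1.5: three singleton sites each contribute 3 pairwise differences (9 in total) over 6 pairs, which matches $\frac{4}{3} \times 3 \times 2 \cdot \frac{1}{4} \cdot \frac{3}{4}$.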

  31. π and cM/Mb. Recombination rate (cM/Mb): Rutgers Map v3 (CEPH pedigrees), with linear interpolation between markers, smoothed with a 100 kb sliding window. π: 1000 Genomes, ignoring masked regions.
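Slide 31 derives cM/Mb by linearly interpolating genetic-map positions between markers and averaging over a 100 kb window. Below is a minimal sketch of that step. It assumes markers are given as sorted (bp, cM) pairs and that positions outside the marker range take the nearest marker's value. The names `interpolate_cm` and `window_rate` are hypothetical, and the marker values in the example are made up rather than taken from the Rutgers map.

```python
from bisect import bisect_right


def interpolate_cm(markers, pos):
    """Genetic position (cM) at physical position `pos` (bp).

    markers: list of (bp, cM) pairs sorted by bp. Positions outside the
    marker range are clamped to the first/last marker (an assumption).
    """
    bps = [bp for bp, _ in markers]
    if pos <= bps[0]:
        return markers[0][1]
    if pos >= bps[-1]:
        return markers[-1][1]
    j = bisect_right(bps, pos)  # bps[j - 1] <= pos < bps[j]
    (b0, c0), (b1, c1) = markers[j - 1], markers[j]
    return c0 + (c1 - c0) * (pos - b0) / (b1 - b0)


def window_rate(markers, start, window=100_000):
    """Average recombination rate (cM/Mb) over [start, start + window)."""
    delta_cm = interpolate_cm(markers, start + window) - interpolate_cm(markers, start)
    return delta_cm / (window / 1_000_000)


markers = [(0, 0.0), (1_000_000, 1.0), (2_000_000, 3.0)]
print(window_rate(markers, 0), window_rate(markers, 1_500_000))
```

Evaluating `window_rate` at successive start positions gives a sliding-window profile; here the first megabase runs at 1 cM/Mb and the second at 2 cM/Mb.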

  32. Recombination and genetic diversity (repeat of slide 27).

  33. Variation in recombination along chromosome 17. One centimorgan (cM) corresponds to an average of 0.01 crossovers per generation.

  34. π and cM/Mb along chromosome 17.

  35. Hypothesis: novel mutations arise with equal probability throughout the genome. Deleterious mutations will rise to higher frequencies in areas of low recombination, due to hitchhiking, and will be crossed out in areas where recombination is high. Neutral and beneficial mutations will rise to higher frequencies in areas of high recombination because they arise on independent backgrounds.

  36. Datasets used. Somatic mutations: Alexandrov et al., "Signatures of mutational processes in human cancer" (2013); these are novel to the individual. Disease mutations: HGMD; these exist in the population at some frequency.

  37. [Bar chart: y-axis from 7.00E-04 to 1.00E-03; bar values 9.88E-04, 8.83E-04, 8.81E-04, 8.72E-04, 8.76E-04, 8.74E-04, 8.74E-04, 8.68E-04, 8.62E-04, 8.57E-04, 8.36E-04, 7.27E-04.] * Not statistically significant (p < .01, Bonferroni corrected).
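The footnote on slide 37 refers to a Bonferroni correction: with m comparisons, each p-value is multiplied by m (capped at 1) before it is compared with the significance threshold. Below is a minimal sketch that uses the .01 threshold from the footnote. The function name and example p-values are hypothetical.

```python
def bonferroni(p_values, alpha=0.01):
    """Return Bonferroni-adjusted p-values and significance flags."""
    m = len(p_values)
    adjusted = [min(1.0, p * m) for p in p_values]  # cap at 1
    return adjusted, [q < alpha for q in adjusted]


adjusted, significant = bonferroni([0.001, 0.004, 0.02])
print(significant)  # [True, False, False]
```

The raw p-value 0.004 would pass a .01 threshold on its own, but after correcting for three comparisons it becomes 0.012 and is no longer significant.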

  38. [Bar chart: cM/Mb, y-axis from 1.2 to 1.5; bar values 1.39, 1.41, 1.37, 1.34, 1.31, 1.30, 1.29, 1.27, 1.26, 1.25, 1.25, 1.24.]

  39. Alternative: Conservation? Chun S, Fay JC (2011) Evidence for Hitchhiking of Deleterious Mutations within the Human Genome. PLoS Genet 7(8): e1002240. doi:10.1371/journal.pgen.1002240 http://www.plosgenetics.org/article/info:doi/10.1371/journal.pgen.1002240

  40. High-confidence SV deletions. [Bar chart: cM/Mb (y-axis from 1.00 to 2.00) for all breakpoints and for TEI, NAHR, and NHR deletions, compared with the background; values 1.79, 1.41, 1.38, 1.35.]

  41. Thank you! Mark Gerstein, Sharon Qian, Ian Gonzalez, Shantao Li, Lucas Lochovsky, Yao Fu, Alex Abyzov, Ekta Khurana

  42. Publications:
     • W. T. Clark, P. Radivojac. Vector quantization kernels for the classification of protein sequences and structures. PSB 2014 Proceedings.
     • W. T. Clark, P. Radivojac. Information theoretic metrics for the evaluation of ontological annotations. ISMB 2013 Proceedings.
     • P. Radivojac, W. T. Clark, et al. A large-scale evaluation of computational protein function prediction. Nature Methods, 10(3):221-227, 2013.
     • Y. Zhao, W. T. Clark, M. Mort, D. Cooper, P. Radivojac, S. Mooney. Prediction of functional regulatory SNPs in monogenic and complex disease. Human Mutation, 32(10):1183-1190, 2011.
     • W. T. Clark, P. Radivojac. Analysis and prediction of protein function from amino acid sequence. Proteins: Structure, Function, and Bioinformatics, 79(7):2086-2096, 2011.
     • N. L. Nehrt, W. T. Clark*, P. Radivojac, M. W. Hahn. Testing the ortholog conjecture with comparative functional genomic data from mammals. PLoS Computational Biology, 7(6):e1002073, 2011. (*Co-first author)
     • P. Radivojac, K. Peng, W. T. Clark, B. J. Peters, A. Mohan, S. M. Boyle, S. D. Mooney. An integrated approach to inferring gene-disease associations in humans. Proteins: Structure, Function, and Bioinformatics, 72(3):1030-1037, 2008.
     • M. Dalkilic, J. C. Costello, W. T. Clark, P. Radivojac. From protein-disease associations to disease informatics. Frontiers in Bioscience, 13:3391-3407, 2008.
     • M. M. Dalkilic, W. T. Clark, J. C. Costello, P. Radivojac. Using compression to detect classes of inauthentic texts. In Proceedings of the 2006 SIAM Conference on Data Mining, pages 604-608, April 2006.
