
Protein Function Prediction and Analysis in Biological Systems
Explore the world of protein function prediction through interactions and data analysis. From gene ontology to guilt by association, delve into the intricate network of biological processes and molecular functions. Discover how conditional probabilities enhance functional flow in protein networks.
Presentation Transcript
Farewell Talk. Wyatt T. Clark
Overview: predicting function from protein interaction data; assigning prior probabilities to regions of the genome using recombination rates; somatic disease mutations; germline disease mutations.
Gene Ontology: standardizes function vocabulary. Biological Process (BPO), Molecular Function (MFO), Cellular Component (CCO).
Protein-Protein Interactions (PPI): a set of vertices and edges with function annotations. Vertices are proteins (p1, p2, p3, p4 in the figure), edges are unordered, and functions are denoted as set membership.
Guilt by Association (Schwikowski et al., 2000). Figure: network nodes labeled with function sets {F7, F8, F10}, {F7, F10}, {F7, F10}, {F7, F11}, {F7}, {F7, F9, F10}, {F7, F10}.
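A minimal sketch of neighbor-counting guilt by association, in the spirit of Schwikowski et al. (2000): an unannotated protein is assigned the functions that occur most often among its interaction partners. The small network, the function labels, and the helper name `predict_by_neighbors` are illustrative assumptions, not data or code from the talk.

```python
from collections import Counter

def predict_by_neighbors(edges, annotations, protein, top_k=1):
    """Rank candidate functions for `protein` by how many neighbors carry them."""
    neighbors = set()
    for u, v in edges:                # edges are unordered pairs
        if u == protein:
            neighbors.add(v)
        elif v == protein:
            neighbors.add(u)
    counts = Counter()
    for n in neighbors:
        counts.update(annotations.get(n, set()))
    # Sort by descending count, then by name so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_k]

# Hypothetical toy network: p1 is unannotated, its partners are annotated.
edges = [("p1", "p2"), ("p1", "p3"), ("p1", "p4"), ("p3", "p4")]
annotations = {"p2": {"F7", "F10"}, "p3": {"F7", "F11"}, "p4": {"F7", "F10"}}
prediction = predict_by_neighbors(edges, annotations, "p1", top_k=2)
print(prediction)
```

Here F7 is carried by all three partners of p1 and F10 by two of them, so those two functions rank first.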
Functional Flow: Nabieva et al. (2005), "Whole-proteome prediction of protein function via graph-theoretic analysis of interaction maps". Diffuses function annotations through the protein-protein interaction network.
Functional Flow, iteration 0 (Nabieva et al., 2005). Figure: nodes annotated F1, unannotated nodes marked "?", node scores 0.
Functional Flow, iteration 1 (Nabieva et al., 2005). Figure: node scores 0, 0, 0, 0, 0, 0, 0.
Functional Flow, iteration 1 (Nabieva et al., 2005). Figure: node scores 1, 1, 0, 1, 0, 1, 0.
Functional Flow, iteration 2 (Nabieva et al., 2005). Figure: node scores 2, 2, 0.33, 2, 0.5, 2, 0.33.
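The iteration slides can be approximated with a short simulation. The following is a simplified sketch of the reservoir-and-flow idea behind Functional Flow, written from the general description of the method rather than taken from the talk or from Nabieva et al. (2005): annotated proteins act as infinite sources, flow only moves "downhill" from fuller to emptier reservoirs, each edge carries at most its capacity, and a protein's score is the total flow it has received. The function name, the edge-weight convention, and the toy chain are assumptions.

```python
import math

def functional_flow(edges, sources, iterations):
    """Score proteins for one function by simulating flow from annotated sources.

    edges: iterable of (u, v, weight) with positive weights (edge capacities).
    sources: set of proteins already annotated with the function.
    Returns a dict mapping each protein to the total flow it received.
    """
    neighbors = {}
    for u, v, w in edges:
        neighbors.setdefault(u, {})[v] = w
        neighbors.setdefault(v, {})[u] = w
    reservoir = {n: (math.inf if n in sources else 0.0) for n in neighbors}
    score = {n: 0.0 for n in neighbors}
    for _ in range(iterations):
        change = {n: 0.0 for n in neighbors}
        for u, nbrs in neighbors.items():
            total = sum(nbrs.values())
            for v, w in nbrs.items():
                # Flow only moves from a fuller reservoir to an emptier one.
                if reservoir[u] > reservoir[v]:
                    # Share the reservoir by edge weight, capped at capacity.
                    flow = min(w, reservoir[u] * w / total)
                    change[u] -= flow
                    change[v] += flow
                    score[v] += flow
        # Apply all flows at once so every edge sees the same starting state.
        for n in neighbors:
            reservoir[n] += change[n]
    return score

# Hypothetical chain a - b - c with unit capacities; only a is annotated.
edges = [("a", "b", 1.0), ("b", "c", 1.0)]
scores = functional_flow(edges, sources={"a"}, iterations=2)
print(scores)
```

On this chain, b receives one unit from the source in each iteration, while c only starts receiving flow in the second iteration, once b's reservoir is no longer empty.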
Conditional Functional Flow: extends Functional Flow by allowing flow between functions based on conditional probability.
Conditional Probabilities (diagonal entries only)
        F1          F2          F3          F4
F1      P(F1|F1)
F2                  P(F2|F2)
F3                              P(F3|F3)
F4                                          P(F4|F4)
Conditional Probabilities (full matrix)
        F1          F2          F3          F4
F1      P(F1|F1)    P(F1|F2)    P(F1|F3)    P(F1|F4)
F2      P(F2|F1)    P(F2|F2)    P(F2|F3)    P(F2|F4)
F3      P(F3|F1)    P(F3|F2)    P(F3|F3)    P(F3|F4)
F4      P(F4|F1)    P(F4|F2)    P(F4|F3)    P(F4|F4)
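The slides do not spell out how the P(Fi|Fj) entries are estimated. One plausible reading, assumed here purely for illustration, is to count over interacting pairs: among interactions where one partner carries Fj, what fraction have the other partner carrying Fi? The function name `conditional_probabilities` and the toy data are hypothetical.

```python
from collections import defaultdict

def conditional_probabilities(edges, annotations):
    """Estimate P(fi | fj) over interacting pairs (an assumed estimator).

    Returns a dict keyed by (fi, fj). Pairs never observed are absent.
    """
    pair_counts = defaultdict(int)    # (fi, fj): partner has fi, protein has fj
    given_counts = defaultdict(int)   # fj: interactions seen from a protein with fj
    for u, v in edges:
        # Visit each unordered edge from both endpoints.
        for a, b in ((u, v), (v, u)):
            for fj in annotations.get(a, ()):
                given_counts[fj] += 1
                for fi in annotations.get(b, ()):
                    pair_counts[(fi, fj)] += 1
    return {(fi, fj): c / given_counts[fj] for (fi, fj), c in pair_counts.items()}

# Hypothetical toy network.
edges = [("p1", "p2"), ("p2", "p3")]
annotations = {"p1": {"F1"}, "p2": {"F2"}, "p3": {"F1", "F3"}}
probs = conditional_probabilities(edges, annotations)
print(probs)
```

In this toy network, p2 (annotated F2) has two partners, both carrying F1 and one carrying F3, so the estimate of P(F1|F2) is 1.0 and P(F3|F2) is 0.5.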
MFO. Figure: precision-recall curves (precision vs. recall, both axes 0 to 1). Fmax: CFF 0.50, Freq Transfer 0.40, Priors 0.37, Transfer 0.40, WeightFF 0.38.
BPO. Figure: precision-recall curves (precision vs. recall, both axes 0 to 1). Fmax: CFF 0.44, Freq Transfer 0.42, Priors 0.36, Transfer 0.42, WeightFF 0.42.
CCO. Figure: precision-recall curves (precision vs. recall, both axes 0 to 1). Fmax: CFF 0.57, Freq Transfer 0.56, Priors 0.57, Transfer 0.55, WeightFF 0.58.
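The slides report Fmax without defining it. In protein function prediction evaluations it is conventionally the maximum F-measure (harmonic mean of precision and recall) over all decision thresholds, and the sketch below assumes that definition. The curve points are made up for illustration and are not values from the talk.

```python
def f_max(curve):
    """Return the maximum F-measure over (precision, recall) points."""
    best = 0.0
    for precision, recall in curve:
        if precision + recall > 0:   # avoid dividing by zero at (0, 0)
            best = max(best, 2 * precision * recall / (precision + recall))
    return best

# Hypothetical precision-recall points, one per decision threshold.
curve = [(1.0, 0.1), (0.8, 0.3), (0.5, 0.5), (0.3, 0.7), (0.1, 1.0)]
print(f_max(curve))
```

A single number like this makes it easy to compare methods, but it hides where on the curve the maximum occurs.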
Biologically Relevant
Columns: Term 1, Term 2, Observed Pair Frequency, Expected Frequency, P-value
Terms: cytoskeletal protein binding; transition metal ion binding; microtubule binding; small conjugating protein ligase activity; transcription cofactor activity; protein binding; transcription factor activity; receptor signaling protein activity; DNA binding; heterocyclic compound binding; molecular_function
Observed / Expected / P-value: 0.008 / 0.001 / 6.96E-04; 0.015 / 0.003 / 3.11E-03; 0.058 / 0.016 / 1.56E-02; 0.085 / 0.034 / 3.41E-02; 0.101 / 0.017 / 1.68E-02
Conclusion: Conditional Functional Flow works for molecular function. It does not work for higher-level definitions of function. It captures ontological bias. It can be applied to other graph-based data where guilt by association might not be the rule.
Overview: predicting function from protein interaction data; assigning prior probabilities to regions of the genome using recombination rates; somatic disease mutations; germline disease mutations.
Linked Selection and Human Disease: How does recombination affect the distribution of SNPs throughout the genome? Where are disease mutations more likely to occur? Compare disease mutations to background population mutations. Do somatic and germline mutations occur in the same areas of the genome?
Why Recombination: Muller's Ratchet. Asexual populations are doomed due to the irreversible accumulation of deleterious mutations. Sexual reproduction and recombination counteract these forces.
Linked Selection. Figure: panels for background selection and genetic hitchhiking; labels: 13 polymorphic sites, 7 polymorphic sites, 13 polymorphic sites.
Linked Selection. Figure: panels for background selection and genetic hitchhiking; labels: 13 polymorphic sites, 7 polymorphic sites, 0 polymorphic sites.
Recombination. Figure labels: Parent, Gametes.
Previous Work. Figure: bar chart of recombination rate (cM/Mb, axis 0 to 2.5) for Pgenes, Genes, Transposable Elements, and Background in Human, Worm, Fly, and Zebrafish.
Recombination and Genetic Diversity: Recombination results in higher rates of nucleotide heterozygosity (π), the average pairwise nucleotide differences between all individuals. Recombination varies across a chromosome: high in the chromosome arms, low in centromeres.
π: Average Pairwise Differences
Site   1 2 3 4 5 6 7 8 9
       * *   *       * *
S1     G C G A A T - G C
S2     A G G T A T T G C
S3     G C G T A T T T T
S4     G C G T A T T T T
S5     G C G T A T T G C
S6     G C G T A T T G T
Out    G G G T A T T G T
(* marks a segregating site)
Method 1: $\pi = \frac{\text{total pairwise differences}}{\binom{n}{2}} = \frac{32}{15} = 2.133$
Method 2: $\pi = \frac{n}{n-1} \sum_{i=1}^{S_n} 2 p_i (1 - p_i) = 1.2 \times 1.7778 = 2.133$
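The two methods on this slide can be checked numerically. The sketch below uses the six ingroup sequences S1 to S6 as read off the slide's alignment (the outgroup row is left out) and skips the column containing a gap. The function names are illustrative, and the gap handling is an assumption that happens to reproduce the slide's 32/15.

```python
from itertools import combinations

# Ingroup sequences S1..S6 from the slide's alignment.
seqs = ["GCGAAT-GC", "AGGTATTGC", "GCGTATTTT",
        "GCGTATTTT", "GCGTATTGC", "GCGTATTGT"]

def usable_columns(seqs):
    """Indices of columns where every sequence has a called base."""
    return [i for i in range(len(seqs[0])) if all(s[i] in "ACGT" for s in seqs)]

def pi_pairwise(seqs):
    """Method 1: total pairwise differences divided by the number of pairs."""
    cols = usable_columns(seqs)
    pairs = list(combinations(seqs, 2))
    diffs = sum(a[i] != b[i] for a, b in pairs for i in cols)
    return diffs / len(pairs)

def pi_frequency(seqs):
    """Method 2: n/(n-1) times the per-site sum of 2p(1-p)."""
    n = len(seqs)
    total = 0.0
    for i in usable_columns(seqs):
        column = [s[i] for s in seqs]
        # Summing p(1-p) over both alleles of a biallelic site gives 2p(1-p);
        # a monomorphic site contributes 0.
        total += sum((column.count(a) / n) * (1 - column.count(a) / n)
                     for a in set(column))
    return n / (n - 1) * total

print(pi_pairwise(seqs), pi_frequency(seqs))
```

Both functions agree because the n/(n-1) factor converts the per-site heterozygosity into an average over distinct pairs of sequences.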
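π: Average Pairwise Differences. Measured using average pairwise differences between individuals: $\pi = \frac{n}{n-1} \sum_{i=1}^{S_n} 2 p_i (1 - p_i)$, where $n$ is the number of sequences, $S_n$ the number of segregating sites, and $p_i$ the allele frequency at site $i$.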
π: Average Pairwise Differences. Measured using average pairwise differences between individuals, per window: $\pi = \frac{1}{W} \sum_{i=1}^{S_n} 2 p_i (1 - p_i)$, where $W$ = window size.
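The windowed form, π = (1/W) Σ 2p_i(1 − p_i), can be sketched as a per-window sum over variant sites. The input format (0-based positions paired with an allele frequency), the use of non-overlapping windows, and the function name `windowed_pi` are assumptions for illustration; the talk does not show how windows were laid out.

```python
def windowed_pi(variants, window_size, chrom_length):
    """Per-window pi from (position, allele_frequency) pairs.

    Positions are 0-based. Every window is divided by `window_size`,
    so a shorter final window is treated as if it were full length.
    """
    n_windows = -(-chrom_length // window_size)   # ceiling division
    totals = [0.0] * n_windows
    for position, p in variants:
        totals[position // window_size] += 2 * p * (1 - p)
    return [t / window_size for t in totals]

# Hypothetical variants on a 200 bp toy chromosome, 100 bp windows.
variants = [(10, 0.5), (50, 0.25), (150, 0.5)]
pi_by_window = windowed_pi(variants, window_size=100, chrom_length=200)
print(pi_by_window)
```

Sites that are not variable contribute nothing to the sum, which is why only variant positions need to be listed.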
π and cM/Mb. Rutgers Map v3: CEPH pedigrees; linear interpolation between markers; smoothed with a 100 kb sliding window. 1000 Genomes: ignoring masked regions.
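Linear interpolation between genetic-map markers can be sketched as follows. The marker list is invented for illustration and is not taken from the Rutgers map; the function names and the choice to clamp positions outside the mapped range are assumptions. Markers are assumed to be sorted with strictly increasing base-pair positions.

```python
from bisect import bisect_right

def interpolate_cm(markers, position):
    """Genetic position (cM) at a physical position (bp), by linear interpolation.

    markers: list of (bp, cM) tuples sorted by bp. Positions outside the
    mapped range are clamped to the first or last marker.
    """
    positions = [bp for bp, _ in markers]
    if position <= positions[0]:
        return markers[0][1]
    if position >= positions[-1]:
        return markers[-1][1]
    j = bisect_right(positions, position)
    (bp0, cm0), (bp1, cm1) = markers[j - 1], markers[j]
    return cm0 + (cm1 - cm0) * (position - bp0) / (bp1 - bp0)

def rate_cm_per_mb(markers, start, end):
    """Average recombination rate (cM/Mb) over the interval [start, end]."""
    delta_cm = interpolate_cm(markers, end) - interpolate_cm(markers, start)
    return delta_cm / ((end - start) / 1e6)

# Hypothetical markers: (bp, cM).
markers = [(0, 0.0), (1_000_000, 1.0), (3_000_000, 5.0)]
print(interpolate_cm(markers, 2_000_000))
print(rate_cm_per_mb(markers, 1_000_000, 3_000_000))
```

Evaluating `rate_cm_per_mb` over a sliding interval gives a smoothed rate track of the kind the slide describes, although the exact smoothing used in the talk is not specified.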
Recombination and Genetic Diversity: Recombination results in higher rates of nucleotide heterozygosity (π), the average pairwise nucleotide differences between all individuals. Recombination varies across a chromosome: high in the chromosome arms, low in centromeres.
Variation in Recombination: Chromosome 17. One centimorgan = an average of 0.01 crossovers per generation.
π and cM/Mb: Chromosome 17.
Hypothesis: Novel mutations arise with equal probability throughout the genome. Deleterious mutations will rise to higher frequencies in areas of low recombination due to hitchhiking, and will be crossed out in areas where recombination is high. Neutral and beneficial mutations will rise to higher frequencies in areas of high recombination because they arise on independent backgrounds.
Datasets Used. Somatic mutations: Alexandrov et al., "Signatures of mutational processes in human cancer" (2013); novel to the individual. Disease mutations: HGMD; exist in the population at some frequency.
Figure: bar chart (axis 7.00E-04 to 1.00E-03) with bar values 9.88E-04, 8.83E-04, 8.81E-04, 8.72E-04, 8.76E-04, 8.74E-04, 8.74E-04, 8.68E-04, 8.62E-04, 8.57E-04, 8.36E-04, 7.27E-04. * Not statistically significant (p < .01, Bonferroni corrected).
Figure: bar chart of recombination rate (cM/Mb, axis 1.2 to 1.5) with bar values 1.39, 1.41, 1.37, 1.34, 1.31, 1.30, 1.29, 1.27, 1.26, 1.25, 1.25, 1.24.
Alternative: Conservation? Chun S, Fay JC (2011) Evidence for Hitchhiking of Deleterious Mutations within the Human Genome. PLoS Genet 7(8): e1002240. doi:10.1371/journal.pgen.1002240 http://www.plosgenetics.org/article/info:doi/10.1371/journal.pgen.1002240
High Confidence SV Deletions. Figure: bar chart of recombination rate (cM/Mb, axis 1.00 to 2.00) for All Breakpoints, TEI, NAHR, and NHR, with a Background reference; values shown: 1.79, 1.41, 1.38, 1.35.
Thank you! Mark Gerstein, Sharon Qian, Ian Gonzalez, Shantao Li, Lucas Lochovsky, Yao Fu, Alex Abyzov, Ekta Khurana
Publications
W. T. Clark, P. Radivojac. Vector quantization kernels for the classification of protein sequences and structures. PSB 2014 Proceedings.
W. T. Clark, P. Radivojac. Information theoretic metrics for the evaluation of ontological annotations. ISMB 2013 Proceedings.
P. Radivojac, W. T. Clark, et al. A large-scale evaluation of computational protein function prediction. Nature Methods, 10(3):221-227, 2013.
Y. Zhao, W. T. Clark, M. Mort, D. Cooper, P. Radivojac, S. Mooney. Prediction of functional regulatory SNPs in monogenic and complex disease. Human Mutation, 32(10):1183-1190, 2011.
W. T. Clark, P. Radivojac. Analysis and prediction of protein function from amino acid sequence. Proteins: Structure, Function, and Bioinformatics, 79(7):2086-2096, 2011.
N. L. Nehrt, W. T. Clark*, P. Radivojac, M. W. Hahn. Testing the ortholog conjecture with comparative functional genomic data for mammals. PLoS Computational Biology, 7(6):e1002073, 2011. (* Co-first author)
P. Radivojac, K. Peng, W. T. Clark, B. J. Peters, A. Mohan, S. M. Boyle, S. D. Mooney. An integrated approach to inferring gene-disease associations in humans. Proteins: Structure, Function, and Bioinformatics, 72(3):1030-1037, 2008.
M. Dalkilic, J. C. Costello, W. T. Clark, P. Radivojac. From protein-disease associations to disease informatics. Frontiers in Bioscience, 13:3391-3407, 2008.
M. M. Dalkilic, W. T. Clark, J. C. Costello, P. Radivojac. Using compression to detect classes of inauthentic texts. In Proceedings of the 2006 SIAM Conference on Data Mining, pages 604-608, April 2006.