
Phylogenetic Inference and DNA Evolution: Understanding the Tree of Life
Discover the fascinating world of phylogenetics, evolutionary trees, and DNA sequence evolution through the lens of mathematical models and statistical inference methods. Explore how phylogenetic analysis helps unravel the relationships between species over millions of years, guided by experts like Tandy Warnow. Delve into the complexities of Markov models in sequence evolution and understand the statistical consistency of methods under different evolutionary models.
Presentation Transcript
Why I love phylogenetics! Tandy Warnow, The University of Illinois
Phylogeny (evolutionary tree). [Figure: tree with leaves Orangutan, Human, Gorilla, Chimpanzee.] From the Tree of Life Website, University of Arizona.
Phylogenetic Inference. Big Data: heterogeneous, large, noisy, error-ridden, streaming, model misspecification. Approaches: NP-hard optimization problems and large datasets; statistical estimation under stochastic models of evolution; probabilistic analysis of algorithms; graph-theoretic divide-and-conquer; chordal graph theory; combinatorial optimization.
DNA Sequence Evolution (Idealized). [Figure: sequences evolving down a tree over time.] -3 mil yrs: AAGACTT. -2 mil yrs: AAGGCCT, TGGACTT. -1 mil yrs: AGGGCAT, TAGCCCT, AGCACTT. Today: AGGGCAT, TAGCCCA, TAGACTT, AGCACAA, AGCGCTT.
Phylogeny Problem. Input sequences: U = AGGGCAT, V = TAGCCCA, W = TAGACTT, X = TGCACAA, Y = TGCGCTT. [Figure: output tree with leaves X, U, Y, V, W.]
Markov Models of Sequence Evolution (Gene Tree). The different sites are assumed to evolve i.i.d. down the model tree, so it suffices to model a single site. Jukes-Cantor, 1969 (simplest DNA site evolution model): The state at the root is randomly drawn from {A,C,T,G} (nucleotides). The model tree T is binary and has substitution probabilities p(e) on each edge e, with 0<p(e)<3/4. If a site (position) changes on an edge, it changes with equal probability to each of the remaining states. The evolutionary process is Markovian. More complex models are also considered, often with little change to the theory.
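The Jukes-Cantor process described on this slide can be sketched as a small simulation. This is only an illustration under stated assumptions: the nested-tuple tree encoding, the function names, the example topology, and the edge probabilities are hypothetical and do not come from the slides. The root is assumed to be drawn uniformly from the four nucleotides.

```python
import random

NUCLEOTIDES = "ACGT"

def evolve_site(state, p, rng):
    """Change `state` with probability p, uniformly to one of the three other nucleotides."""
    if rng.random() < p:
        return rng.choice([x for x in NUCLEOTIDES if x != state])
    return state

def simulate_sequences(tree, n_sites, seed=0):
    """Evolve n_sites i.i.d. sites down `tree` and return {leaf name: sequence}.

    Assumed encoding: a leaf is a string name; an internal node is a tuple of
    (child, p) pairs, where p is the substitution probability p(e) on the edge
    leading to that child.
    """
    rng = random.Random(seed)
    leaves = {}

    def descend(node, seq):
        if isinstance(node, str):
            leaves[node] = "".join(seq)
            return
        for child, p in node:
            if not 0 < p < 0.75:
                raise ValueError("Jukes-Cantor requires 0 < p(e) < 3/4")
            # Each site changes independently of the others on this edge.
            descend(child, [evolve_site(s, p, rng) for s in seq])

    # Root state for each site: uniform over the nucleotides (assumption).
    root = [rng.choice(NUCLEOTIDES) for _ in range(n_sites)]
    descend(tree, root)
    return leaves

# Hypothetical five-leaf binary model tree with made-up edge probabilities.
EXAMPLE_TREE = (
    ((("U", 0.1), ("X", 0.1)), 0.05),
    ((("Y", 0.1), ((("V", 0.1), ("W", 0.1)), 0.05)), 0.05),
)

if __name__ == "__main__":
    for name, seq in sorted(simulate_sequences(EXAMPLE_TREE, 7).items()):
        print(name, seq)
```

Because the sites are i.i.d., the whole sequence is evolved by applying the single-site rule to each position separately, which is why modeling one site suffices.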
Is method M statistically consistent under model G? Question answered by mathematical proof. [Figure: Y-axis: error in species tree inferred by method M. X-axis: amount of data generated under model G and then given to method M as input.]
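One common way to state the property in question is given below. The notation is assumed here rather than taken from the slides: S_k denotes k sites generated i.i.d. under a model tree (T, θ) in G, and M(S_k) is the tree that method M returns on that input.

```latex
% Notation assumed for illustration: (T, \theta) is a model tree in G,
% S_k is a set of k sites generated i.i.d. under (T, \theta).
M \text{ is statistically consistent under } G
\iff
\forall (T,\theta) \in G:\quad
\lim_{k \to \infty} \Pr\bigl[\, M(S_k) = T \,\bigr] = 1 .
```

In terms of the plot, this says the error curve for M tends to zero as the amount of data grows.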
Neighbor Joining vs Maximum Parsimony. Neighbor joining (distance-based) is polynomial time and is proven statistically consistent. Maximum parsimony (Hamming Distance Steiner Tree Problem) is NP-hard and statistically inconsistent. Therefore, we should use Neighbor Joining, right?
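To make the parsimony criterion concrete, the sketch below scores one fixed tree with Fitch's algorithm, which counts the minimum number of substitutions needed to explain the leaf sequences on that tree. Scoring a single tree is fast. The NP-hard part is searching over all trees for the one with the smallest score. The sequences are the five from the Phylogeny Problem slide, paired with the labels U through Y in slide order (an assumption about the figure). The tree topology and function names are hypothetical.

```python
def fitch_site(node, states):
    """Return (candidate state set, minimum number of changes) for one site.

    `node` is a leaf name (str) or a pair (left, right) of subtrees;
    `states` maps each leaf name to its nucleotide at this site.
    """
    if isinstance(node, str):
        return {states[node]}, 0
    left, right = node
    left_set, left_cost = fitch_site(left, states)
    right_set, right_cost = fitch_site(right, states)
    common = left_set & right_set
    if common:
        # The children can agree on a state: no extra change is needed here.
        return common, left_cost + right_cost
    # The children disagree: one additional change is required.
    return left_set | right_set, left_cost + right_cost + 1

def parsimony_score(tree, seqs):
    """Sum the per-site Fitch costs over all sites of equal-length sequences."""
    n_sites = len(next(iter(seqs.values())))
    return sum(
        fitch_site(tree, {name: s[i] for name, s in seqs.items()})[1]
        for i in range(n_sites)
    )

# Sequences from the Phylogeny Problem slide (label pairing assumed).
SEQS = {
    "U": "AGGGCAT",
    "V": "TAGCCCA",
    "W": "TAGACTT",
    "X": "TGCACAA",
    "Y": "TGCGCTT",
}

# Hypothetical candidate topology, chosen only for illustration.
CANDIDATE_TREE = (("U", "X"), ("Y", ("V", "W")))

if __name__ == "__main__":
    print(parsimony_score(CANDIDATE_TREE, SEQS))
```

A maximum parsimony search would call `parsimony_score` on many candidate topologies and keep the lowest-scoring one.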
Methods: NJ: Neighbor Joining (polytime and consistent); DCM-NJ+MP: divide-and-conquer; Weighbour: polytime and consistent; MP: Maximum parsimony (NP-hard, inconsistent). X-axis: sequence length. Y-axis: tree error rate (fraction of edges missing). Note: NJ has higher error than MP on these data. So, predictions from theory do not seem to work! Figure from Nakhleh et al., Pacific Symposium on Biocomputing, 2002.
Why I love phylogenetics: Phylogeny estimation is a non-trivial and complex statistical estimation problem. Theory and empirical evaluation are both needed, and they inform each other. These insights lead to advances in methods, which in turn enable biologists to make more accurate scientific discoveries.