
Solving Maximum Likelihood Problems in Phylogenetics
Explore the challenges and solutions in estimating model trees in phylogenetics using maximum likelihood methods, including NP-hardness, heuristic algorithms, and scoring techniques like Felsenstein's peeling algorithm.
Download Presentation

Please find below an Image/Link to download the presentation.
The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author. If you encounter any issues during the download, it is possible that the publisher has removed the file from their server.
You are allowed to download the files provided on this website for personal or commercial use, subject to the condition that they are used lawfully. All files are the property of their respective owners.
The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author.
E N D
Presentation Transcript
CS 581 Tandy Warnow
Topics Introducing maximum likelihood Variants: Scoring a fixed model tree (Felsenstein s peeling algorithm) Finding best numeric parameters for a fixed tree Finding the best model tree Statistical consistency? NP-hardness
CFN Maximum Likelihood Input: Same input as for maximum parsimony (alignment A) Output: CFN model tree T* (tree and numeric parameters) that maximizes Pr(A|T*) Note: can define maximum likelihood tree estimation for all the other models (e.g., Jukes- Cantor, GTR)
CFN Maximum Likelihood Input: Same input as for maximum parsimony (alignment A) Output: CFN model tree T* (tree and numeric parameters) that maximizes Pr(A|T*) Comments: NP-hard! Effective heuristics: RAxML, IQ-TREE, FastTree, etc. These heuristics operate by scoring a model tree, then searching through model tree space using local search strategies.
Scoring a model tree Input: Alignment A and model tree T* (tree and numeric parameters) Output: Pr(A|T*) Question: How do we do this? Answer: Felsenstein s peeling algorithm (dynamic programming, polynomial time) see textbook.
Approaches for solving ML 1. 2. Hill-climbing heuristics (which can get stuck in local optima) Randomized algorithms for getting out of local optima Note the need to compute score (here we show absolute value of the log likelihood, and seek to minimize this) Local optimum |log likelihood| Global optimum Model trees
Solving NP-hard problems exactly is unlikely #leaves 4 5 6 7 8 9 10 20 100 1000 #trees 3 15 105 945 10395 135135 2027025 2.2 x 1020 4.5 x 10190 2.7 x 102900 Number of (unrooted) binary trees on n leaves is (2n-5)!! If each tree on 1000 taxa could be analyzed in 0.001 seconds, we would find the best tree in 2890 millennia
Solving ML on a fixed tree topology Input: Alignment A and tree topology T (without numeric parameters) Output: numeric parameters for T that maximizes Pr(A|T*) where T* is T with numeric parameters Question: How do we do this? Is it solvable in polynomial time? Answer: I don t know. But heuristics use hill- climbing techniques that are not guaranteed to be accurate.
Statistical consistency of ML? Input: Alignment A Output: CFN model tree T* that maximizes Pr(A|T*), where T* is T with CFN numeric parameters Comments: If solved exactly, CFN ML tree estimation is statistically consistent for data generated under the CFN model. Similarly, Jukes-Cantor ML tree estimation is statistically consistent for data generated under the Jukes-Cantor model. But note that the model matters (if data generated under a more complex model, estimating under a simple model may not be consistent). Even if generative model and estimation model are same, may not be consistent! Look up No Common Mechanism Model
Bayesian estimation Similar to maximum likelihood (i.e., you know the model), but now you aren t trying to find the tree and numeric parameters to maximize probability. You just want to sample from the distribution somehow, and use those samples