
Gene Tree Modeling: Method Choices and Accuracy
This presentation compares fully modeling gene-tree uncertainty with summarizing it, and asks how that method choice affects species-tree accuracy. It covers three strategies: estimating ML gene trees, estimating consensus gene trees, and jointly estimating gene trees and the species tree. Two representative programs for multi-locus nucleotide data are presented: STEM (maximum likelihood estimation) and *BEAST (Bayesian inference).
Presentation Transcript
Full modeling versus summarizing gene-tree uncertainty: Method choice and species-tree accuracy
L.L. Knowles et al., Molecular Phylogenetics and Evolution 65 (2012): 501-509
[Figure: three analysis strategies applied to multi-locus sequence alignments (Locus 1 ... Locus 3; taxon 1 to taxon 4). (a) Estimate ML gene trees for each locus. (b) Estimate consensus gene trees for each locus from the posterior probability of alternate topologies. (c) Jointly estimate gene trees and species tree, giving posterior probabilities of alternate gene-tree and species-tree topologies. Output labels: estimate ML species tree; estimate consensus species tree.]
Two representative software examples
STEM: Maximum likelihood based estimation. Needs known gene trees. Less computationally intensive.
*BEAST: Bayesian inference using full coalescent model. Reads multi-locus nucleotide data. Technically one of the fastest Bayesian approaches, but still quite costly in computational terms.
Two representative software examples (three analysis variants)
ML-GT STEM: Maximum likelihood based estimation. ML gene trees computed using GARLI. Less computationally intensive.
Consensus-GT STEM: Maximum likelihood based estimation. Consensus gene tree computed using MrBayes. Less computationally intensive, although MrBayes is slower than GARLI.
*BEAST: Bayesian inference using full coalescent model. Reads multi-locus nucleotide data. Technically one of the fastest Bayesian approaches, but still quite costly in computational terms.
[Figure: percent accuracy (0 to 100) of ML-GT STEM, consensus-GT STEM, and *BEAST at 1N and 10N.]
Species tree accuracy using the three methods on datasets simulating evolutionary durations of 1N and 10N generations, respectively.
The authors' conclusion
"The factor having the largest effect on the accuracy of a species-tree estimate is not the method of analysis or sampling design, but is the timing of divergence" (sic)
[Figure: percent accuracy (0 to 100) of ML-GT STEM, consensus-GT STEM, and *BEAST at 1N and 10N.]
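One way to see why timing of divergence can matter so much is the standard three-taxon coalescent result: the probability that a gene tree's topology disagrees with the species tree is (2/3)·exp(-T), where T is the internal branch length in coalescent units. The sketch below is illustrative only and is not taken from the paper. It assumes a diploid population (T = generations / 2N), a hypothetical population size `n_e`, and it treats the internal branch as if it were 1N or 10N generations long, whereas the slides use those values for the simulated evolutionary duration of the whole tree.

```python
import math


def discordance_probability(branch_generations, n_e):
    """Probability that a three-taxon gene tree disagrees with the species tree.

    Assumes a diploid population, so the internal branch length in
    coalescent units is generations / (2 * n_e).
    """
    t = branch_generations / (2.0 * n_e)
    # With probability exp(-t) the two sister lineages fail to coalesce in the
    # internal branch; 2 of the 3 equally likely resolutions are then discordant.
    return (2.0 / 3.0) * math.exp(-t)


if __name__ == "__main__":
    n_e = 10_000  # hypothetical effective population size
    for label, factor in [("1N", 1), ("10N", 10)]:
        p = discordance_probability(factor * n_e, n_e)
        print(f"internal branch of {label} generations: P(discordant) = {p:.3f}")
```

Under these assumptions the shorter branch leaves far more gene trees in conflict with the species tree, which is the kind of signal loss that any method, summary-based or fully Bayesian, has to contend with.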
[Figure: error in estimated species trees for ML-GT STEM, consensus-GT STEM, and *BEAST, plotted against sampling effort (individuals:loci = 1:3, 3:1, 1:9, 3:3, 9:1, 3:9, 9:3), with separate panels for 10N and 1N. (A) Robinson-Foulds distance (axis 0 to 12). (B) Kuhner-Felsenstein distance (axis 0.0 to 2.0).]
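For readers unfamiliar with the first error measure, the following is a minimal sketch of a Robinson-Foulds-style distance. It assumes rooted trees written as nested Python tuples of taxon names and counts the clades found in one tree but not the other. This is a simplified rooted variant for illustration, not necessarily the exact computation used in the paper, and the function names `clades` and `rf_distance` are hypothetical.

```python
def clades(tree):
    """Return the non-trivial clades of a rooted tree given as nested tuples.

    Each clade is a frozenset of leaf names. Single leaves and the full
    leaf set (the root) are excluded because every tree shares them.
    """
    found = set()

    def walk(node):
        if isinstance(node, str):
            return frozenset([node])
        leaves = frozenset().union(*(walk(child) for child in node))
        found.add(leaves)
        return leaves

    root = walk(tree)
    found.discard(root)
    return found


def rf_distance(tree_a, tree_b):
    """Count clades present in exactly one of the two trees."""
    return len(clades(tree_a) ^ clades(tree_b))


if __name__ == "__main__":
    true_tree = ((("A", "B"), "C"), "D")
    estimate = ((("A", "C"), "B"), "D")
    print(rf_distance(true_tree, estimate))
```

A distance of zero means the two topologies are identical; larger values mean more clades are in conflict. The Kuhner-Felsenstein distance in panel (B) additionally accounts for branch lengths and is not covered by this sketch.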
Conclusion
On small sample sizes, all methods yield similarly (in)accurate species trees. Hence there is no justification for using computationally intensive approaches in these situations.
Similarly, all methods yield similarly accurate trees, independent of sample size, when the analyzed data have evolved down a substantially deep tree. Therefore the less intensive methods would be preferable.
When analyzing larger sample sizes containing recent speciation, there is a significant difference in species tree accuracy among the methods. Inference methods based on the full coalescent model (*BEAST, for example) appear to perform best in these situations. In fact, the results on shorter trees rival those on deeper ones in this specific scenario.
In Short:
Shorter tree, smaller sample size: Consensus-GT STEM
Shorter tree, larger sample size: *BEAST
Deeper tree, smaller sample size: ML-GT STEM
Deeper tree, larger sample size: ML-GT STEM
Questions
1. How did running time compare for the two methods? Did the authors make any effort to adjust the degree of sampling time for one or the other?
2. Which one would you use for an analysis like this in the future, based on what you've read?
3. How would you have improved the authors' simulated dataset?
4. Why do you/the authors think sampling scheme has the opposite effect on species tree accuracy for late diverging versus recently diverging data?
5. A primary motivation of the paper is the way that gene tree estimation error is treated by the different methods. Did the time of divergence affect the amount of gene tree estimation error in either the maximum likelihood gene trees or the consensus gene trees?
6. What do Knowles et al. mean by mutational variance? How is this different from coalescent variance?
7. What is the effect of incomplete gene trees on species tree estimation?
8. Why did this paper not use summary methods in their analysis?
Other Questions?