
Quartet Placement in Phylogenetic Analysis
This presentation introduces quartet-based phylogenetic placement with INSTRAL, a method that adds query species to an existing species tree by choosing the placement with the highest quartet score against a set of gene trees. It covers the required input (gene trees that already include the query species, plus a backbone species tree), the modification to the ASTRAL algorithm, strategies for placing multiple queries, and comparisons with ASTRAL and EPA-ng under gene tree discordance.
Presentation Transcript
INSTRAL: Discordance-Aware Phylogenetic Placement Using Quartet Scores Maryam Rabiee, Siavash Mirarab
Species Tree Estimation. Adding a species to an existing dataset that already has a species tree: start from scratch, or add the species directly to the species tree (phylogenetic placement)?
Recap: ASTRAL. Input: a set of gene trees with bipartition set X. Output: the species tree that maximizes quartet agreement with the gene trees, with all bipartitions drawn from X. Statistically consistent under the multi-species coalescent (MSC). ASTRAL-II: adds some bipartitions to X when species are missing from some gene trees. ASTRAL-III: polynomial time, highly scalable.
Goal: Quartet Placement Problem. Given a backbone species tree on n species and k gene trees on n+m species, add the m (query) species to the species tree.
Goal: Quartet Placement Problem (m = 1). Compute the quartet score (QS) of each possible placement of the query taxon with respect to the quartet trees induced by the gene trees. QS: the number of quartets induced by the n+1 species that support that placement.
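The single-query problem can be illustrated with a brute-force sketch. This is not INSTRAL's algorithm (INSTRAL reuses ASTRAL's dynamic programming); it simply tries every branch and counts matching quartets. Trees are written as nested tuples with a three-way split at the top so that each non-root node corresponds to exactly one branch, every gene tree is assumed to contain all n+1 species, and all function names are hypothetical.

```python
from itertools import combinations

def leaf_set(tree):
    """All leaf labels of a nested-tuple tree."""
    if isinstance(tree, tuple):
        return frozenset().union(*(leaf_set(c) for c in tree))
    return frozenset([tree])

def clusters(tree):
    """Leaf sets below every internal node; each one defines a bipartition."""
    if not isinstance(tree, tuple):
        return set()
    found = {leaf_set(tree)}
    for child in tree:
        found |= clusters(child)
    return found

def quartet_topology(tree_clusters, quartet):
    """Return the split (e.g. ab|cd) a tree induces on four taxa, or None."""
    q = frozenset(quartet)
    for c in tree_clusters:
        side = c & q
        if len(side) == 2:
            return frozenset([side, q - side])
    return None  # unresolved quartet

def placements(tree, query, is_root=True):
    """Yield every tree obtained by attaching `query` to one branch."""
    if not is_root:
        yield (tree, query)  # attach on the branch above this node
    if isinstance(tree, tuple):
        for i, child in enumerate(tree):
            for new_child in placements(child, query, is_root=False):
                yield tree[:i] + (new_child,) + tree[i + 1:]

def quartet_score(species_tree, gene_trees):
    """Count (quartet, gene tree) pairs that agree with the species tree."""
    sp = clusters(species_tree)
    gene_clusters = [clusters(g) for g in gene_trees]
    score = 0
    for quartet in combinations(sorted(leaf_set(species_tree)), 4):
        topo = quartet_topology(sp, quartet)
        if topo is not None:
            score += sum(quartet_topology(gc, quartet) == topo
                         for gc in gene_clusters)
    return score

def best_placement(backbone, gene_trees, query):
    """Return (score, tree) for the highest-scoring placement."""
    scored = [(quartet_score(t, gene_trees), t)
              for t in placements(backbone, query)]
    return max(scored, key=lambda pair: pair[0])

# Toy data (made up for illustration): two gene trees put Q next to C,
# one puts Q next to D.
backbone = ("A", "B", ("C", "D"))
g_c = (("A", "B"), ("D", ("C", "Q")))
g_d = (("A", "B"), ("C", ("D", "Q")))
score, placed = best_placement(backbone, [g_c, g_c, g_d], "Q")
```

On this toy input the query ends up as sister to C, the position supported by the majority of gene trees.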
Goal: Quartet Placement Problem (m > 1). Add the query sequences one after the other.
Goal: Quartet Placement Problem (m > 1). Alternatively, add each query sequence to the backbone separately and merge the results.
INSTRAL Input. Takes gene trees on n+m species (n backbone, m query), a species tree on n species, and the labels of the m query species. Therefore, the query sequences must already have been added to the gene trees, either by de novo gene tree inference or by a phylogenetic placement method (e.g., pplacer, APPLES, or EPA-ng).
INSTRAL (1 query). Modify the ASTRAL algorithm: set the bipartition set X to all bipartitions of the backbone with and without the new species. Notation: q is the new species; T is the backbone tree on leaf set L; B(T) is the set of all bipartitions of T. X then includes every possible placement of the query, so ASTRAL is guaranteed to solve the quartet placement problem exactly within the new search space.
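One way to read the construction of X is sketched below: take both sides of every bipartition in B(T), keep each side as is, and also add a copy extended with q. This is an illustrative interpretation of the slide, not code from INSTRAL. The nested-tuple tree encoding, the function names, and the explicit addition of {q} and L (the bipartition that separates the query from everything else) are assumptions.

```python
def leaf_set(tree):
    """All leaf labels of a nested-tuple tree."""
    if isinstance(tree, tuple):
        return frozenset().union(*(leaf_set(c) for c in tree))
    return frozenset([tree])

def bipartition_sides(tree):
    """Both sides of every bipartition (trivial ones included)."""
    everything = leaf_set(tree)
    sides = set()

    def visit(node):
        if isinstance(node, tuple):
            below = frozenset().union(*(visit(c) for c in node))
        else:
            below = frozenset([node])
        if below != everything:
            sides.add(below)
            sides.add(everything - below)
        return below

    visit(tree)
    return sides

def search_space(backbone, query):
    """Sides of B(T) with and without the query, plus {q} and L."""
    q = frozenset([query])
    X = {q, leaf_set(backbone)}
    for side in bipartition_sides(backbone):
        X.add(side)       # query falls on the other side of this bipartition
        X.add(side | q)   # query falls on this side
    return X

backbone = ("A", "B", ("C", "D"))
X = search_space(backbone, "Q")

# Every bipartition of a tree with Q attached to some branch is already in X.
placed = ("A", "B", (("C", "Q"), "D"))
covered = bipartition_sides(placed) <= X
```

Because every bipartition of every placed tree already lies in X, a search restricted to X can reach any placement of the query without altering the backbone.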
INSTRAL (Multiple queries). Independent placement: used when the relationship between queries isn't important or the queries are not expected to be closely related; runtime increases linearly with the number of queries; the resulting species tree is not statistically consistent (polytomies). Ordered placement: runtime increases polynomially in the number of queries; NOT guaranteed to find the optimal quartet placement solution; the species tree IS statistically consistent.
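The two multi-query strategies differ only in which backbone each query sees. The sketch below makes that explicit; `place_one` is a hypothetical callable standing in for any single-query placement routine, and the merge step of independent placement is not shown.

```python
def independent_placement(backbone, queries, place_one):
    """Place every query on the same, unchanged backbone.

    One call per query, so the work grows linearly with the number of
    queries. Queries never see each other; merging the per-query trees
    is a separate step that this sketch omits.
    """
    return {q: place_one(backbone, q) for q in queries}

def ordered_placement(backbone, queries, place_one):
    """Place queries one after another, updating the backbone each time."""
    tree = backbone
    for q in queries:
        tree = place_one(tree, q)  # later queries see earlier ones
    return tree

# Stand-in placement routine (an assumption for the demo): attach the
# query next to the whole tree. A real routine would pick the best branch.
def attach_at_top(tree, query):
    return (tree, query)

backbone = ("A", "B", ("C", "D"))
per_query = independent_placement(backbone, ["x", "y"], attach_at_top)
combined = ordered_placement(backbone, ["x", "y"], attach_at_top)
```

In the ordered variant the result depends on the order of `queries`, which is one way to see why it is not guaranteed to reach the optimal joint placement.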
Datasets. Simulated datasets: 200 ingroup species; 50, 200, or 1000 (out of 1000) genes per replicate; varying levels of ILS; FastTree-II used to infer gene trees from sequences; two inferred species trees (ASTRAL and FastTree-II). Biological datasets: Insects (144 species, 1478 genes, ASTRAL species tree); Plants (103 species, 424 genes, ASTRAL species tree); Birds (48 orders, full genomes, ASTRAL tree, high gene tree discordance).
Experiments. Leave-one-out: prune one taxon from the true/estimated species tree. Comparison to ASTRAL: input gene trees are those inferred by FastTree (i.e., de novo); the backbone is the pruned estimated species tree; re-insert the taxon with INSTRAL; compare the new species tree with the original one inferred by ASTRAL. Since INSTRAL finds the optimal placement, its QS is at least as high as ASTRAL's. Comparison to CA-ML (RAxML + EPA-ng): input gene trees to INSTRAL are either de novo or estimated with EPA-ng to place the query sequence; input to CA-ML is the concatenated gene sequences; the backbone is the pruned true species tree; compare the new species tree with the one estimated by EPA-ng.
Experiments. Multiple (ordered) placement: prune a fraction of taxa from the estimated species tree, order them arbitrarily, and re-insert them, updating the backbone each time. Comparison to ASTRAL: compare the INSTRAL output species tree with the estimated ASTRAL species tree; measure the INSTRAL RF distance to the true species tree vs. the ASTRAL RF distance to the true species tree. The INSTRAL QS does not have to be at least the ASTRAL QS (ordered placement is not guaranteed optimal).
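The RF (Robinson-Foulds) distance used in these comparisons counts the non-trivial bipartitions found in one tree but not the other. A minimal sketch, assuming both trees are nested tuples over the same leaf set; the function names and the example trees are made up for illustration.

```python
def leaf_set(tree):
    """All leaf labels of a nested-tuple tree."""
    if isinstance(tree, tuple):
        return frozenset().union(*(leaf_set(c) for c in tree))
    return frozenset([tree])

def bipartitions(tree):
    """Non-trivial bipartitions, each stored as an unordered pair of sides."""
    everything = leaf_set(tree)
    splits = set()

    def visit(node):
        if isinstance(node, tuple):
            below = frozenset().union(*(visit(c) for c in node))
        else:
            below = frozenset([node])
        # Skip splits that isolate a single leaf; every tree has those.
        if 1 < len(below) < len(everything) - 1:
            splits.add(frozenset([below, everything - below]))
        return below

    visit(tree)
    return splits

def rf_distance(tree_a, tree_b):
    """Number of bipartitions present in exactly one of the two trees."""
    return len(bipartitions(tree_a) ^ bipartitions(tree_b))

true_tree = ("A", "B", ("C", ("D", "E")))
estimate = ("A", "B", ("D", ("C", "E")))
distance = rf_distance(true_tree, estimate)  # DE|ABC vs. CE|ABD differ
```

Storing each bipartition as an unordered pair of sides makes the comparison independent of where the nested tuple happens to be rooted.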
Comparison to ASTRAL. Each cell lists three counts (left; middle; right).
Left: number of cases where INSTRAL QS > ASTRAL QS (ASTRAL failed to find the optimal placement).
Middle: number of cases where the INSTRAL RF distance to the true tree differs from the ASTRAL RF distance to the true tree.
Right: number of cases where the INSTRAL RF distance to the true tree < the ASTRAL RF distance to the true tree.

                 50 genes        200 genes     1000 genes
Moderate ILS     11; 8; 1        5; 3; 0       4; 4; 0
High ILS         41; 31; 13      12; 8; 5      5; 3; 2
Very high ILS    178; 140; 26    41; 33; 7     19; 12; 6
Comparison to CA-ML with EPA-ng. EPA-ng finds the maximum likelihood placement of a sequence on a set of candidate branches.
Multiple (Ordered) Placement. Difference in RF distance between each method (ASTRAL, INSTRAL) and the true tree.
Biological data. Single placement: INSTRAL found the same placement for the query sequence in the species tree as ASTRAL in all cases; EPA-ng (on the 1KP dataset) found the same placement for only 68/103 species. Ordered placement (removing a fraction of the species).
Phylogenetic Placement on Degraded Multi-Gene Data. Evaluating INSTRAL and EPA-ng on genome (multi-gene) data in which some species have extremely degraded genes. EPA-ng must be used to place the degraded genes into each gene tree.