
Human Genome Evolution: Studies on the Future of 6-Fingered Humanity
Recent natural selection studies suggest humans may evolve to have 6 fingers in the next million years. Discoveries on the rearrangement of the human genome, chromosome theories (22 or 24), and models of genomic architectures are discussed. The Fragile Breakage Model (FBM) challenges the Random Breakage Model (RBM), with recent studies supporting FBM. Controversy and support for FBM are explored, providing insights into the evolution of the human genome.
Presentation Transcript
Recent natural selection studies demonstrated that humans will soon evolve to have 6 fingers. How Will the Human Genome Rearrange in the Next Million Years? Max Alekseyev and Pavel Pevzner, University of South Carolina and University of California at San Diego.
But how many chromosomes? 22 or 24?
Discovery of Surprisingly Complex Features of Mammalian Genomic Architectures. Comparative analysis of the human chromosomes revealed many SHORT adjacent regions corresponding to parts of SEVERAL mouse chromosomes. [Figure: mouse chromosomes mapped onto a human chromosome]
Random Breakage Model: Genomic architectures are shaped by rearrangements that occur randomly. The Random Breakage Model (RBM) was introduced by Susumu Ohno in 1970 and further developed by Nadeau and Taylor in the 1980s. RBM was embraced by biologists and has become the de facto theory of chromosome evolution. RBM implies that there are no rearrangement hotspots.
Fragile Breakage Model: rearrangements are reusing the same rearrangement hotspots. Pevzner & Tesler, PNAS 2003: every evolutionary scenario for transforming the Mouse genome into the Human genome must result in a large number of breakpoint re-uses, a contradiction to the RBM. The Fragile Breakage Model (FBM) postulates the existence of rearrangement hotspots (fragile regions) and vast breakpoint re-use.
While FBM was subject to controversy in 2003-2007, All Recent Studies Support FBM. Kikuta et al., Genome Res. 2007: ... the Nadeau and Taylor hypothesis is not possible for the explanation of synteny in rat.
With One Exception. Ma et al., Genome Res. 2006: this frequency of breakpoint re-use is approximately what one would expect if breakage was equally likely for every genomic position ... a careful analysis [of the RBM vs. FBM controversy] is beyond the scope of this study.
Reconciling FBM with Ma et al., 2006. We reconcile the evidence for limited breakpoint reuse in Ma et al., 2006 with the FBM and reveal a rampant but elusive breakpoint reuse. We provide evidence for the birth and death of the fragile regions, implying that they move to different locations in different lineages, explaining why Ma et al., 2006, found limited breakpoint reuse between different branches of the evolutionary tree. We introduce the Turnover Fragile Breakage Model (TFBM) that sheds light on a possible relationship between rearrangements and Matching Segmental Duplications. TFBM points to locations of the currently fragile regions in the human genome.
Striking Irregularities in Breakpoint Reuse Across Various Pairs of Branches. [Table: breakpoint intra- and inter-reuses between 7 branches of the tree, together with the number of rearrangements on each branch. Colors represent the distance between a pair of branches: red = adjacent, green = separated by one branch, yellow = separated by two branches. Tree leaves: Mouse, Rat, Dog, macaQue, Human.] What is surprising about this table?
Chromosome Evolution: Random or Fragile? Biologists accepted RBM because it complies with the exponential distribution of the sizes of the synteny blocks observed in real genomes (exponential distribution test). A flaw in this logic: RBM is not the only model that complies with the exponential distribution test. Pevzner and Tesler refuted RBM because RBM does not comply with the breakpoint reuse test: RBM implies low reuse but real genomes reveal high reuse. FBM complies with both the exponential distribution and breakpoint reuse tests.
Model   Exponential distribution test   Breakpoint reuse test
RBM     YES                             NO
FBM     YES                             YES
Chromosome Evolution: Random, Fragile, or ??? Why do biologists believe in RBM? Because RBM implies the exponential distribution of the sizes of the synteny blocks observed in real genomes. A flaw in this logic: RBM is not the only model that complies with the exponential distribution test. Why did Pevzner and Tesler refute RBM? Because RBM does not comply with the breakpoint reuse test: RBM implies low reuse but real genomes reveal high reuse. FBM complies with both the exponential distribution and breakpoint reuse tests. Both RBM and FBM fail the Multispecies Breakpoint Reuse (MBR) test.
Model   Exponential distribution test   Breakpoint reuse test   MBR test
RBM     YES                             NO                      NO
FBM     YES                             YES                     NO
Chromosome Evolution: Turnover of Fragile Regions. TFBM passes all three tests.
Model   Exponential distribution test   Breakpoint reuse test   MBR test
RBM     YES                             NO                      NO
FBM     YES                             YES                     NO
TFBM    YES                             YES                     YES
Compressing 20 Years of Algorithmic Research on Genome Rearrangements into 3 Minutes
Multichromosomal Genomes: Genomic Distance. Genomic Distance is the minimum number of reversals, translocations, fusions, and fissions required to transform one genome into the other. Reversals and translocations make 2 breaks in the genome (e.g., on chromosomes 4 and 20) and glue the resulting fragments in a new order. To study rearrangements, we use the notion of 2-break rearrangements, also known as DCJ (Yancopoulos et al., Bioinformatics 2005).
Let's Imagine That Mammals Have Circular Chromosomes. A chromosome can be represented as a cycle with red and black edges, where red edges encode synteny blocks and their directions, and black edges connect adjacent synteny blocks. [Figure: circular genome a, -b, -c, d with 4 synteny blocks]
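The cycle representation can be sketched in a few lines of Python. This is only an illustration under assumed conventions that are not in the slides: each synteny block x gets two vertices, a tail "xt" and a head "xh", a block written as "-x" is traversed head to tail, and the helper name black_edges is hypothetical.

```python
def black_edges(chromosome):
    """List the black edges (adjacencies) of one circular chromosome.

    `chromosome` is a list of signed block names such as ["a", "-b", "-c", "d"].
    The vertex naming ("xt" = tail, "xh" = head) is an assumed convention.
    """
    ends = []
    for block in chromosome:
        name = block.lstrip("-")
        tail, head = name + "t", name + "h"
        # (entry vertex, exit vertex) in the direction the block is traversed
        ends.append((head, tail) if block.startswith("-") else (tail, head))
    k = len(ends)
    # A black edge joins the exit of one block to the entry of the next one,
    # wrapping around because the chromosome is circular.
    return [(ends[i][1], ends[(i + 1) % k][0]) for i in range(k)]


edges = black_edges(["a", "-b", "-c", "d"])
print(edges)
```

The red edges are implicit here: each block x is the red edge between "xt" and "xh".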
Transforming a Genome a,-b,-c,d into a Genome a,-b,-d,c. A 2-break replaces any pair of black edges with another pair forming a matching on the same 4 vertices. Reversals (aka inversions), translocations, fusions, and fissions represent all possible types of 2-breaks. [Figure: a 2-break transforming the circular genome a,-b,-c,d into a,-b,-d,c]
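A 2-break can be sketched as an operation on a set of black edges. The vertex names ("xt" and "xh" for the tail and head of block x) and the function name two_break are assumptions made for this illustration; the edge sets were worked out by hand for the two genomes named above.

```python
def two_break(edges, old1, old2, new1, new2):
    """Replace two black edges by two new ones on the same four vertices."""
    old = {frozenset(old1), frozenset(old2)}
    new = {frozenset(new1), frozenset(new2)}
    if not old <= edges:
        raise ValueError("both edges to be broken must be present")
    if set().union(*old) != set().union(*new):
        raise ValueError("new edges must form a matching on the same 4 vertices")
    return (edges - old) | new


def edge_set(pairs):
    return {frozenset(p) for p in pairs}


# Black edges of the circular genome a,-b,-c,d (assumed vertex naming).
P = edge_set([("ah", "bh"), ("bt", "ch"), ("ct", "dt"), ("dh", "at")])
# Black edges of the circular genome a,-b,-d,c.
Q = edge_set([("ah", "bh"), ("bt", "dh"), ("dt", "ct"), ("ch", "at")])

# One 2-break (here a reversal of the segment -c,d) turns P into Q.
result = two_break(P, ("bt", "ch"), ("dh", "at"), ("bt", "dh"), ("ch", "at"))
print(result == Q)
```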
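Black and Green Genomes: Black-Red and Green-Red Cycles. [Figure: genome P drawn as a black-red cycle and genome Q drawn as a green-red cycle on synteny blocks a, b, c, d]
Drawing the Black Genome P in the same way as the Green Genome Q is drawn. [Figure: P redrawn with its red edges placed as in Q]
Breakpoint Graph = Superposition of Genomes. The breakpoint graph G(P,Q) (Bafna & Pevzner, FOCS 1994) is obtained by superimposing P and Q. [Figure: P, Q, and their breakpoint graph G(P,Q)]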
Black-Green Cycles. Black and green edges form a collection of black-green alternating cycles (where the colors of edges alternate). The number of black-green cycles, cycles(P,Q), in the breakpoint graph G(P,Q) plays a central role in analyzing genome rearrangements. [Figure: black-green cycles on blocks a, b, c, d]
Rearrangements Change Black-Green Cycles. Transforming black genome P into green genome Q by 2-breaks corresponds to transforming the breakpoint graph G(P,Q) into the breakpoint graph G(Q,Q). [Figure: G(P,Q) with cycles(P,Q) = 2; G(P',Q) with cycles(P',Q) = 3; G(Q,Q), made of trivial cycles, with cycles(Q,Q) = 4 = blocks(P,Q)]
Finally, a Theorem. The 2-Break distance dist(P,Q) between genomes P and Q is the minimum number of 2-breaks required to transform P into Q. The 2-Break Distance between genomes P and Q: dist(P,Q) = blocks(P,Q) - cycles(P,Q)
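The formula dist(P,Q) = blocks(P,Q) - cycles(P,Q) can be checked on small examples with a short Python sketch. It assumes circular chromosomes only, signed block names such as "-b", and a hypothetical vertex naming ("xt"/"xh" for tail and head of block x); none of these names come from the slides.

```python
def adjacency(genome):
    """Map each vertex to its neighbor along a black (or green) edge.

    `genome` is a list of circular chromosomes, each a list of signed blocks.
    """
    adj = {}
    for chromosome in genome:
        ends = []
        for block in chromosome:
            name = block.lstrip("-")
            tail, head = name + "t", name + "h"
            ends.append((head, tail) if block.startswith("-") else (tail, head))
        for i in range(len(ends)):
            u, v = ends[i][1], ends[(i + 1) % len(ends)][0]
            adj[u], adj[v] = v, u
    return adj


def two_break_distance(P, Q):
    """blocks(P,Q) - cycles(P,Q) for genomes on the same set of blocks."""
    black, green = adjacency(P), adjacency(Q)
    seen, cycles = set(), 0
    for start in black:
        if start in seen:
            continue
        cycles += 1
        v = start
        while v not in seen:          # walk one alternating black-green cycle
            seen.add(v)
            w = black[v]
            seen.add(w)
            v = green[w]
    blocks = len(black) // 2          # two vertices per synteny block
    return blocks - cycles


P = [["a", "-b", "-c", "d"]]
print(two_break_distance(P, [["a", "-b", "-d", "c"]]))
print(two_break_distance(P, [["a", "b", "c", "d"]]))
```

The first comparison uses the pair of genomes from the 2-break slide; the second target genome, a,b,c,d, is an arbitrary choice for illustration.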
Breakpoints Are Vertices in Nontrivial Cycles. Breakpoints correspond to regions in the genome that were broken by some rearrangement(s). In the breakpoint graph, breakpoints correspond to vertices having two neighbors (shown as large blue circles). All vertices in non-trivial cycles in the breakpoint graph represent breakpoints.
Breakpoint Uses and Reuses. Each 2-break uses four vertices (the endpoints of the affected edges). A vertex (breakpoint) is reused if it is used by at least two different 2-breaks. [Figure: number of uses of each vertex along a sequence of 2-breaks]
Intra- and Inter-Reuses. For an evolutionary tree with a known rearrangement scenario, a breakpoint is intra-reused on some branch if it is used by at least two different 2-breaks along this branch. Similarly, a breakpoint is inter-reused across two branches if it is used on both these branches.
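When the scenario is known, both definitions reduce to simple counting. The sketch below assumes each 2-break is recorded as the tuple of the four vertices it uses; the branch contents and the function names are made up for illustration.

```python
from collections import Counter


def intra_reused(branch):
    """Vertices used by at least two different 2-breaks along one branch."""
    uses = Counter(v for two_break in branch for v in set(two_break))
    return {v for v, count in uses.items() if count >= 2}


def inter_reused(branch1, branch2):
    """Vertices used on both branches."""
    used1 = {v for two_break in branch1 for v in two_break}
    used2 = {v for two_break in branch2 for v in two_break}
    return used1 & used2


# Hypothetical scenarios: each tuple lists the 4 vertices used by one 2-break.
branch_1 = [("ah", "bh", "ct", "dt"), ("ct", "dt", "bt", "ch")]
branch_2 = [("bt", "ch", "dh", "at")]

print(sorted(intra_reused(branch_1)))
print(sorted(inter_reused(branch_1, branch_2)))
```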
Reconstructing Ancestral Genomes and Rearrangement Scenarios. There exists a variety of tools for reliable reconstruction of ancestral genomes: GRAPPA (Moret et al., PSB 2001); MGR (Bourque and Pevzner, Genome Res. 2002); InferCARs (Ma et al., Genome Res. 2006); EMRAE (Zhao and Bourque, Genome Res. 2007); MGRA (Alekseyev and Pevzner, Genome Res. 2009). However, the rearrangement scenarios on edges of the evolutionary tree remain ambiguous. Challenge: compute the number of intra- and inter-reuses without knowing the rearrangement scenarios.
Number of Intra-Reuses. For a rearrangement scenario between genomes P and Q: the number of 2-breaks is at least dist(P,Q); each 2-break uses 4 breakpoints; the number of breakpoints is 2 * blocks(P,Q). Hence the total number of intra-reuses is at least: 4 * dist(P,Q) - 2 * blocks(P,Q)
Number of Inter-Reuses. For two branches (P,Q) and (P',Q') in the tree: Breaks(P,Q) are the breakpoints between genomes P and Q (vertices in non-trivial cycles in G(P,Q)); Breaks(P',Q') are the breakpoints between genomes P' and Q' (vertices in non-trivial cycles in G(P',Q')). Hence, the number of inter-reuses is the size of the intersection of Breaks(P,Q) and Breaks(P',Q').
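The intersection can be computed without any rearrangement scenario. The sketch below relies on one observation: a vertex lies in a non-trivial cycle exactly when its black neighbor differs from its green neighbor. Circular chromosomes, the "xt"/"xh" vertex naming, the function names, and the example genomes are all assumptions made for illustration.

```python
def adjacency(genome):
    """Map each vertex to its neighbor; genome = list of circular chromosomes."""
    adj = {}
    for chromosome in genome:
        ends = []
        for block in chromosome:
            name = block.lstrip("-")
            tail, head = name + "t", name + "h"
            ends.append((head, tail) if block.startswith("-") else (tail, head))
        for i in range(len(ends)):
            u, v = ends[i][1], ends[(i + 1) % len(ends)][0]
            adj[u], adj[v] = v, u
    return adj


def breaks(P, Q):
    """Vertices in non-trivial cycles of G(P,Q)."""
    black, green = adjacency(P), adjacency(Q)
    return {v for v in black if black[v] != green[v]}


def inter_reuses(branch1, branch2):
    """Size of the intersection of the two breakpoint sets."""
    return len(breaks(*branch1) & breaks(*branch2))


# Two hypothetical branches leaving the same ancestor P.
P = [["a", "-b", "-c", "d"]]
Q1 = [["a", "-b", "-d", "c"]]
Q2 = [["a", "b", "-c", "d"]]

print(sorted(breaks(P, Q1)), sorted(breaks(P, Q2)))
print(inter_reuses((P, Q1), (P, Q2)))
```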
Surprising Irregularities in Breakpoint Reuse Across Various Pairs of Branches. [Table: breakpoint intra- and inter-reuses between 7 branches of the tree. Colors represent the distance between a pair of branches: red = adjacent, green = separated by one branch, yellow = separated by two branches. Tree leaves: Mouse, Rat, Dog, macaQue, Human.] What is surprising about this table?
Surprising Irregularities in Breakpoint Reuse Across Various Pairs of Branches. The red, blue, and black branches from M, R, and QH all have roughly the same length (56, 68, and 58). According to RBM and FBM, the breakpoint reuse among a pair of branches is proportional to the product of branch lengths: Reuse(M+,R+) should be roughly equal to Reuse(M+,QH+) and roughly equal to Reuse(R+,QH+). [Figure: evolutionary tree of Mouse, Rat, Dog, macaQue, Human]
Extreme bias in breakpoint reuse between the (M+,R+), (M+,QH+), and (R+,QH+) pairs of edges. The bold branches from M, R, and QH all have roughly the same length (56, 68, and 58). According to RBM and FBM, the breakpoint reuse among a pair of branches is proportional to the product of branch lengths. Yet Reuse(M+,R+)=68 is 4 times larger than Reuse(M+,QH+)=15 and Reuse(R+,QH+)=17. [Figure: evolutionary tree of Mouse, Rat, Dog, macaQue, Human]
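The size of the bias can be checked with a few lines of arithmetic. The sketch below divides each observed reuse by the product of the two branch lengths quoted above; if reuse were proportional to that product, the three rates would be about equal. The dictionary layout and names are illustrative only.

```python
# Branch lengths and observed inter-reuses, as quoted on the slide.
length = {"M+": 56, "R+": 68, "QH+": 58}
observed = {("M+", "R+"): 68, ("M+", "QH+"): 15, ("R+", "QH+"): 17}


def reuse_rate(pair):
    """Observed reuse divided by the product of the two branch lengths."""
    a, b = pair
    return observed[pair] / (length[a] * length[b])


rates = {pair: reuse_rate(pair) for pair in observed}
baseline = rates[("M+", "R+")]
for pair, rate in rates.items():
    print(pair, f"rate={rate:.5f}", f"(M+,R+) rate is {baseline / rate:.2f}x this")
```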
Evolutionarily Close Branches: High Breakpoint Reuse. Evolutionarily Distant Branches: Low Breakpoint Reuse. Breakpoint reuse on all pairs of ADJACENT branches (e.g., Mouse and Rat) is HIGH. Breakpoint reuse on all pairs of DISTANT branches (e.g., Mouse and Dog) is LOW. Breakpoint reuse on all pairs of VERY DISTANT branches (e.g., Mouse and Human) is VERY LOW. [Figure: evolutionary tree of Mouse, Rat, Dog, macaQue, Human]
Turnover Fragile Breakage Model (TFBM). The Ma et al. observation and the surprising irregularities of inter-reuses suggest that breakpoint inter-reuses mostly happen across adjacent branches of the evolutionary tree. Turnover Fragile Breakage Model (TFBM): fragile regions are subject to a birth and death process and thus have a limited lifespan.
Simplest TFBM: Fixed Turnover Rate for Fragile Regions. TFBM(m,n,x): genomes have m fragile regions; n (out of m) fragile regions are active; each 2-break is applied to active fragile regions; after each 2-break, x active fragile regions die and x new active fragile regions are born. [Figure: example with 6 born, 6 died: x=6]
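The turnover process can be sketched as a small simulation. This is a deliberately reduced abstraction, not the authors' simulator: it tracks only which fragile regions each 2-break touches (two regions per 2-break, chosen uniformly among the active ones), ignores the genome itself, and requires an integer x. The function name and the uniform sampling are assumptions.

```python
import random


def simulate_tfbm(m, n, x, steps, seed=0):
    """Simulate region usage under TFBM(m, n, x) for `steps` 2-breaks.

    Returns (history, active): the pair of regions hit by each 2-break and
    the set of active regions at the end.
    """
    rng = random.Random(seed)
    active = set(rng.sample(range(m), n))
    history = []
    for _ in range(steps):
        # Each 2-break is applied to two distinct active fragile regions.
        history.append(tuple(rng.sample(sorted(active), 2)))
        # Turnover: x active regions die, x inactive regions are born.
        inactive = [r for r in range(m) if r not in active]
        dying = rng.sample(sorted(active), x)
        born = rng.sample(inactive, x)
        active.difference_update(dying)
        active.update(born)
    return history, active


history, active = simulate_tfbm(m=20, n=5, x=2, steps=50, seed=1)
print(len(history), len(active))
```

With x = 0 the active set never changes, which corresponds to the FBM case; with x > 0 the set of regions available to later 2-breaks drifts away from the initial one.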
Distinguishing TFBM from FBM: Multispecies Breakpoint Reuse Test
Detecting the Birth and Death. Given an evolutionary tree with a known rearrangement scenario, how would one determine whether it is characterized by the birth and death of fragile regions? TFBM with x = 0 (that is, FBM/RBM) or x > 0?
Detecting the Birth and Death. Given an evolutionary tree with an unknown rearrangement scenario, how would one determine whether it is characterized by the birth and death of fragile regions? TFBM with x = 0 (that is, FBM/RBM) or x > 0?
Tracking Reuse Across the Evolutionary Tree. TFBM suggests that on average the number of breakpoint reuses br(r1,r2) for 2-breaks r1 and r2 depends on the distance (in the evolutionary tree) between them. The larger the distance, the smaller br(r1,r2). Goal: define a single measure for the whole tree that describes this trend and tests whether x > 0.
Multispecies Breakpoint Reuse. L-separated 2-breaks: a pair of 2-breaks r1 and r2 separated by distance L in the evolutionary tree. The multispecies breakpoint reuse is a function R(L) expressing the averaged breakpoint reuse between L-separated 2-breaks: R(L) = (sum of br(r1,r2)) / (number of L-separated pairs of 2-breaks), where the sum is over all L-separated 2-breaks r1 and r2.
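For a known scenario, R(L) is a plain average. The sketch below makes two simplifying assumptions that are not in the slides: the scenario is a single lineage (a path rather than a tree), so the distance between the i-th and j-th 2-breaks is taken to be j - i, and br(r1,r2) is the number of vertices the two 2-breaks share. The function name is hypothetical.

```python
from collections import defaultdict


def multispecies_reuse(scenario):
    """Average breakpoint reuse R(L) for every separation L on one lineage.

    `scenario` is a chronological list of 2-breaks, each given as the set
    of the four vertices it uses.
    """
    total = defaultdict(int)   # sum of br(r1, r2) over L-separated pairs
    pairs = defaultdict(int)   # number of L-separated pairs
    for i, r1 in enumerate(scenario):
        for j in range(i + 1, len(scenario)):
            L = j - i
            total[L] += len(r1 & scenario[j])
            pairs[L] += 1
    return {L: total[L] / pairs[L] for L in pairs}


# Hypothetical scenario with three 2-breaks.
scenario = [{1, 2, 3, 4}, {3, 4, 5, 6}, {1, 2, 7, 8}]
R = multispecies_reuse(scenario)
print(R)
```

On a real tree the separation would be measured along the tree path between the two 2-breaks, which needs the tree topology in addition to the per-branch scenarios.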
Multispecies Breakpoint Reuse Test. For RBM/FBM, R(L) is a constant. For TFBM with x > 0, R(L) is a decreasing function. MBR Test: compute R(L), and check if it is decreasing. For TFBM with parameters m, n, x, we derive an analytic formula: R(L) = 8(m-n)/(mn) * (1 - xm/(n(m-n)))^L + 8/m
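The analytic curve is easy to evaluate numerically. The sketch below reads the formula as R(L) = 8(m-n)/(mn) * (1 - xm/(n(m-n)))^L + 8/m; the minus sign and the exponent L are a reading of the slide text, chosen because it makes R(L) constant (equal to 8/n) when x = 0 and decreasing toward 8/m when x > 0, as the slide states. The function name and the parameter values are illustrative.

```python
def expected_reuse(L, m, n, x):
    """Analytic R(L) for TFBM(m, n, x), under the reading described above."""
    decay = 1 - x * m / (n * (m - n))
    return 8 * (m - n) / (m * n) * decay ** L + 8 / m


# Illustrative parameters only.
m, n = 1000, 100
for L in (0, 10, 100, 1000):
    print(L, round(expected_reuse(L, m, n, x=0), 4), round(expected_reuse(L, m, n, x=1), 4))
```

The x = 0 column stays flat while the x = 1 column falls with L, which is the contrast the MBR test looks for.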
From Simulated to Real Genomes: Complications with Computing MBR. It is easy to compute R(L) for simulated genomes, whose rearrangement history is defined by simulations. For real genomes, while we can reconstruct the ancestral genomes, the evolutionary scenarios between them are unknown.
From Simulated to Real Genomes: Complications. It is easy to compute R(L) for simulated genomes, whose rearrangement history is defined by simulations. For real genomes, while we can reliably reconstruct the ancestral genomes, the exact evolutionary scenarios between them remain ambiguous. We can sample rearrangement scenarios instead.
Multispecies Reuse between Mammalian Genomes. Best fit: m ≈ 4017, n ≈ 196, x ≈ 1.12
Implication of TFBM: Where Are the (Currently) Fragile Regions in the Human Genome?
Prediction Power of TFBM. Can we determine the currently active fragile regions in the human genome H from comparison with other mammalian genomes? RBM provides no clue. FBM suggests considering the breakpoints between H and any other genome. TFBM suggests considering the closest genome, such as the macaQue-human ancestor QH. Breakpoints in G(QH,H) are likely to be reused in the future rearrangements of H.
Predicting Fragile Regions Using Reverse Time Machine. Prediction of fragile regions on the branch from QH (macaque-human ancestor) to H (human).
Predicting Fragile Regions Using Reverse Time Machine. Breaks on the macaque branch are 3 times more reliable predictors of breaks on the human branch than breaks on other branches.