
Algorithm Comparison: Gotoh vs. Hirschberg
Explore the differences between Gotoh's algorithm and Hirschberg's algorithm in sequence alignment, focusing on space complexity, similarity scoring, and conversion cost. Discover their implementations and see examples of their application in bioinformatics.
Download Presentation

Please find below an Image/Link to download the presentation.
The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author. If you encounter any issues during the download, it is possible that the publisher has removed the file from their server.
You are allowed to download the files provided on this website for personal or commercial use, subject to the condition that they are used lawfully. All files are the property of their respective owners.
The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author.
E N D
Presentation Transcript
Outline Introduction Gotoh's algorithm O(N) space Gotoh's algorithm Main algorithm Implementation Conclusion
Introduction Space, not time Hirschberg s Algorithm Maximizing the similarity score of an alignment Gotoh s Algorithm Minimizing the difference score of a conversion Linear space version for affine gap penalties. For a megabyte of memory. W.Myers and Miller : sequences of length 62500 Altschul and Erickson : sequences length < 1070
Transformation (1/2) Gotoh sAlgorithm Hirschberg s Algorithm Aligned Pair ? ?,? ? ?,? = ???? ? ?,? ???(?) = ? + ? Affine Gap Penalties ? = ? ???(?) = ? + ?? 1 2???? ? =
Transformation (2/2) Match = 8, Mismatch = -5, Gap Symbol = -3, Gap-open = -4 < ( 6 < 5) (Hirschberg) 2? < ? ?,? => 2? < ???? ? ?,? => ???? 2? > ? ?,? 2 = ???? 2? 1 2???? ? =
Example(1/2) Hirschberg s Algorithm Gotoh s Algorithm Match 8 0 ? ?,? = ???? ? ?,? Mismatch -5 13 Gap-open -4 4 ? = ? 1 2???? ? Gap Symbol -3 7 =
Example(2/2) Gotoh sAlgorithm Hirschberg s Algorithm C Cost (minimum) 1 A : ACGGTTCAAG B : ACGGTTCAAG Hirschberg : 10x8 = 80 C = 0 (conversion cost) 1 210 + 10 8 0 = 80 (Alignment score) Hirschberg : 9x8 + (-5) = 67 C = 13 1 2 A : ACGGTTCAAG B : ACGGATCAAG 210 + 10 8 13 = 67 Hirschberg : 9x8 + (-4-3) = 65 C = 11 (7+4) 1 3 A : ACGGCTCAAG B : ACGGCTC AG 210 + 9 8 11 = 65
Some notations : the i-symbol prefix of A : the j-symbol prefix of B C(i, j):minimum cost of a conversion of to a b a b 1 i a j b i A j B 1 ........ 2 ........ 2 j B i A
Simple gap(1/4) gap(k)= h*k + ( , 1 j ) ( , b ) C i j w a i = + ) 1 , ( j i C ) min , ( i C ) 1 j ( , ) C i w j a + ( , 1 ( , ) w b i j
Simple gap(2/4) A A G 0.0 2.5 3.0 3.5 A G T A C 2.5 0.0 2.5 3.0 Space= O(n^2) 3.0 2.5 1.0 2.5 3.5 3.0 3.5 2.0 4.0 3.5 3.0 4.5 4.5 4.0 4.5 4.0
Simple gap(3/4) m/2
Simple gap(4/4) Forward score and backward score Space: O(m+n)
Affine gap(1/8) A gap of length k : cost = g + k*h A - - - T A A C T C G A A T C - - T
Affine gap(2/8) C(i, j):minimum cost of a conversion of to D(i, j):minimum cost of a conversion of to that deletes I(i, j):minimum cost of a conversion of to that inserts j B i A i a j B i A j b i A j B
Affine gap(3/8) = , ( j i C ) + if i > 0 and j> 0 if i = 0 and j> 0 if i > 0 and j= 0 if i = 0 and j= 0 min{ gap , ( i ), , ( i I ), ( , 1 ) 1 ( , )} D j j C i j w a b i j ( ) j ( ) gap i 0
Affine gap(4/8) = , ( j i D ) g + + if i > 0 and j> 0 if i = 0 and j> 0 min{ ( , 1 ), ( , 1 ) } D i j C i j g h + , 0 ( ) C j
Affine gap(5/8) = , ( j i I ) + + if i > 0 and j> 0 if i > 0 and j= 0 min{ , ( 1 ), , ( ) 1 } I i j C i j g h + ) 0 , ( C i g
D A A G Affine gap(7/8) A A G * 4.5 5.0 5.5 A G T A C * 5.0 5.5 6.0 C * 2.5 5.0 5.5 0.0 2.5 3.0 3.5 * 3.0 3.5 5.0 * 3.5 4.0 4.5 A G T A C 2.5 0.0 2.5 3.0 * 4.0 4.5 5.0 3.0 2.5 1.0 2.5 I 3.5 3.0 3.5 2.0 A A G 4.0 3.5 3.0 4.5 * * * * A G T A C 4.5 4.0 4.5 4.0 4.5 5.0 2.5 3.0 5.0 5.5 5.0 3.5 5.5 6.0 5.5 6.0 6.0 6.5 6.0 5.5 6.5 7.0 6.5 7.0
D A A G Affine gap(8/8) A A G C * 4.5 5.0 5.5 A G T A C * 5.0 5.5 6.0 * 2.5 5.0 5.5 0.0 2.5 3.0 3.5 * 3.0 3.5 5.0 * 3.5 4.0 4.5 A G T A C 2.5 0.0 2.5 3.0 * 4.0 4.5 5.0 3.0 2.5 1.0 2.5 I 3.5 3.0 3.5 2.0 A A G 4.0 3.5 3.0 4.5 * * * * A G T A C 4.5 4.0 4.5 4.0 4.5 5.0 2.5 3.0 5.0 5.5 5.0 3.5 5.5 6.0 5.5 6.0 6.0 6.5 6.0 5.5 6.5 7.0 6.5 7.0
Observation i-th row of C and D depends only on row i and i-1. i-th row of I depends only on row i. D I C
Linear Space Use two one-dimension arrays (CC and DD) and three variables.
Linear Space CC k = C i,k ,if k < j C i 1,k ,if k j DD k = D i,k ,if k < j D i 1,k ,if k j e = I(i , j-1) c = C(i , j-1) s = C(i-1 , j-1)
D A A G g = 2.0 h = 0.5 * 4.5 5.0 5.5 A G T A C DD * 5.0 5.5 6.0 * 2.5 5.0 5.5 t = 2.0 A A G C * 3.0 3.5 5.0 0.0 2.5 3.0 3.5 CC * 3.5 4.0 4.5 A G T A C 2.5 0.0 2.5 3.0 * 4.0 4.5 5.0 3.0 2.5 1.0 2.5 I A A G 3.5 3.0 3.5 2.0 * * * * A G T A C 4.0 3.5 3.0 4.5 4.5 5.0 2.5 3.0 4.5 4.0 4.5 4.0 5.0 5.5 5.0 3.5 5.5 6.0 5.5 6.0 6.0 6.5 6.0 5.5 6.5 7.0 6.5 7.0
D A A G g = 2.0 h = 0.5 * 4.5 5.0 5.5 A G T A C DD * 5.0 5.5 6.0 * 2.5 5.0 5.5 t = 2.0 A A G C * 3.0 3.5 5.0 0.0 2.5 3.0 3.5 CC * 3.5 4.0 4.5 A G T A C 2.5 0.0 2.5 3.0 * 4.0 4.5 5.0 3.0 2.5 1.0 2.5 I A A G 3.5 3.0 3.5 2.0 * * * * A G T A C 4.0 3.5 3.0 4.5 4.5 5.0 2.5 3.0 4.5 4.0 4.5 4.0 5.0 5.5 5.0 3.5 5.5 6.0 5.5 6.0 6.0 6.5 6.0 5.5 6.5 7.0 6.5 7.0
D A A G g = 2.0 h = 0.5 * 4.5 5.0 5.5 A G T A C * 5.0 5.5 6.0 * 2.5 5.0 DD 5.5 A A G C * 3.0 3.5 5.0 0.0 2.5 3.0 3.5 * 3.5 4.0 4.5 A G T A C t = 4.5 2.5 0.0 2.5 3.0 * 4.0 4.5 5.0 i = 5 3.0 2.5 1.0 2.5 s I A A G CC 3.5 3.0 3.5 2.0 * * * * A G T A C e 4.0 3.5 3.0 4.5 4.5 5.0 2.5 3.0 4.5 4.0 4.5 4.0 c 5.0 5.5 5.0 3.5 5.5 6.0 5.5 6.0 6.0 6.5 6.0 5.5 6.5 7.0 6.5 7.0
D A A G g = 2.0 h = 0.5 * 4.5 5.0 5.5 A G T A C * 5.0 5.5 6.0 * 2.5 5.0 DD 5.5 A A G C * 3.0 3.5 5.0 0.0 2.5 3.0 3.5 * 3.5 4.0 4.5 A G T A C t = 4.5 2.5 0.0 2.5 3.0 * 4.0 4.5 5.0 i = 5 3.0 2.5 1.0 2.5 s I A A G CC 3.5 3.0 3.5 2.0 j = 1 * * * * A G T A C e 4.0 3.5 3.0 4.5 4.5 5.0 2.5 3.0 4.5 4.0 4.5 4.0 c 5.0 5.5 5.0 3.5 5.5 6.0 5.5 6.0 6.0 6.5 6.0 5.5 6.5 7.0 6.5 7.0
D A A G g = 2.0 h = 0.5 * 4.5 5.0 5.5 A G T A C * 5.0 5.5 6.0 * 2.5 5.0 DD 5.5 A A G C * 3.0 3.5 5.0 0.0 2.5 3.0 3.5 * 3.5 4.0 4.5 A G T A C t = 4.5 2.5 0.0 2.5 3.0 * 4.0 4.5 5.0 i = 5 3.0 2.5 1.0 2.5 s I A A G CC 3.5 3.0 3.5 2.0 j = 1 * * * * A G T A C 4.0 3.5 3.0 4.5 4.5 5.0 2.5 3.0 4.5 4.0 4.5 4.0 c 5.0 5.5 5.0 3.5 5.5 6.0 5.5 6.0 6.0 6.5 6.0 5.5 6.5 7.0 6.5 7.0 e
D A A G g = 2.0 h = 0.5 * 4.5 5.0 5.5 A G T A C * 5.0 5.5 6.0 * 2.5 5.0 DD 5.5 A A G C * 3.0 3.5 5.0 0.0 2.5 3.0 3.5 * 3.5 4.0 4.5 A G T A C t = 4.5 2.5 0.0 2.5 3.0 * 4.0 4.5 5.0 i = 5 3.0 2.5 1.0 2.5 s I A A G CC 3.5 3.0 3.5 2.0 j = 1 * * * * A G T A C 4.0 3.5 3.0 4.5 4.5 5.0 2.5 3.0 4.5 4.0 4.5 4.0 c 5.0 5.5 5.0 3.5 5.5 6.0 5.5 6.0 6.0 6.5 6.0 5.5 6.5 7.0 6.5 7.0 e
D A A G * 4.5 5.0 5.5 A G T A C * 5.0 5.5 6.0 * 2.5 5.0 5.5 A A G C * 3.0 3.5 DD 5.0 0.0 2.5 3.0 3.5 * 3.5 4.0 4.5 A G T A C 2.5 0.0 2.5 3.0 * 4.0 4.5 5.0 3.0 2.5 1.0 2.5 I A A G 3.5 3.0 3.5 2.0 CC * * * * A G T A C 4.0 3.5 3.0 4.5 4.5 5.0 2.5 3.0 4.5 4.0 4.5 4.0 5.0 5.5 5.0 3.5 5.5 6.0 5.5 6.0 6.0 6.5 6.0 5.5 Optimal conversion cost. 6.5 7.0 6.5 7.0
What is the conversion of AGTAC and AAG ?
Midpoint Hirschberg (1975): recursive divide-and-conquer Forward Computing Backward Computing
Gap Penalty i-1, j-1 i, j-1 i-1, j i, j
Gap Penalty CC( j) = minimum cost of a conversion of Ai* to Bj DD( j) = minimum cost of a conversion of Ai* to Bj that ends with a delete
Gap Penalty RR(N - j) = minimum cost of a conversion of Ai*Tto BjT SS(N - j) = minimum cost of a conversion of Ai*Tto BjT that begins with a delete
Find Midpoint with Gap Penalty Forward Computing How to compute the midpoint? Backward Computing
Midpoint The problem of calculating the midpoint is that when we concatenate two substrings into one, we may coalesce two gaps into one Which means that we may consider min { CC + RR, DD + SS - g, II + JJ - g}
Midpoint Recall the above algorithm, we do save the space of II and JJ. We can reduce it into min {CC + RR, DD + SS - g}
Midpoint Remember that we should find minj [0, N]{min { CC + RR, DD + SS - g, II + JJ - g}} i* j j+1
Midpoint Type 1 recurrence Type 2 recurrence i* i* j* j*
Example A = agtac , B = aag, i* = 2 agtac a__ag Recurrsive call on (a, a) and (ac, ag)
Implementation Storage Requirement Memory v.s. Sequence length Compared with classic dynamic programming algorithm
Storage Requirement(1/4) Vectors : CC,DD,RR, and SS Space: 4N words M + N words for an optimal conversion 40 M = N = 38