Algorithm Comparison: Gotoh vs. Hirschberg

1 / 60

Embed Share

Explore the differences between Gotoh's algorithm and Hirschberg's algorithm in sequence alignment, focusing on space complexity, similarity scoring, and conversion cost. Discover their implementations and see examples of their application in bioinformatics.

kay_z Follow

Uploaded on Apr 12, 2025 | 2 Views

Download Presentation

Please find below an Image/Link to download the presentation.

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author. If you encounter any issues during the download, it is possible that the publisher has removed the file from their server.

You are allowed to download the files provided on this website for personal or commercial use, subject to the condition that they are used lawfully. All files are the property of their respective owners.

Download Presentation

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author.

E N D

Presentation Transcript

Eugene W.Myers and Webb Miller

Outline Introduction Gotoh's algorithm O(N) space Gotoh's algorithm Main algorithm Implementation Conclusion

Introduction Space, not time Hirschberg s Algorithm Maximizing the similarity score of an alignment Gotoh s Algorithm Minimizing the difference score of a conversion Linear space version for affine gap penalties. For a megabyte of memory. W.Myers and Miller : sequences of length 62500 Altschul and Erickson : sequences length < 1070

Transformation (1/2) Gotoh sAlgorithm Hirschberg s Algorithm Aligned Pair ? ?,? ? ?,? = ???? ? ?,? ???(?) = ? + ? Affine Gap Penalties ? = ? ???(?) = ? + ?? 1 2???? ? =

Transformation (2/2) Match = 8, Mismatch = -5, Gap Symbol = -3, Gap-open = -4 < ( 6 < 5) (Hirschberg) 2? < ? ?,? => 2? < ???? ? ?,? => ???? 2? > ? ?,? 2 = ???? 2? 1 2???? ? =

Example(1/2) Hirschberg s Algorithm Gotoh s Algorithm Match 8 0 ? ?,? = ???? ? ?,? Mismatch -5 13 Gap-open -4 4 ? = ? 1 2???? ? Gap Symbol -3 7 =

Example(2/2) Gotoh sAlgorithm Hirschberg s Algorithm C Cost (minimum) 1 A : ACGGTTCAAG B : ACGGTTCAAG Hirschberg : 10x8 = 80 C = 0 (conversion cost) 1 210 + 10 8 0 = 80 (Alignment score) Hirschberg : 9x8 + (-5) = 67 C = 13 1 2 A : ACGGTTCAAG B : ACGGATCAAG 210 + 10 8 13 = 67 Hirschberg : 9x8 + (-4-3) = 65 C = 11 (7+4) 1 3 A : ACGGCTCAAG B : ACGGCTC AG 210 + 9 8 11 = 65

R99922005

Some notations : the i-symbol prefix of A : the j-symbol prefix of B C(i, j):minimum cost of a conversion of to a b a b 1 i a j b i A j B 1 ........ 2 ........ 2 j B i A

Simple gap(1/4) gap(k)= h*k + ( , 1 j ) ( , b ) C i j w a i = + ) 1 , ( j i C ) min , ( i C ) 1 j ( , ) C i w j a + ( , 1 ( , ) w b i j

Simple gap(2/4) A A G 0.0 2.5 3.0 3.5 A G T A C 2.5 0.0 2.5 3.0 Space= O(n^2) 3.0 2.5 1.0 2.5 3.5 3.0 3.5 2.0 4.0 3.5 3.0 4.5 4.5 4.0 4.5 4.0

Simple gap(3/4) m/2

Simple gap(4/4) Forward score and backward score Space: O(m+n)

Affine gap(1/8) A gap of length k : cost = g + k*h A - - - T A A C T C G A A T C - - T

Affine gap(2/8) C(i, j):minimum cost of a conversion of to D(i, j):minimum cost of a conversion of to that deletes I(i, j):minimum cost of a conversion of to that inserts j B i A i a j B i A j b i A j B

Affine gap(3/8) = , ( j i C ) + if i > 0 and j> 0 if i = 0 and j> 0 if i > 0 and j= 0 if i = 0 and j= 0 min{ gap , ( i ), , ( i I ), ( , 1 ) 1 ( , )} D j j C i j w a b i j ( ) j ( ) gap i 0

Affine gap(4/8) = , ( j i D ) g + + if i > 0 and j> 0 if i = 0 and j> 0 min{ ( , 1 ), ( , 1 ) } D i j C i j g h + , 0 ( ) C j

Affine gap(5/8) = , ( j i I ) + + if i > 0 and j> 0 if i > 0 and j= 0 min{ , ( 1 ), , ( ) 1 } I i j C i j g h + ) 0 , ( C i g

Affine gap(6/8)

D A A G Affine gap(7/8) A A G * 4.5 5.0 5.5 A G T A C * 5.0 5.5 6.0 C * 2.5 5.0 5.5 0.0 2.5 3.0 3.5 * 3.0 3.5 5.0 * 3.5 4.0 4.5 A G T A C 2.5 0.0 2.5 3.0 * 4.0 4.5 5.0 3.0 2.5 1.0 2.5 I 3.5 3.0 3.5 2.0 A A G 4.0 3.5 3.0 4.5 * * * * A G T A C 4.5 4.0 4.5 4.0 4.5 5.0 2.5 3.0 5.0 5.5 5.0 3.5 5.5 6.0 5.5 6.0 6.0 6.5 6.0 5.5 6.5 7.0 6.5 7.0

D A A G Affine gap(8/8) A A G C * 4.5 5.0 5.5 A G T A C * 5.0 5.5 6.0 * 2.5 5.0 5.5 0.0 2.5 3.0 3.5 * 3.0 3.5 5.0 * 3.5 4.0 4.5 A G T A C 2.5 0.0 2.5 3.0 * 4.0 4.5 5.0 3.0 2.5 1.0 2.5 I 3.5 3.0 3.5 2.0 A A G 4.0 3.5 3.0 4.5 * * * * A G T A C 4.5 4.0 4.5 4.0 4.5 5.0 2.5 3.0 5.0 5.5 5.0 3.5 5.5 6.0 5.5 6.0 6.0 6.5 6.0 5.5 6.5 7.0 6.5 7.0

R99922041

Observation i-th row of C and D depends only on row i and i-1. i-th row of I depends only on row i. D I C

Linear Space Use two one-dimension arrays (CC and DD) and three variables.

Linear Space CC k = C i,k ,if k < j C i 1,k ,if k j DD k = D i,k ,if k < j D i 1,k ,if k j e = I(i , j-1) c = C(i , j-1) s = C(i-1 , j-1)

Algorithm

D A A G g = 2.0 h = 0.5 * 4.5 5.0 5.5 A G T A C DD * 5.0 5.5 6.0 * 2.5 5.0 5.5 t = 2.0 A A G C * 3.0 3.5 5.0 0.0 2.5 3.0 3.5 CC * 3.5 4.0 4.5 A G T A C 2.5 0.0 2.5 3.0 * 4.0 4.5 5.0 3.0 2.5 1.0 2.5 I A A G 3.5 3.0 3.5 2.0 * * * * A G T A C 4.0 3.5 3.0 4.5 4.5 5.0 2.5 3.0 4.5 4.0 4.5 4.0 5.0 5.5 5.0 3.5 5.5 6.0 5.5 6.0 6.0 6.5 6.0 5.5 6.5 7.0 6.5 7.0

D A A G g = 2.0 h = 0.5 * 4.5 5.0 5.5 A G T A C DD * 5.0 5.5 6.0 * 2.5 5.0 5.5 t = 2.0 A A G C * 3.0 3.5 5.0 0.0 2.5 3.0 3.5 CC * 3.5 4.0 4.5 A G T A C 2.5 0.0 2.5 3.0 * 4.0 4.5 5.0 3.0 2.5 1.0 2.5 I A A G 3.5 3.0 3.5 2.0 * * * * A G T A C 4.0 3.5 3.0 4.5 4.5 5.0 2.5 3.0 4.5 4.0 4.5 4.0 5.0 5.5 5.0 3.5 5.5 6.0 5.5 6.0 6.0 6.5 6.0 5.5 6.5 7.0 6.5 7.0

D A A G g = 2.0 h = 0.5 * 4.5 5.0 5.5 A G T A C * 5.0 5.5 6.0 * 2.5 5.0 DD 5.5 A A G C * 3.0 3.5 5.0 0.0 2.5 3.0 3.5 * 3.5 4.0 4.5 A G T A C t = 4.5 2.5 0.0 2.5 3.0 * 4.0 4.5 5.0 i = 5 3.0 2.5 1.0 2.5 s I A A G CC 3.5 3.0 3.5 2.0 * * * * A G T A C e 4.0 3.5 3.0 4.5 4.5 5.0 2.5 3.0 4.5 4.0 4.5 4.0 c 5.0 5.5 5.0 3.5 5.5 6.0 5.5 6.0 6.0 6.5 6.0 5.5 6.5 7.0 6.5 7.0

D A A G g = 2.0 h = 0.5 * 4.5 5.0 5.5 A G T A C * 5.0 5.5 6.0 * 2.5 5.0 DD 5.5 A A G C * 3.0 3.5 5.0 0.0 2.5 3.0 3.5 * 3.5 4.0 4.5 A G T A C t = 4.5 2.5 0.0 2.5 3.0 * 4.0 4.5 5.0 i = 5 3.0 2.5 1.0 2.5 s I A A G CC 3.5 3.0 3.5 2.0 j = 1 * * * * A G T A C e 4.0 3.5 3.0 4.5 4.5 5.0 2.5 3.0 4.5 4.0 4.5 4.0 c 5.0 5.5 5.0 3.5 5.5 6.0 5.5 6.0 6.0 6.5 6.0 5.5 6.5 7.0 6.5 7.0

D A A G g = 2.0 h = 0.5 * 4.5 5.0 5.5 A G T A C * 5.0 5.5 6.0 * 2.5 5.0 DD 5.5 A A G C * 3.0 3.5 5.0 0.0 2.5 3.0 3.5 * 3.5 4.0 4.5 A G T A C t = 4.5 2.5 0.0 2.5 3.0 * 4.0 4.5 5.0 i = 5 3.0 2.5 1.0 2.5 s I A A G CC 3.5 3.0 3.5 2.0 j = 1 * * * * A G T A C 4.0 3.5 3.0 4.5 4.5 5.0 2.5 3.0 4.5 4.0 4.5 4.0 c 5.0 5.5 5.0 3.5 5.5 6.0 5.5 6.0 6.0 6.5 6.0 5.5 6.5 7.0 6.5 7.0 e

D A A G g = 2.0 h = 0.5 * 4.5 5.0 5.5 A G T A C * 5.0 5.5 6.0 * 2.5 5.0 DD 5.5 A A G C * 3.0 3.5 5.0 0.0 2.5 3.0 3.5 * 3.5 4.0 4.5 A G T A C t = 4.5 2.5 0.0 2.5 3.0 * 4.0 4.5 5.0 i = 5 3.0 2.5 1.0 2.5 s I A A G CC 3.5 3.0 3.5 2.0 j = 1 * * * * A G T A C 4.0 3.5 3.0 4.5 4.5 5.0 2.5 3.0 4.5 4.0 4.5 4.0 c 5.0 5.5 5.0 3.5 5.5 6.0 5.5 6.0 6.0 6.5 6.0 5.5 6.5 7.0 6.5 7.0 e

D A A G * 4.5 5.0 5.5 A G T A C * 5.0 5.5 6.0 * 2.5 5.0 5.5 A A G C * 3.0 3.5 DD 5.0 0.0 2.5 3.0 3.5 * 3.5 4.0 4.5 A G T A C 2.5 0.0 2.5 3.0 * 4.0 4.5 5.0 3.0 2.5 1.0 2.5 I A A G 3.5 3.0 3.5 2.0 CC * * * * A G T A C 4.0 3.5 3.0 4.5 4.5 5.0 2.5 3.0 4.5 4.0 4.5 4.0 5.0 5.5 5.0 3.5 5.5 6.0 5.5 6.0 6.0 6.5 6.0 5.5 Optimal conversion cost. 6.5 7.0 6.5 7.0

What is the conversion of AGTAC and AAG ?

B95902077

Midpoint Hirschberg (1975): recursive divide-and-conquer Forward Computing Backward Computing

Gap Penalty i-1, j-1 i, j-1 i-1, j i, j

Gap Penalty CC( j) = minimum cost of a conversion of Ai* to Bj DD( j) = minimum cost of a conversion of Ai* to Bj that ends with a delete

Gap Penalty RR(N - j) = minimum cost of a conversion of Ai*Tto BjT SS(N - j) = minimum cost of a conversion of Ai*Tto BjT that begins with a delete

Find Midpoint with Gap Penalty Forward Computing How to compute the midpoint? Backward Computing

R99922035

Midpoint The problem of calculating the midpoint is that when we concatenate two substrings into one, we may coalesce two gaps into one Which means that we may consider min { CC + RR, DD + SS - g, II + JJ - g}