Algorithm Comparison: Gotoh vs. Hirschberg

eugene w myers and webb miller n.w
1 / 60
Embed
Share

Explore the differences between Gotoh's algorithm and Hirschberg's algorithm in sequence alignment, focusing on space complexity, similarity scoring, and conversion cost. Discover their implementations and see examples of their application in bioinformatics.

  • Algorithm Comparison
  • Sequence Alignment
  • Gotoh vs Hirschberg
  • Bioinformatics

Uploaded on | 0 Views


Download Presentation

Please find below an Image/Link to download the presentation.

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author. If you encounter any issues during the download, it is possible that the publisher has removed the file from their server.

You are allowed to download the files provided on this website for personal or commercial use, subject to the condition that they are used lawfully. All files are the property of their respective owners.

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author.

E N D

Presentation Transcript


  1. Eugene W.Myers and Webb Miller

  2. Outline Introduction Gotoh's algorithm O(N) space Gotoh's algorithm Main algorithm Implementation Conclusion

  3. Introduction Space, not time Hirschberg s Algorithm Maximizing the similarity score of an alignment Gotoh s Algorithm Minimizing the difference score of a conversion Linear space version for affine gap penalties. For a megabyte of memory. W.Myers and Miller : sequences of length 62500 Altschul and Erickson : sequences length < 1070

  4. Transformation (1/2) Gotoh sAlgorithm Hirschberg s Algorithm Aligned Pair ? ?,? ? ?,? = ???? ? ?,? ???(?) = ? + ? Affine Gap Penalties ? = ? ???(?) = ? + ?? 1 2???? ? =

  5. Transformation (2/2) Match = 8, Mismatch = -5, Gap Symbol = -3, Gap-open = -4 < ( 6 < 5) (Hirschberg) 2? < ? ?,? => 2? < ???? ? ?,? => ???? 2? > ? ?,? 2 = ???? 2? 1 2???? ? =

  6. Example(1/2) Hirschberg s Algorithm Gotoh s Algorithm Match 8 0 ? ?,? = ???? ? ?,? Mismatch -5 13 Gap-open -4 4 ? = ? 1 2???? ? Gap Symbol -3 7 =

  7. Example(2/2) Gotoh sAlgorithm Hirschberg s Algorithm C Cost (minimum) 1 A : ACGGTTCAAG B : ACGGTTCAAG Hirschberg : 10x8 = 80 C = 0 (conversion cost) 1 210 + 10 8 0 = 80 (Alignment score) Hirschberg : 9x8 + (-5) = 67 C = 13 1 2 A : ACGGTTCAAG B : ACGGATCAAG 210 + 10 8 13 = 67 Hirschberg : 9x8 + (-4-3) = 65 C = 11 (7+4) 1 3 A : ACGGCTCAAG B : ACGGCTC AG 210 + 9 8 11 = 65

  8. R99922005

  9. Some notations : the i-symbol prefix of A : the j-symbol prefix of B C(i, j):minimum cost of a conversion of to a b a b 1 i a j b i A j B 1 ........ 2 ........ 2 j B i A

  10. Simple gap(1/4) gap(k)= h*k + ( , 1 j ) ( , b ) C i j w a i = + ) 1 , ( j i C ) min , ( i C ) 1 j ( , ) C i w j a + ( , 1 ( , ) w b i j

  11. Simple gap(2/4) A A G 0.0 2.5 3.0 3.5 A G T A C 2.5 0.0 2.5 3.0 Space= O(n^2) 3.0 2.5 1.0 2.5 3.5 3.0 3.5 2.0 4.0 3.5 3.0 4.5 4.5 4.0 4.5 4.0

  12. Simple gap(3/4) m/2

  13. Simple gap(4/4) Forward score and backward score Space: O(m+n)

  14. Affine gap(1/8) A gap of length k : cost = g + k*h A - - - T A A C T C G A A T C - - T

  15. Affine gap(2/8) C(i, j):minimum cost of a conversion of to D(i, j):minimum cost of a conversion of to that deletes I(i, j):minimum cost of a conversion of to that inserts j B i A i a j B i A j b i A j B

  16. Affine gap(3/8) = , ( j i C ) + if i > 0 and j> 0 if i = 0 and j> 0 if i > 0 and j= 0 if i = 0 and j= 0 min{ gap , ( i ), , ( i I ), ( , 1 ) 1 ( , )} D j j C i j w a b i j ( ) j ( ) gap i 0

  17. Affine gap(4/8) = , ( j i D ) g + + if i > 0 and j> 0 if i = 0 and j> 0 min{ ( , 1 ), ( , 1 ) } D i j C i j g h + , 0 ( ) C j

  18. Affine gap(5/8) = , ( j i I ) + + if i > 0 and j> 0 if i > 0 and j= 0 min{ , ( 1 ), , ( ) 1 } I i j C i j g h + ) 0 , ( C i g

  19. Affine gap(6/8)

  20. D A A G Affine gap(7/8) A A G * 4.5 5.0 5.5 A G T A C * 5.0 5.5 6.0 C * 2.5 5.0 5.5 0.0 2.5 3.0 3.5 * 3.0 3.5 5.0 * 3.5 4.0 4.5 A G T A C 2.5 0.0 2.5 3.0 * 4.0 4.5 5.0 3.0 2.5 1.0 2.5 I 3.5 3.0 3.5 2.0 A A G 4.0 3.5 3.0 4.5 * * * * A G T A C 4.5 4.0 4.5 4.0 4.5 5.0 2.5 3.0 5.0 5.5 5.0 3.5 5.5 6.0 5.5 6.0 6.0 6.5 6.0 5.5 6.5 7.0 6.5 7.0

  21. D A A G Affine gap(8/8) A A G C * 4.5 5.0 5.5 A G T A C * 5.0 5.5 6.0 * 2.5 5.0 5.5 0.0 2.5 3.0 3.5 * 3.0 3.5 5.0 * 3.5 4.0 4.5 A G T A C 2.5 0.0 2.5 3.0 * 4.0 4.5 5.0 3.0 2.5 1.0 2.5 I 3.5 3.0 3.5 2.0 A A G 4.0 3.5 3.0 4.5 * * * * A G T A C 4.5 4.0 4.5 4.0 4.5 5.0 2.5 3.0 5.0 5.5 5.0 3.5 5.5 6.0 5.5 6.0 6.0 6.5 6.0 5.5 6.5 7.0 6.5 7.0

  22. R99922041

  23. Observation i-th row of C and D depends only on row i and i-1. i-th row of I depends only on row i. D I C

  24. Linear Space Use two one-dimension arrays (CC and DD) and three variables.

  25. Linear Space CC k = C i,k ,if k < j C i 1,k ,if k j DD k = D i,k ,if k < j D i 1,k ,if k j e = I(i , j-1) c = C(i , j-1) s = C(i-1 , j-1)

  26. Algorithm

  27. D A A G g = 2.0 h = 0.5 * 4.5 5.0 5.5 A G T A C DD * 5.0 5.5 6.0 * 2.5 5.0 5.5 t = 2.0 A A G C * 3.0 3.5 5.0 0.0 2.5 3.0 3.5 CC * 3.5 4.0 4.5 A G T A C 2.5 0.0 2.5 3.0 * 4.0 4.5 5.0 3.0 2.5 1.0 2.5 I A A G 3.5 3.0 3.5 2.0 * * * * A G T A C 4.0 3.5 3.0 4.5 4.5 5.0 2.5 3.0 4.5 4.0 4.5 4.0 5.0 5.5 5.0 3.5 5.5 6.0 5.5 6.0 6.0 6.5 6.0 5.5 6.5 7.0 6.5 7.0

  28. D A A G g = 2.0 h = 0.5 * 4.5 5.0 5.5 A G T A C DD * 5.0 5.5 6.0 * 2.5 5.0 5.5 t = 2.0 A A G C * 3.0 3.5 5.0 0.0 2.5 3.0 3.5 CC * 3.5 4.0 4.5 A G T A C 2.5 0.0 2.5 3.0 * 4.0 4.5 5.0 3.0 2.5 1.0 2.5 I A A G 3.5 3.0 3.5 2.0 * * * * A G T A C 4.0 3.5 3.0 4.5 4.5 5.0 2.5 3.0 4.5 4.0 4.5 4.0 5.0 5.5 5.0 3.5 5.5 6.0 5.5 6.0 6.0 6.5 6.0 5.5 6.5 7.0 6.5 7.0

  29. D A A G g = 2.0 h = 0.5 * 4.5 5.0 5.5 A G T A C * 5.0 5.5 6.0 * 2.5 5.0 DD 5.5 A A G C * 3.0 3.5 5.0 0.0 2.5 3.0 3.5 * 3.5 4.0 4.5 A G T A C t = 4.5 2.5 0.0 2.5 3.0 * 4.0 4.5 5.0 i = 5 3.0 2.5 1.0 2.5 s I A A G CC 3.5 3.0 3.5 2.0 * * * * A G T A C e 4.0 3.5 3.0 4.5 4.5 5.0 2.5 3.0 4.5 4.0 4.5 4.0 c 5.0 5.5 5.0 3.5 5.5 6.0 5.5 6.0 6.0 6.5 6.0 5.5 6.5 7.0 6.5 7.0

  30. D A A G g = 2.0 h = 0.5 * 4.5 5.0 5.5 A G T A C * 5.0 5.5 6.0 * 2.5 5.0 DD 5.5 A A G C * 3.0 3.5 5.0 0.0 2.5 3.0 3.5 * 3.5 4.0 4.5 A G T A C t = 4.5 2.5 0.0 2.5 3.0 * 4.0 4.5 5.0 i = 5 3.0 2.5 1.0 2.5 s I A A G CC 3.5 3.0 3.5 2.0 j = 1 * * * * A G T A C e 4.0 3.5 3.0 4.5 4.5 5.0 2.5 3.0 4.5 4.0 4.5 4.0 c 5.0 5.5 5.0 3.5 5.5 6.0 5.5 6.0 6.0 6.5 6.0 5.5 6.5 7.0 6.5 7.0

  31. D A A G g = 2.0 h = 0.5 * 4.5 5.0 5.5 A G T A C * 5.0 5.5 6.0 * 2.5 5.0 DD 5.5 A A G C * 3.0 3.5 5.0 0.0 2.5 3.0 3.5 * 3.5 4.0 4.5 A G T A C t = 4.5 2.5 0.0 2.5 3.0 * 4.0 4.5 5.0 i = 5 3.0 2.5 1.0 2.5 s I A A G CC 3.5 3.0 3.5 2.0 j = 1 * * * * A G T A C 4.0 3.5 3.0 4.5 4.5 5.0 2.5 3.0 4.5 4.0 4.5 4.0 c 5.0 5.5 5.0 3.5 5.5 6.0 5.5 6.0 6.0 6.5 6.0 5.5 6.5 7.0 6.5 7.0 e

  32. D A A G g = 2.0 h = 0.5 * 4.5 5.0 5.5 A G T A C * 5.0 5.5 6.0 * 2.5 5.0 DD 5.5 A A G C * 3.0 3.5 5.0 0.0 2.5 3.0 3.5 * 3.5 4.0 4.5 A G T A C t = 4.5 2.5 0.0 2.5 3.0 * 4.0 4.5 5.0 i = 5 3.0 2.5 1.0 2.5 s I A A G CC 3.5 3.0 3.5 2.0 j = 1 * * * * A G T A C 4.0 3.5 3.0 4.5 4.5 5.0 2.5 3.0 4.5 4.0 4.5 4.0 c 5.0 5.5 5.0 3.5 5.5 6.0 5.5 6.0 6.0 6.5 6.0 5.5 6.5 7.0 6.5 7.0 e

  33. D A A G * 4.5 5.0 5.5 A G T A C * 5.0 5.5 6.0 * 2.5 5.0 5.5 A A G C * 3.0 3.5 DD 5.0 0.0 2.5 3.0 3.5 * 3.5 4.0 4.5 A G T A C 2.5 0.0 2.5 3.0 * 4.0 4.5 5.0 3.0 2.5 1.0 2.5 I A A G 3.5 3.0 3.5 2.0 CC * * * * A G T A C 4.0 3.5 3.0 4.5 4.5 5.0 2.5 3.0 4.5 4.0 4.5 4.0 5.0 5.5 5.0 3.5 5.5 6.0 5.5 6.0 6.0 6.5 6.0 5.5 Optimal conversion cost. 6.5 7.0 6.5 7.0

  34. What is the conversion of AGTAC and AAG ?

  35. B95902077

  36. Midpoint Hirschberg (1975): recursive divide-and-conquer Forward Computing Backward Computing

  37. Gap Penalty i-1, j-1 i, j-1 i-1, j i, j

  38. Gap Penalty CC( j) = minimum cost of a conversion of Ai* to Bj DD( j) = minimum cost of a conversion of Ai* to Bj that ends with a delete

  39. Gap Penalty RR(N - j) = minimum cost of a conversion of Ai*Tto BjT SS(N - j) = minimum cost of a conversion of Ai*Tto BjT that begins with a delete

  40. Find Midpoint with Gap Penalty Forward Computing How to compute the midpoint? Backward Computing

  41. R99922035

  42. Midpoint The problem of calculating the midpoint is that when we concatenate two substrings into one, we may coalesce two gaps into one Which means that we may consider min { CC + RR, DD + SS - g, II + JJ - g}

  43. Midpoint Recall the above algorithm, we do save the space of II and JJ. We can reduce it into min {CC + RR, DD + SS - g}

  44. Midpoint Remember that we should find minj [0, N]{min { CC + RR, DD + SS - g, II + JJ - g}} i* j j+1

  45. Midpoint Type 1 recurrence Type 2 recurrence i* i* j* j*

  46. Example A = agtac , B = aag, i* = 2 agtac a__ag Recurrsive call on (a, a) and (ac, ag)

  47. R99922062

  48. Implementation Storage Requirement Memory v.s. Sequence length Compared with classic dynamic programming algorithm

  49. Storage Requirement(1/4) Vectors : CC,DD,RR, and SS Space: 4N words M + N words for an optimal conversion 40 M = N = 38

More Related Content