Approximate String Matching with Single Gap for Sequence Alignment

approximate string matching with a single n.w
1 / 15
Embed
Share

This paper discusses the approximate string-matching problem with Hamming distance and a single gap for sequence alignment, presenting a general algorithm with optimized time complexity. The approach is motivated by next-generation re-sequencing procedures, offering insights into efficient pattern matching with gaps in biological data analysis.

  • Bioinformatics
  • Sequence Alignment
  • Algorithm
  • Approximate Matching

Uploaded on | 0 Views


Download Presentation

Please find below an Image/Link to download the presentation.

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author. If you encounter any issues during the download, it is possible that the publisher has removed the file from their server.

You are allowed to download the files provided on this website for personal or commercial use, subject to the condition that they are used lawfully. All files are the property of their respective owners.

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author.

E N D

Presentation Transcript


  1. Approximate string-matching with a single gap for sequence alignment Tomas Flouri, Kunsoo Park, Kimon Frousios, Solon P. Pissis, Costas S. Iliopoulos, German Tischler Proceedings of the SecondACM International Conference on Bioinformatics and Computational Biology(BCB), Chicago, USA, July 31 - August 03, 2011 Speaker: Guan-Zhi Chen Date: 2018.12.04

  2. Abstract (1/2) This paper deals with the approximate string-matching problem with Hamming distance and a single gap for sequence alignment. We consider an extension of the approximate string-matching problem with Hamming distance, by also allowing the existence of a single gap, either in the text, or in the pattern. 3

  3. Abstract (2/2) This problem is strongly and directly motivated by the next-generation re-sequencing procedure. We present a general algorithm that requires ?(??) time, where ? is the length of the text and ? is the length of the pattern, but this can be reduced to ?(??) time, if the maximum length ? of the gap is given. 4

  4. Definition (1/2) Hamming distance: number of different chars between two strings of the same length ?: ?: A A A C C T T T G G ???,? = 2 2-mismatches for single char: ???[1],?[1] = 0 the same ???[2],?[2] = 1 5

  5. Definition (2/2) Gap: ? ? (? > 0) a string of don t care symbols of length ? Gap string: the concatenation of and gaps ?: A A C T G A C C C ?: < Gap string > ? : A C T * * * * * * ???,? = 2 6

  6. The problem and answer (1/4) ?: A G G T C A T ??? = ? G G G T A ??? = ? ?: Given ?, find all positions of factors of ? ? : there exists a single gap string ? , such that ???,? ? there exists a single gap string ? , such that ??? ,? ? ? > |?| ? < |?| ???,? ? ? = |?| 7

  7. The problem and answer (2/4) ?: A G G T C A T ??? = ? G G G T A ??? = ? ?: Given ?, find all positions of factors of ? ? : there exists a single gap string ? , such that ???,? ? there exists a single gap string ? , such that ??? ,? ? ??? ,? ? ? > |?| ? < |?| ? = |?| ? : A G G T * ???,? =1 G G G T A ?: 8

  8. The problem and answer (3/4) ?: A G G T C A T ??? = ? G G G T A ??? = ? ?: Given ?, find all positions of factors of ? ? : there exists a single gap string ? , such that ???,? ? there exists a single gap string ? , such that ??? ,? ? ??? ,? ? ? > |?| ? < |?| ? = |?| ?: A G G T C ???,? =1 G G G T A ?: 9

  9. The problem and answer (4/4) ?: A G G T C A T ??? = ? G G G T A ??? = ? ?: Given ?, find all positions of factors of ? ? : there exists a single gap string ? , such that ???,? ? there exists a single gap string ? , such that ??? ,? ? ??? ,? ? ? > |?| ? < |?| ? = |?| ?: A G G T C A ??? ,? =2 G G G T A * ? : 10

  10. ?: A G G T C A T Their method ?: G G G T A Taking ? 6,4 for example: (? > ?) ? ?,? = min(??? ? ? + 1..? ,? 1..? ,??(? 1..? ,?[1..?])) ?: A G G T C A ??? ,? = 3 * * G G G T ? : ?: A G G T C A ??? ,? = 1 G G G T * * ? : ?[6,4] = min(3, 1) = 1 11

  11. G matrix the minimum number of mismatches of the text ?[1 ..?], and the pattern ?[1 ..?], with at most one gap 0 1 G 2 3 4 5 G G 0 0 G 0 0 1 1 1 0 0 1 1 1 1 G 0 0 0 1 1 1 0 1 1 1 1 G G G 0 1 G 0 0 0 1 1 1 1 1 1 1 1 G G G 0 1 1 T 0 0 0 1 1 1 1 1 1 1 1 T T T 0 1 1 1 A 0 0 0 0 1 1 1 2 2 1 2 A A A 0 0 1 1 1 ? 0 0 0 0 0 ? 0 0 0 0 0 0 0 0 0 0 0 ? 0 0 0 0 0 0 0 ? 0 0 0 0 0 0 0 0 0 0 ? A G G T C A A A ? A G G T ? A G G T C A T T T T ? A G G T C C 1 1 2 1 3 1 4 2 5 6 7 ? ?,? = ? ? 1,? 1 + ??? ? ,? ?, ? ?,? = min ? ? 1,? 1 + ??? ? ,? ? ,? ?,? , ? < ? ? ?,? = min ? ? 1,? 1 + ??? ? ,? ? ,? ?,? , ? > ? ? = ? 12

  12. H matrix the location of the gap ? ?, ? ?, 0, ?? ? ?,? = ? ?,? ??? ? ? ?? ? ?,? = ? ?,? ??? ? > ? ? ?,? = ?? ?????? 0 1 2 3 4 5 G G 0 1 0 0 0 0 0 0 0 G G 0 2 0 0 0 0 3 4 5 G 0 3 0 0 0 0 2 3 4 G T T 0 4 0 2 1 0 1 2 0 A A 0 5 0 3 2 1 0 0 0 ? 0 0 0 0 0 0 0 0 7 ? 0 1 2 3 4 5 6 0 ? A G G T C A T T ? A G G T C A 1 2 3 4 5 6 7 13

  13. GAP-MISMATCHES algorithm 14

  14. Speeding up 2? + 1 ???? 15

Related


More Related Content