Approximating Edit Distance: Progress in Algorithms and Applications

approximating edit distance within constant n.w
1 / 19
Embed
Share

Explore the advancements in approximating edit distance within constant factors in truly sub-quadratic time. Learn about the various variants of edit distance, its applications in bioinformatics, pattern recognition, text processing, and information retrieval, as well as fine-grained complexity and the latest algorithms for computing edit distance. Discover how different algorithms and approaches are improving the efficiency and accuracy of calculating edit distances for a wide range of practical uses.

  • Edit Distance
  • Algorithms
  • Applications
  • Approximation
  • Bioinformatics

Uploaded on | 0 Views


Download Presentation

Please find below an Image/Link to download the presentation.

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author. If you encounter any issues during the download, it is possible that the publisher has removed the file from their server.

You are allowed to download the files provided on this website for personal or commercial use, subject to the condition that they are used lawfully. All files are the property of their respective owners.

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author.

E N D

Presentation Transcript


  1. Approximating edit distance within constant factor in truly sub-quadratic time Michal Kouck Charles U., Prague Joint work with: Diptarka Chakraborty, Debarati Das, Elazar Goldenberg and Mike Saks

  2. Edit distance a b c z d e f w h i k l m ? ? a b c d e f g h i j k l m Edit distance ????(?,?): the number of that transform ? into ?. 1) bit flips/symbol changes 2) insertions, and 3) deletions

  3. Variants of edit distance Levenshtein distance: vanilla edit distance. Longest Common Subsequence: dual measure. Ulam distance: large alphabet, each symbol appears at most once. Edit distance with moves: additional operation block move. Hamming distance.

  4. Applications of edit distance Bioinformatics. Pattern recognition. Text processing. Information retrieval.

  5. Our result Thm: An ? 1 -approximation algorithm for edit distance running in time ?(?2 2/7). 12/7 = 1.714

  6. Computing edit distance Wagner-Fischer 74 Masek-Paterson 80 Grabowski 16 O(?2) O O (?2/log? ) (?2log log ?/log2? ) Ukkonen 85 Myers 86, Landau-Vishkin 88, Landau-Myers-Schmidt 98 O(??) O(? + ?2) Chakraborty-Goldenberg-K. 16 Belazzougui-Zhang 16 O(? + ?6) (synchronous stream.) O ? + ?8 (async. streaming) and many others ? = ????(?,?)

  7. Fine-grained complexity Backurs-Indyk 15: An algorithm for edit distance in time ?(?2 ?) implies an algorithm for SAT in time 2(1 ?)?. (contradicting Strong Exponential Time Hypothesis (SETH).) Abboud-Hansen-Vassilevska Williams-Williams'16, Abboud-Backurs- Vassilevska Williams'15, Bringmann-K nnemann'15: ?(?2) algorithms for edit distance imply circuit lower bounds.

  8. Approximating edit distance approximation time ? ? ? ?(?max ?/2,2? 1 ) ?(?) ?(?) Landau-Myers-Schmidt 98 ?1 ? Batu-EKMRRS 03 B.Yossef-Jayram-Krauthgamer-Kumar 04?3/7 ?1/3+?(1) 2 log ? log?(1/?)? Batu-Ergun-Sahinalp 06 ?(?1+?(1)) ?(?1+?) Andoni-Onak 09 Andoni-Krauthgamer-Onak 10 Boroujeni-Ehsani-Ghodsi-Hajiaghayi-Seddighin'18 quantum ?(?1.708 ) ?(1) Abboud-Backurs 17: (1 + 1/???? log)-inapprox. in time ?2 ?

  9. Gap Edit Distance Fixed constant ? > 1. Input: ?,? ?, ? [0,1]. Output: ????(?,?) ?? . YES if ????(?,?) > ??? . NO if Our result: Algorithm for ? = 3 + ? running in time ?(?4/7?12/7).

  10. Main ideas ? ? ? ? Assume ????(?,?) ?? : For most ? of size , ????(??,??) 2? = ??

  11. Sparse case ? ?2 ?1 ?3/7 ? ? |?| = ?1/7 threshold ? = ?2/7 Instead of cost ?3/7 ? = ?10/7, we pay (?3/7)2 ? = ?8/7.

  12. Dense case ? ?4 ?? ??,??? 3? ??(??,???) 2? ?4 ?3 ??(???,???) 5? ?2 ?1 ?1 ?2 ?3 ? ?4 ? |?| = ?1/7 threshold ? = ?2/7 One pays ? |?| ?/( ? ?) = ?12/7.

  13. Combining the two cases ? ? 1) Test each narrow column and process dense ones. 2) In each wide column, sample a sparse column and extend. 3) Repeat 1-2 for closeness parameters ? ?,1 ,? = 2 ?.

  14. Recovering a good match ? ? 4) Find boxes that well approximate the best match.

  15. Under the rug ? Ukkonen 85 ? ?? ?? ? ? 2 ? ? = ?12/7 Total number of boxes

  16. Combining boxes ? ?? ? ? Shrink vertically each box by ??. ?? its edit distance

  17. Recovering a good match ? ? One sweep over the boxes in quasi-linear time.

  18. Our results Thm: There is an ? 1 -approximation algorithm for edit distance running in time ?(?1.647 .). Thm: There is an ? 1 -approximation algorithm for the high range edit distance running in time ?(?1.618 .).

  19. Open problems Better approximation factor? Faster algorithm? Andoni: constant factor approximation in time ?(?3/2+?). K.-Saks: constant factor approximation in time ?(?1+?) for the high range of edit distance. Reduction of low edit distance into high edit distance?

More Related Content