Algorithms for Computing Longest Common Subsequences

faster algorithms for computing longest common n.w
1 / 30
Embed
Share

Explore faster algorithms for finding the longest common increasing subsequences across multiple input sequences. The research delves into improving running time efficiency and space complexity, introducing the concept of longest common weakly-increasing subsequences. Bounded heaps and van Emde Boas tree implementation for bounded heaps are discussed for efficient data structure operations.

  • Algorithms
  • Common Subsequences
  • Efficiency
  • Bounded Heaps
  • Data Structures

Uploaded on | 0 Views


Download Presentation

Please find below an Image/Link to download the presentation.

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author. If you encounter any issues during the download, it is possible that the publisher has removed the file from their server.

You are allowed to download the files provided on this website for personal or commercial use, subject to the condition that they are used lawfully. All files are the property of their respective owners.

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author.

E N D

Presentation Transcript


  1. Faster algorithms for computing longest common increasing subsequences Martin Kutz, Gerth St lting Brodal, Kanela Kaligosi, Irit Katriel Journal of Discrete Algorithms 9 (2011), pp. 314 325 Presenter: Shou-Fu Lo Date: 2017/11/21 1

  2. Abstract(1/2) We present algorithms for finding a longest common increasing subsequence of two or more input sequences. For two sequences of lengths n and m, where m n, we present an algorithm with an output-dependent expected running time of O ((m + n ) loglog + Sort) and O (m) space, where is the length of an LCIS, is the size of the alphabet, and Sort is the time to sort each input sequence. For k 3 length-n sequences we present an algorithm which improves the previous best bound by more than a factor k for many inputs. In both cases, our algorithms are conceptually quite simple but rely on existing sophisticated data structures. 2

  3. Abstract(2/2) Finally, we introduce the problem of longest common weakly- increasing (or non-decreasing) subsequences (LCWIS), for which we present an O (min{m + nlogn, mloglogm})-time algorithm for the 3-letter alphabet case. For the extensively studied longest common subsequence problem, comparable speedups have not been achieved for small alphabets. 3

  4. Bounded heaps Insert(?, s, p, d): Insert into the BH the key s with priority p and associated data d. DecreasePriority(?, s, p, d): If the BH does not already contain the key s, perform Insert( , s, p, d). Otherwise, set this key s priority to min{p, p }, where p is its previous priority. BoundedMin(?, s): Return the item that has minimum priority among all items in with key smaller than s. If does not contain any items with key smaller than s, return invalid . 4

  5. Bounded heaps Using van Emde Boas tree (vEB tree). Lemma 1.There exists an implementation of bounded heaps that requires O (n) space and supports each of the above operations in O (loglogn) amortized time, where keys are drawn from the set {1, . . . , n}. 5

  6. LCIS b1 2 b2 4 b3 5 b4 7 b5 6 b6 1 b7 3 a1 5 a2 2 a3 3 a4 4 a5 6 B A a1 5 a2 2 a3 3 a4 4 a5 6 b1 2 b2 4 b3 5 b4 6 b5 3 A B Li[j] = k L1[1] = 3 L1[2] = 1 L1[3] = 5 L1[4] = 2 L1[5] = 4 i = LCIS j = aj B k = aj B index 6

  7. LCIS a1 5 a2 2 a3 3 a4 4 a5 6 b1 2 b2 4 b3 5 b4 6 b5 3 A B BoundedMin(?, aj) 2 3 4 5 6 aj L1[1] = 3 L1[2] = 1 L1[3] = 5 L1[4] = 2 L1[5] = 4 DecreasePriority( ?, aj, Li-1[j], d) Li-1[j] aj Li-1[j] aj Li-1[j] aj Li-1[j] aj Li-1[j] 3 2 3 4 5 6 L2[1] = L2[2] = L2[3] = L2[4] = L2[5] = 1 3 2 3 4 5 6 5 1 5 3 2 4 2 3 4 5 6 1 5 2 3 2 3 4 5 6 7 1 5 2 3 4

  8. LCIS a1 5 a2 2 a3 3 a4 4 a5 6 b1 2 b2 4 b3 5 b4 6 b5 3 A B 2 3 4 5 6 aj L2[1] = L2[2] = L2[3] = 5 L2[4] = 2 L2[5] = 4 Li-1[j] aj Li-1[j] aj Li-1[j] aj Li-1[j] aj Li-1[j] 2 3 4 5 6 L3[1] = L3[2] = L3[3] = L3[4] = L3[5] = 2 3 4 5 6 5 2 3 4 5 6 4 5 2 2 3 4 5 6 8 5 2 4

  9. O ((m + n) log log + Sort (m)) time algorithm 9

  10. Longest common non-decreasing or weakly-increasing subsequences (LCWIS) = { , , } in their standard order: < < . A = B = We show how to solve LCWIS for the 2-letter alphabet in linear time and for the 3-letter alphabet in O (min{m +nlog n, m log log m}) time 10

  11. LCWIS for the 2-letter case = { , , } in their standard order: < < . A = B = Preprocessing: Arrays NumA, , NumB, , NumA, , . . . , NumB, that count the number of s, s and s. e.g. NumA, [4]= 3, NumB, [3]= 1; We also create arrays PosA, through PosB, , which provide us with the position of the ith occurrence of , , or in A or B. e.g. PosA, [4] = 5, PosB, [2] = 6; 11

  12. LCWIS for the 2-letter case = { , , } in their standard order: < < . A = B = For each i, where 0 i min{NumA, [n], NumB, [m]}, we determine the position of the ith in A and B and then the number of s that come after those positions in the two sequences. ? i = 1 , LCWIS = 1 + min {NumA, [n] - NumA, [PosA, [1]] , NumB, [m] - NumB, [PosB, [1]] } i = 2 The total time is O (m). 12

  13. LCWIS for the 3-letter case in O (m + nlogn) time The na ve extension of the above approach to three letters would have to deal with a quadratic number of tentative exponent pairs (i, j) for subsequences of type ? ? 13

  14. LCWIS for the 3-letter case in O (m + nlogn) time We determine the ith pair of s from the left and then count the number of s in A and B up to the split (s, t). There are p = NumA, [s] NumA, [PosA, [i]] such s in A and q = NumB, [t] NumB, [PosB, [i]] in B. left-aligned -part right-aligned -part 14

  15. LCWIS for the 3-letter case in O (m + nlogn) time Assume p q for the moment. For the three values i, p, q, we define a piecewise-linear function ?? slope-1 segment from (0, i + p) to (q p, i + q) and a horizontal extension from that point to infinity. ?,?consisting of a 15

  16. LCWIS for the 3-letter case in O (m + nlogn) time For example, with no extra s from the right, we only get min(p, q) = p many pairs of s, which together with the i s yield a sequence of length ?? If we have q p free s on the right, we could get a sequence of length f (q p) = i + q. ?,?(0) = i + p. 16

  17. LCWIS for the 3-letter case in O (m + nlogn) time ?,?(x - y) tells us exactly how Assuming x y, The value ?? long a subsequence we can build to the left of the split if we throw in a surplus of (x - y) s into A. We can now use our function ?? LCWIS of type ? ?. That would cost quadratic time and space. ?,?to obtain the length of an 17

  18. LCWIS for the 3-letter case in O (m + nlogn) time ?,?for all values of i ?,?, the upper The trick is now to draw the functions ?? into one diagram. Their pointwise maximum ?? envelope of their plots. 18

  19. LCWIS for the 3-letter case in O (m + nlogn) time In particular, any such subsequence with exactly i many s has all its s to the left of the cut (PosA, [i], PosB, [i]) and all its s to the right of that cut. 19

  20. LCWIS for the 3-letter case in O (m + nlogn) time We assign levels to the splits in S : let the level of the ith split (counting from left) be the index of the least significant bit equal to one in the binary representation of i. A = B = t = min(NumA, [n], NumB, [m] ) = 4 ( the size of Split ) h log2? = 2 ( the highest level ) 20

  21. LCWIS for the 3-letter case in O (m + nlogn) time p A = B = q d level-r = 20 21 22 21

  22. LCWIS for the 3-letter case in O (m + nlogn) time x A = B = y d length = max(?d(x, y), ?d( y, x)) + min(x, y) + i 22

  23. LCWIS for the 3-letter case in O (m log log m) time Assuming that such a procedure for the 2-letter case exists, For each k, where k = min{NumA, [n], NumB, [m]} down to 0. ak= PosA, [NumA, [n] NumA, [k] + 1] bk= PosB, [NumB, [m] NumB, [k] + 1] ( A[1 . . . ak], B [1 . . . bk]) In this way we obtain for every k the length of a CWIS of type ?. 23

  24. LCWIS for the 3-letter case in O (m log log m) time A = B = We solve the 2-letter subproblem ( A[1 . . . ak], B [1 . . . bk]) by using the 2-letter online procedure. Given the solution to the 2-letter subproblem ( A[1 . . . ak+1], B [1 . . . bk+1]) that we solved in the previous iteration, we reveal to the procedure the symbols B [bk+1+ 1], . . . , B [bk] and A[ak+1+ 1], . . . , A[ak] 24

  25. LCWIS for the 3-letter case in O (m log log m) time Solving the 2-letter case: exA= (NumA, [a] NumA, [PosA, [i]]) (NumB, [b] NumB, [PosB, [i]]) 25

  26. LCWIS for the 3-letter case in O (m log log m) time Keep four priority queues LA, EXA, LB, EXBas described and a variable that holds the current solution, i.e., the currently best CWIS. Maximum priority queue LAkeeping CWISs with exA(i) > 0 Minimum priority queue EXA that holds the CWISs of LA sorted by their excess of s in A . 26

  27. LCWIS for the 3-letter case in O (m log log m) time A = B = Create a CWIS with zero s, e.g. CWIS 0 = 0. Insert 0 in LAand EXAand set it to be the current solution. When a new is revealed in A or B and it is the ith one in its sequence, check whether there are i s in the other sequence and if so, create a new CWIS ? = ? CWIS 1 = 1 27

  28. LCWIS for the 3-letter case in O (m log log m) time A = B = Compute its excess in A and if it is non-negative insert ? into LAand EXA. If the excess in A is negative, then the excess in B is positive, so, insert ? into LBand EXB. exA(1) = 3 1 = 2 0 => insert 1 into LAand EXA. CWIS 1 = 1 + (NumB, [b] NumB, [PosB, [1]]) = 1 + 1 = 2 28

  29. LCWIS for the 3-letter case in O (m log log m) time A = B = When a new is revealed in A, check whether the minimum of EXAnow has negative excess and if so, delete it from LAand EXAand insert it into LBand EXB. Next, check whether the maximum between the maximum in LAand the maximum in LBhas length greater than the current solution and if so, set it to be the current solution. 29

  30. LCWIS for the 3-letter case in O (m log log m) time We can use as a priority queue a van Emde Boas tree which supports each operation in O (loglog m) time. The time we spent in the above algorithm each time we encounter a new symbol is O (loglog m), so we have Theorem 4. We can find an LCWIS of two 3-letter sequences of lengths m and n, with m n, in O ((m+n) log log m) time = O (m log log m) time. 30

Related


More Related Content