Online Learning Platforms in the Era of COVID-19
The sudden pandemic has reshaped education worldwide, emphasizing the role of e-learning and online platforms in tertiary education. Explore the impact, benefits, and challenges associated with online learning platforms, such as Google Classroom, in ensuring continuity of education during these unprecedented times.
Download Presentation

Please find below an Image/Link to download the presentation.
The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author.If you encounter any issues during the download, it is possible that the publisher has removed the file from their server.
You are allowed to download the files provided on this website for personal or commercial use, subject to the condition that they are used lawfully. All files are the property of their respective owners.
The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author.
E N D
Presentation Transcript
Fast Compressed Tries through Path Decompositions Roberto Grossi Giuseppe Ottaviano* Universit di Pisa * Part of the work done while at Microsoft Research Cambridge
Compacted tries Node label Branching character t h r three trial triangle trie triple triply ree i e a p l y e n l gle
Applications String dictionaries With prefix lookup, predecessor, Exploit prefix compression Monotone perfect hash functions Hollow or Blind tries [ALENEX 09] Binary tree (no need store branching chars) No need to store node labels, just lengths (skips)
Height vs. performance Tries can be deep no guarantee on height Bad with pointer-based trees ~1 cache miss per child operation Worse with succinct tree encodings Need to access several directories Many cache misses per child operation Large constants hidden in the O(1)
Path decomposition t triangle h r p h e l ree i e a p Recurse here with suffix le l y e n l gle Query: triple
Centroid path decomposition Decompose along the heavy paths choose the edge that has most descendants Height of the decomposed tree: O(log n) Usually lower Average height Web Queries URLs Synthetic Compacted trie 11.0 18.1 504.4 Centroid trie 5.2 6.2 2.8 Hollow trie 50.8 67.3 1005.3 Centroid hollow trie 8.0 9.2 2.8
Succinct encoding [PODS 08] presents a succinct data structure for centroid path-decomposed tries Not practical: need complex operations on succinct trees We introduce a simpler and practical encoding This encoding enables also simple compression of the labels
Succinct encoding triangle L : t1ri2a1ngle BP: ( ((( ) B : h epl (spaces added for clarity) p h e l Node label written literally, interleaved with number of other branching characters at that point in array L Corresponding branching characters in array B Tree encoded with DFUDS in bitvector BP Variant of Range Min-Max tree [ALENEX 10] to support Find{Close,Open}, more space-efficient (Range Min tree)
Compression of L ...$...index.html$....html$....html$...index.html$ 3 index 5 .html Dictionary ...$...35$...5$...5$...35$ Dictionary codewords shared among labels Codewords do not cross label boundaries ($) Use vbyte to compress the codeword ids
Compression of L Node labels (t1ri2a1ngle, l1e, ): each label is suffix of a string in the set interleaved with few special characters 1, 2, 3, Compressible if strings are compressible Dictionary and parsing computed with modified Re-Pair Domain-specific compression can be used instead Decompression overhead negligible
Experimental results (time) Experiments show gains in time comparable to the gains in height Confirm that bottleneck is traversal operations Web Queries URLs Synthetic Trie 3.5 7.0 119.8 Centroid trie 2.4 4.3 5.1 Hollow trie [ALENEX 09] 16.6 22.4 462.7 Hollow trie 7.2 13.9 137.1 Centroid hollow trie 2.8 4.4 11.1 (microseconds, lower is better) Code available at https://github.com/ot/path_decomposed_tries
Experimental results (space) For strings with many common prefixes, even non-compressed trie is space-efficient Labels compression considerably increases space-efficiency Decompression time overhead: ~10% Web Queries URLs Synthetic Hu-Tucker Front Coding 40.9% 24.4% 19.1% Centroid trie 55.6% 22.4% 17.9% Centroid trie + compression 31.5% 13.6% 0.4% (compression ratio, lower is better) Code available at https://github.com/ot/path_decomposed_tries
Thanks for your attention! Questions?
References [ALENEX 10] D. Arroyuelo, R. Ca novas, G. Navarro, and K. Sadakane. Succinct trees in practice. In ALENEX, pages 84 97, 2010. [ALENEX 09] D. Belazzougui, P. Boldi, R. Pagh, and S. Vigna. Monotone minimal perfect hashing: searching a sorted table with O(1) accesses. In SODA, pages 785 794, 2009. [PODS 08] P. Ferragina, R. Grossi, A. Gupta, R. Shah, and J. S. Vitter. On searching compressed string collections cache-obliviously. In PODS, pages 181 190, 2008.