Exploring Prefix and Suffix Trees for Information Retrieval

idata2302 algorithms data structures n.w
1 / 21
Embed
Share

This presentation delves into the world of Prefix and Suffix Trees, shedding light on their applications in information retrieval. From the basics of Prefix Trees (Tries) to building Suffix Trees, the content covers insertion, deletion, and search operations. Discover how these tree structures facilitate efficient full-text search, spell-checking, and predictive text functionalities in various applications.

  • Prefix Trees
  • Suffix Trees
  • Information Retrieval
  • Data Structures
  • Algorithms

Uploaded on | 0 Views


Download Presentation

Please find below an Image/Link to download the presentation.

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author. If you encounter any issues during the download, it is possible that the publisher has removed the file from their server.

You are allowed to download the files provided on this website for personal or commercial use, subject to the condition that they are used lawfully. All files are the property of their respective owners.

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author.

E N D

Presentation Transcript


  1. IDATA2302 Algorithms & Data Structures Prefix & Suffix Trees A peek at Information Retrieval Trees / Lecture 6 Franck Chauvel axbit & NTNU Ask questions on menti.com with code 7273 1330 franck.chauvel@ntnu.no

  2. Information Retrieval? Full-text search Spell-checking Auto-complete Predictive text

  3. Agenda 1. Prefix Trees (aka Tries) 1. Insertion 2. Lookup 3. Deletion 2. Compressed Tries 3. Suffix Trees 4. Building Suffix Trees 3

  4. Prefix Trees a.k.a. tries 1st character a Small dictionary apart ape appear apple apply apricot What about appartment ? 2nd character p 3rd character a e p r 4th character apart ape e l i 5th character appear e y c l apple apply apricot april 4

  5. Nested Keys a Sentinel character: $ Modified dictionary apart$ apartment$ ape$ appear$ apple$ apply$ apricot$ p a e p r ape e l i r appear e y c l t apple apply apricot april m $ apart apartment 5

  6. The General Idea Node a b c d e z Stores keys Over a fixed alphabet Using an N-ary Tree Node apart a b c d e z Node Implementations Hash tables Sorted Arrays Binary Trees Node a b c d e z 6

  7. Search Something that does not exist search( apron ) a p a e p r apart ape No such term! e l i appear e y c l apple apply apricot april 7

  8. Search Something that exists insert( apron ) a p a e p r apart ape e l i appear e y c l apple apply apricot apricot april Found it! 8

  9. Insertion insert( apron ) a p a e p r apart ape e l i appear apron e y c l apple apply apricot april 9

  10. Deletion delete( ape ) a p a e p r apart ape ape e l i appear e y c l apple apply apricot april 10

  11. Deletion delete( apricot ) a p a e p r Longest common prefix LCP(apricot, april) = apr apart ape e l i appear e y c l apple apply apricot apricot april 11

  12. Prefix Search All keys starting with startsWith( app ) a p a e p r X apart ape e l i appear appear e y c l apple apple apply apply apricot april 12

  13. Compressed Tries a a ap p p a e p ri apart ape a e p r r c l e l apart ape apricot april appear e l i i e y appear apple apply e y c l apple apply apricot april 13

  14. Suffix Trees 14

  15. Suffixes in a Prefix Tree Suffixes of Banana 1. banana 2. anana 3. nana 4. ana 5. na 6. a a b na banana $ $ na n nana na a n $ ana anana 15

  16. Compressing Suffix Trees a b na (2,3) (0,0) (1,1) 1. Replace leaves by the index where the suffix starts 2. Replace labels with start and end indexes banana 0 $ $ na (4,4) (2,3) n nana 2 na 4 a 5 n $ (4,4) ana 3 anana 1 0 1 2 3 4 5 b a n a n a 16

  17. Full Text Search search( na ) a b na banana If we build the suffix tree of a complete text A given fragment exists If it is the prefix of any suffix $ $ na n nana nana na na a n $ ana anana 2 2 4 4 0 1 3 5 b a n a n a 17

  18. Building Suffix Trees How to build suffix trees in linear time? Ukkonen s algorithm 18

  19. Recap Tries: Store keys as paths Approximate searches prefix suffix / full search Compression! Can be built fast! 19

  20. Trees in CS Web / Mobile (UI Widget / DOM) AI/ML Encryption (Huffman Coding) Random Forests Compilation (AST) Computer Graphics (Kd Trees/CSG) Operating Systems (File System) 20

  21. Thank You! Questions, Comments, or Ideas? Franck Chauvel Axbit & NTNU franck.chauvel@ntnu.no

More Related Content