Overlay Architecture for Pattern Matching in Computer Science
An effective overlay architecture for pattern matching is crucial in various domains like threat detection, network analysis, and genomic sequencing. The architecture involves utilizing Nondeterministic Finite Automata (NFA) to recognize complex patterns efficiently, demonstrating the process with examples such as ababc sequences. By preprocessing TCAM pattern matches and employing Bloom filters, the system can accurately identify patterns such as threat signatures, network addresses, and genomic sequences. This architecture enables fast and accurate pattern matching, enhancing performance across different applications.
Uploaded on Mar 09, 2025 | 1 Views
Download Presentation

Please find below an Image/Link to download the presentation.
The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author.If you encounter any issues during the download, it is possible that the publisher has removed the file from their server.
You are allowed to download the files provided on this website for personal or commercial use, subject to the condition that they are used lawfully. All files are the property of their respective owners.
The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author.
E N D
Presentation Transcript
An Overlay Architecture for Pattern Matching Rasha Karakchi, Charles Danial, Jason Bakos Computer Science and Engineering 1
Pattern Matching Patterns: ex. threat signatures, network addresses, genomic seq. (slow) preprocess TCAM pattern matches (fast) Bloom filter Automata input sequence 2
Nondeterministic Finite Automata (NFA) Recognize pattern: ababc Input Active States 0 a a b b c 5 2 1 0 4 3 3
Nondeterministic Finite Automata (NFA) Recognize pattern: ababc Input Active States 0 0,1 Input: a a a a b b c 5 2 1 0 4 3 tracking pattern a 4
Nondeterministic Finite Automata (NFA) Recognize pattern: ababc Input Active States 0 0,1 0,2 Input: ab a b a a b b c 5 2 1 0 4 3 tracking pattern ab 5
Nondeterministic Finite Automata (NFA) Recognize pattern: ababc Input Active States 0 0,1 0,2 0,1,3 Input: aba a b a a a b b c 5 2 1 0 4 3 tracking pattern a tracking pattern aba 6
Nondeterministic Finite Automata (NFA) Recognize pattern: ababc Input Active States 0 0,1 0,2 0,1,3 0,2,4 Input: abab a b a b a a b b c 5 2 1 0 4 3 tracking pattern ab tracking pattern abab 7
Nondeterministic Finite Automata (NFA) Recognize pattern: ababc Input Active States 0 0,1 0,2 0,1,3 0,2,4 0,3 Input: ababa a b a b a a a b b c 5 2 1 0 4 3 lost track tracking pattern aba 8
Nondeterministic Finite Automata (NFA) Recognize pattern: ababc Input Active States 0 0,1 0,2 0,1,3 0,2,4 0,3 0,4 Input: ababab a b a b a b a a b b c 5 2 1 0 4 3 tracking pattern abab 9
Nondeterministic Finite Automata (NFA) Recognize pattern: ababc Input Active States 0 0,1 0,2 0,1,3 0,2,4 0,3 0,4 0,5 (accept) Input: abababc a b a b a b c a a b b c 3 4 1 2 0 5 accept 10
Applications Hamming distance Brill tagging rules Protein Motif signatures Sequential Pattern Mining 11
State Element (SE) b a a b c a b c a b To output priority encoders Other SEs Shift in 256x1 RAM counter Input buffer 64Kx8 start report to SEs: ? 1 2 ? 2 to ? + ? Active (f = 6 here) n Activation from predecessors To successors Shift out 12 ?? = ????????? ?????? ???%?????? ? %????
Runtime Behavior R = reconfigure IBF = input buf. flush OBF = output buf. flush FIB = fill input buf. Encoders 256x1 RAM counter Output buffer buffer Output Input buffer 64Kx8 64Kx8 Input buffer start start report report Active Active Activation from predecessors n To successors time x (#states/#SEs) R IBF FIB OBF R IBF FIB OBF R IBF x input_size/64K
Overlay Configurations Max. B/W* for 25% a.s. (GB/s) Through- Put 24K states (MB/s) Through- Put 128K states (MB/s) Max. report rate (GHz) Max. Report cycles SEs (K) Fmax (MHz) R f Encoders ( s) 152 16 100% 2.4 21 4 103 1866 14 3 136 32 50% 2.2 31 8 44 1427 27 5 122 48 33% 2.0 1091 12 25 43 32 6 121 16 12 692 64 25% 1.9 53 36 9 119 20 6 426 80 20% 1.9 67 31 9 112 96 17% 1.8 240 24 3 74 67 11 14
Physical Mapping Assume f = 9 [n-4 to n+4] Hardware fan-out f allows connection from SE n to SEs: n 2 7! = 5040 possible mappings SE Mapping: ? 1 ? 2 to n + f valid mappings 5 0 State A B C D E F G SE 0 1 2 3 4 5 6 6 24 (0.5%) 7 48 (1%) 8 372 (7%) 15
Performance Results Minimum Hardware Fan-out Achieved 40 18 17 21 8 62 12 29 60 8 42 4 Through put (MB/s) 20 13 63 63 10 3 15 16 5 24 13 12 # #R/ buffer 4 5 1 1 5 19 5 5 17 2 6 5 Benchmark Brill ClamAV Levenshtein Hamming SPM EntityResolution RandomForest PowerEN Snort Fermi Protomata DotStar states 26668 49538 2784 11346 100500 95136 75340 40513 69029 40783 42061 96438 Overlay 8K 12K 12K 12K 16K 4K 16K 8K 4K 16K 8K 20K 16 * iNFAnt on Nvidia Titan Xp ** Nmcart, 4 threads on i5-4440@3.1 GHz
Performance Results Minimum Hardware Fan-out Achieved 40 18 17 21 8 62 12 29 60 8 42 4 12 Through put (MB/s) 20 13 63 63 10 3 15 16 5 24 13 Through put (MB/s) 20 13 63 63 11 3 15 16 5 21 13 12 10 Ave. act. states 14 4 88 240 6331 10 968 31 98 3854 19 # iNFAnt (GPU) * 7 4 38 18 0.5 4 2 53 14 2 5 #R/ buffer 4 5 1 1 5 19 5 5 17 2 6 5 40 Hyperscan (CPU) ** 1 14 1 10 0.1 1 0.5 10 0.4 1 1 Benchmark Brill ClamAV Levenshtein Hamming SPM EntityResolution RandomForest PowerEN Snort Fermi Protomata DotStar states 26668 49538 2784 11346 100500 95136 75340 40513 69029 40783 42061 96438 Overlay 8K 12K 12K 12K 20K 4K 16K 8K 4K 20K 8K 20K 3 Speedup 3 0.9 1.7 3.5 20 0.8 7.5 0.3 0.4 12 2.6 0.3 17 * iNFAnt on Nvidia Titan Xp ** Nmcart, 4 threads on i5-4440@3.1 GHz
Current/Future Work Hide input buffer flush latency with output buffer flush SAT-solver based mapping algorithm Scale up to larger FPGAs and faster memory (DDR4/HBM2)