Modular SRAM-Based Binary Content-Addressable Memories Overview

modular sram based binary content addressable n.w
1 / 42
Embed
Share

"Explore the world of Modular SRAM-based Binary Content-Addressable Memories (BCAM) and their applications in memory management, pattern matching, data processing, and more. Learn about the motivation, objectives, algorithmic heuristics, and register-based BCAM architecture, offering a comprehensive understanding of this technology."

  • Binary Memory
  • Modular SRAM
  • Content-Addressable
  • BCAM Applications
  • Memory Management

Uploaded on | 0 Views


Download Presentation

Please find below an Image/Link to download the presentation.

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author. If you encounter any issues during the download, it is possible that the publisher has removed the file from their server.

You are allowed to download the files provided on this website for personal or commercial use, subject to the condition that they are used lawfully. All files are the property of their respective owners.

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author.

E N D

Presentation Transcript


  1. Modular SRAM-based Binary Content-Addressable Memories Ameer M.S. Abdelhadi and Guy G.F. Lemieux Department of Electrical and Computer Engineering University of British Columbia a place of mind THE UNIVERSITY OF BRITISH COLUMBIA Vancouver, Canada

  2. Binary Content-Addressable Memory (BCAM) Hardware-based Single-Cycle Parallel Search Engines Write Stores new data at specific address B C D A 3 BCAM Match Search all addresses for a given data (pattern) B C D A 3 BCAM 0 1 2 0 1 2 Store C in 1 Search for D Found in 2 1/17

  3. BCAM Applications Memory management Pattern matching Data Data Databases Networking compression encryption IP lookup in routing/ forwarding tables Associative caches find and shorten redundant patterns find and encrypt specific patterns Intrusion detection Eliminates memory bottleneck e.g. DNA sequence lookup detect predefined suspicious packages Translation lookaside buffers (TLBs) Packet Classification 2/17

  4. Motivation - FPGAs 1000 s Memory Blocks 100,000 s Logic Elements No dedicated BCAM resources in FPGAs 3/17

  5. Objectives BCAMs FPGAs Massively parallel memory search Block RAMs are main storage Limited memory bandwidth Require high memory bandwidth Modular and flexible Storage efficient Use BRAMs to construct Single-cycle Performance oriented BCAMs 4/17

  6. Algorithmic Heuristics Associative Arrays Search Trees: Tries, BSTs, Hashes Data dependent performance Variable search depth Misses due to conflicts Multi-unpredictable-cycle 5/17

  7. Register-Based BCAM Concurrent register read and compare PW En D En D Match Reg0 = Priority Encoder Q Decoder Address log2CD PW log2CD MIndc Reg1 = Q WAddr CD MAddr PW En D RegCD-1 = Q PW PW WPatt MPatt Single-cycle Limited resources Complex routing Fits small BCAMs 6/17

  8. Brute-Force Transposed-Indicators-RAM (1) A Traditional BRAM-based BCAM Key idea: Transposed RAM - data becomes addresses Write Write 0 to location B 3 0 1 2 D Match Read location D for match A B C D D A B C 3 0 1 2 0 to B 2 * Xilinx App Notes 7/17

  9. Brute-Force Transposed-Indicators-RAM (2) Storing Data to Multiple Addresses How can we store data to multiple addresses? Specify addresses using one-hot coding Each bit indicates a match or store at location PROBLEM: Depth of CAM is limited by data width of RAM e.g. to build 1M deep CAM, we need 1M bits wide In FPGAs: 1000 BRAMs x 32bit wide = 32K deep CAM 0 1 2 3 A B C D 1 BRAM-based Single-cycle Depth of CAM is limited by RAM width 8/17

  10. BCAM Cascading M a t c h e d P a t t e r n PROBLEM: Patterns are encoded as RAM addresses RAM depth is exponential to pattern width RAM Depth = 2Pattern Width Slice Slice Slice CAM CAM CAM Slice Match Slice Match Slice Match Solution: Cascading 1. Divide pattern into smaller slices 2. Search for each slice separately 3. If all slices are found pattern match! RAM depth is linear to pattern width RAM Depth = 2Slice Width x (Pattern Width / Slice Width) Pattern Match 9/17

  11. Hierarchical Search 2D BCAM (1) Narrow and Deep BCAM Key idea: Hierarchical search 1D BCAM Find a set (row) with match using a 1D BCAM 2D BCAM Search this set (row) in parallel for a specific match 2k 4M too deep 2k 10/17

  12. Hierarchical Search 2D BCAM (2) Example addresses Divide address space into segments RAM: each segment in a line Transposed-RAM: indicates pattern in segment? Hierarchical Search: 1. Find a row (segment) with match using a 1D BCAM 2. Search this row (segment) in parallel for a specific match 3addressespatterns RAM 0 1 2 3 0 1 2 2 3 1 1 0 0 0 0 0 0 1 1 1 0 0 0 0 1 0 0 Transposed-RAM 0 1 2 3 patterns 11/17

  13. Hierarchical Search 2D BCAM (2) Example addresses Divide address space into sets RAM: each segment in a line Transposed-RAM: indicates pattern in segment? Hierarchical Search: 1. Find a row (segment) with match using a 1D BCAM 2. Search this row (segment) in parallel for a specific match 3addressespatterns RAM 0 1 2 3 0 1 2 2 3 1 1 0 0 0 0 0 0 1 1 1 0 0 0 0 1 0 0 Transposed-RAM 0 1 2 3 patterns 11/176

  14. Hierarchical Search 2D BCAM (2) Example addresses Divide address space into sets RAM: each set in a line Transposed-RAM: indicates pattern in segment? Hierarchical Search: 1. Find a row (segment) with match using a 1D BCAM 2. Search this row (segment) in parallel for a specific match 1setspatterns 0 1 2 3 0 2 1 0 0 0 0 0 0 1 1 1 0 0 0 0 1 0 0 Transposed-RAM 3 1 0 1 2 3 patterns RAM 11/17

  15. Hierarchical Search 2D BCAM (2) Example sets Divide address space into sets RAM: each set in a line Transposed-RAM: indicates pattern in set? Hierarchical Search: 1. Find a row (segment) with match using a 1D BCAM 2. Search this row (segment) in parallel for a specific match patterns 0 1 0 0 0 1 1 1 0 1 0 0 1 2 1 3 1 0 1 2 3 patterns Transposed-RAM sets RAM 11/17

  16. Hierarchical Search 2D BCAM (2) Example sets Divide address space into sets RAM: each set in a line Transposed-RAM: indicates pattern in set? Hierarchical Search: 1. Find a set (row) with match using a 1D BCAM 2. Search this row (segment) in parallel for a specific match patterns 0 1 0 0 0 1 1 1 0 1 0 0 1 2 1 3 1 0 1 2 3patterns Transposed-RAM sets RAM Match pattern 3 11/17

  17. Hierarchical Search 2D BCAM (2) Example sets Divide address space into sets RAM: each set in a line Transposed-RAM: indicates pattern in set? Hierarchical Search: 1. Find a set (row) with match using a 1D BCAM 2. Search this set (row) in parallel for a specific match patterns 0 1 0 0 0 1 1 1 0 1 0 0 1 2 1 3 1 0 1 2 3patterns Transposed-RAM sets RAM 11/17

  18. Hierarchical Search 2D BCAM (3) Pros and Cons Efficient for deep CAMs BRAM-Based Single-cycle RAM depth is exponential to pattern width Single match only Cannot be cascaded Inefficient for wide patterns 12/17

  19. Indirectly-Indexed 2D (II2D) BCAM (1) Cascadable Wide and Deep BCAM PROBLEM: is it possible to regenerate matches for all addresses? patterns addresses 1 1 1 1 1 1 1 1 13/17

  20. Indirectly-Indexed 2D (II2D) BCAM (1) Cascadable Wide and Deep BCAM PROBLEM: is it possible to regenerate matches for all addresses? patterns addresses 1 Key observation n columns (set of addresses) accommodates nmatches (1 s) at most! 1 Transposed RAM is a sparse matrix 1 1 1 1 1 1 13/17

  21. Indirectly-Indexed 2D (II2D) BCAM (1) Cascadable Wide and Deep BCAM PROBLEM: is it possible to regenerate matches for all addresses? Sets Key observation n columns (set of addresses) accommodates nmatches (1 s) at most! patterns Transposed RAM is a sparse matrix Key idea: use indirect indices to point to intra-set matches Cascadable Scalable (linear growth) Supports wider patterns 11 1 11 1 1 1 Intra-set Match Indicators 13/17

  22. Indirectly-Indexed 2D (II2D) BCAM (1) Cascadable Wide and Deep BCAM PROBLEM: is it possible to regenerate matches for all addresses? Sets Key observation n columns (set of addresses) accommodates nmatches (1 s) at most! patterns Transposed RAM is a sparse matrix Key idea: use indirect indices to point to intra-set matches Cascadable Scalable (linear growth) Supports wider patterns 11 1 11 1 1 1 Intra-set Match Indicators LUTRAM BRAM 13/17

  23. Indirectly-Indexed 2D (II2D) BCAM (2) Example Divide address space into sets Store sets with a match in Indicators-RAM Transposed-RAM stores indices to all matches in set Hierarchical Search: Find indices of all matching sets in Transposed-RAM Read Indicators-RAM using indices from Transposed-RAM Transposed-RAM RAM (reference) addresses patterns 0 1 2 3 addresses 0 0 0 0 0 0 0 1 0 1 2 3 0 1 2 3 2 3 3 1 patterns 1 0 0 1 0 0 1 0 14/17

  24. Indirectly-Indexed 2D (II2D) BCAM (2) Example Divide address space into sets Store sets with a match in Indicators-RAM Transposed-RAM stores indices to all matches in set Hierarchical Search: Find indices of all matching sets in Transposed-RAM Read Indicators-RAM using indices from Transposed-RAM Transposed-RAM RAM (reference) addresses patterns 0 1 2 3 addresses 0 0 0 0 0 0 0 1 0 1 2 3 0 1 2 3 2 3 3 1 patterns 1 0 0 1 0 0 1 0 14/17

  25. Indirectly-Indexed 2D (II2D) BCAM (2) Example Divide address space into sets Store sets with a match in Indicators-RAM Transposed-RAM stores indices to all matches in set Hierarchical Search: Find indices of all matching sets in Transposed-RAM Read Indicators-RAM using indices from Transposed-RAM Transposed-RAM RAM (reference) addresses patterns 0 1 2 3 addresses 0 0 0 0 0 0 0 1 2 3 0 1 2 3 2 3 3 1 patterns 0 0 1 0 0 1 0 1 1 0 Indicators-RAMs 14/17

  26. Indirectly-Indexed 2D (II2D) BCAM (2) Example Divide address space into sets Store sets with a match in Indicators-RAM Transposed-RAM stores indices to all matches in set Hierarchical Search: Find indices of all matching sets in Transposed-RAM Read Indicators-RAM using indices from Transposed-RAM Transposed-RAM RAM (reference) sets patterns 1 0 addresses - - - 0 1 2 3 0 1 2 3 2 3 3 1 patterns - 1 0 0 1 0 1 1 0 Indicators-RAMs 14/17

  27. Indirectly-Indexed 2D (II2D) BCAM (2) Example Divide address space into sets Store sets with a match in Indicators-RAM Transposed-RAM stores indices to all matches in set Hierarchical Search: Find indices of all matching sets in Transposed-RAM Read Indicators-RAM using indices from Transposed-RAM Transposed-RAM RAM (reference) sets patterns 1 0 addresses Match pattern 3 - - - 0 1 2 3 0 1 2 3 2 3 3 1 patterns - 1 0 0 1 0 1 1 0 Indicators-RAMs 14/16

  28. Indirectly-Indexed 2D (II2D) BCAM (2) Example Divide address space into sets Store sets with a match in Indicators-RAM Transposed-RAM stores indices to all matches in set Hierarchical Search: Find indices of all matching sets in Transposed-RAM Read Indicators-RAM using indices from Transposed-RAM Transposed-RAM RAM (reference) sets patterns 1 0 addresses Match pattern 3 - - - 0 1 2 3 0 1 2 3 2 3 3 1 patterns - 1 0 0 1 0 1 1 0 Indicators-RAMs 14/17

  29. Indirectly-Indexed 2D (II2D) BCAM (2) Example Divide address space into sets Store sets with a match in Indicators-RAM Transposed-RAM stores indices to all matches in set Hierarchical Search: Find indices of all matching sets in Transposed-RAM Read Indicators-RAM using indices from Transposed-RAM Transposed-RAM RAM (reference) sets patterns 1 0 addresses Match pattern 3 0 1 2 3 - - - 0 1 2 3 0 1 2 3 2 3 3 1 patterns - 1 0 0 1 0 1 1 0 0 1 1 0 Indicators-RAMs Found in 1 and 2 14/17

  30. Indirectly-Indexed 2D (II2D) BCAM (3) Area and Performance Except for very a narrow HS, II2D exhibits higher Fmax 500 Reg-based BF-BCAM HS-BCAM II2D-BCAM Device Limit 400 Fmax (MHz) 300 200 register-based BCAM register consumption 100 0 350 300 ALMs (1000's) 250 200 II2D linear ALM consumption; similar to other methods 150 100 50 0 3 M20Ks (1000's) HS exponential BRAM consumption 2.5 2 1.5 1 II2D linear BRAM consumption 0.5 0 81 9 18 9 18 27 36 45 54 63 72 90 99 18 27 36 45 54 63 72 9 27 36 9 108 117 126 135 144 153 PW II2D supports wider patterns CD 16K 32k 64k 128k 15/17

  31. Indirectly-Indexed 2D (II2D) BCAM (3) Area and Performance Except for very a narrow HS, II2D exhibits higher Fmax 500 Reg-based BF-BCAM HS-BCAM II2D-BCAM Device Limit 400 Fmax (MHz) 300 200 register-based BCAM register consumption 100 0 350 300 ALMs (1000's) 250 200 II2D linear ALM consumption; similar to other methods 150 100 50 0 3 M20Ks (1000's) HS exponential BRAM consumption 2.5 2 1.5 1 II2D linear BRAM consumption 0.5 0 81 9 18 9 18 27 36 45 54 63 72 90 99 18 27 36 45 54 63 72 9 27 36 9 108 117 126 135 144 153 PW II2D supports wider patterns CD 16K 32k 64k 128k 15/17

  32. Indirectly-Indexed 2D (II2D) BCAM (3) Area and Performance Except for very a narrow HS, II2D exhibits higher Fmax 500 Reg-based BF-BCAM HS-BCAM II2D-BCAM Device Limit 400 Fmax (MHz) 300 200 register-based BCAM register consumption 100 0 350 300 ALMs (1000's) 250 200 II2D linear ALM consumption; similar to other methods 150 100 50 0 3 M20Ks (1000's) HS exponential BRAM consumption 2.5 2 1.5 1 II2D linear BRAM consumption 0.5 0 81 9 18 9 18 27 36 45 54 63 72 90 99 18 27 36 45 54 63 72 9 27 36 9 108 117 126 135 144 153 PW II2D supports wider patterns CD 16K 32k 64k 128k 15/17

  33. Indirectly-Indexed 2D (II2D) BCAM (3) Area and Performance Except for very a narrow HS, II2D exhibits higher Fmax 500 Reg-based BF-BCAM HS-BCAM II2D-BCAM Device Limit 400 Fmax (MHz) 300 200 register-based BCAM register consumption 100 0 350 300 ALMs (1000's) 250 200 II2D linear ALM consumption; similar to other methods 150 100 50 0 3 M20Ks (1000's) HS exponential BRAM consumption 2.5 2 1.5 1 II2D linear BRAM consumption 0.5 0 81 9 18 9 18 27 36 45 54 63 72 90 99 18 27 36 45 54 63 72 9 27 36 9 108 117 126 135 144 153 PW II2D supports wider patterns CD 16K 32k 64k 128k 15/17

  34. Indirectly-Indexed 2D (II2D) BCAM (3) Area and Performance Except for very a narrow HS, II2D exhibits higher Fmax 500 Reg-based BF-BCAM HS-BCAM II2D-BCAM Device Limit 400 Fmax (MHz) 300 200 register-based BCAM register consumption 100 0 350 300 ALMs (1000's) 250 200 II2D linear ALM consumption; similar to other methods 150 100 50 0 3 M20Ks (1000's) HS exponential BRAM consumption 2.5 2 1.5 1 II2D linear BRAM consumption 0.5 0 81 9 18 9 18 27 36 45 54 63 72 90 99 18 27 36 45 54 63 72 9 27 36 9 108 117 126 135 144 153 PW II2D supports wider patterns CD 16K 32k 64k 128k 15/17

  35. Indirectly-Indexed 2D (II2D) BCAM (3) Area and Performance Except for very a narrow HS, II2D exhibits higher Fmax 500 Reg-based BF-BCAM HS-BCAM II2D-BCAM Device Limit 400 Fmax (MHz) 300 200 register-based BCAM register consumption 100 0 350 300 ALMs (1000's) 250 200 II2D linear ALM consumption; similar to other methods 150 100 50 0 3 M20Ks (1000's) HS exponential BRAM consumption 2.5 2 1.5 1 II2D linear BRAM consumption 0.5 0 81 9 18 9 18 27 36 45 54 63 72 90 99 18 27 36 45 54 63 72 9 27 36 9 108 117 126 135 144 153 PW II2D supports wider patterns CD 16K 32k 64k 128k 15/17

  36. Open Source http://ece.ubc.ca/~lemieux/downloads/ Run-in-batch simulation and synthesis manager Modular and parametric Verilog files 16/17

  37. Conclusions Brute-Force Transposed-RAM addresses BRAM-based Single-cycle patterns Cascadable Scalable Deep Wide 17/17

  38. Conclusions Hierarchical Search 2D BCAM sets BRAM-based Single-cycle patterns Cascadable Scalable Deep Wide 17/17

  39. Conclusions Indirectly-Indexed 2D (II2D) BCAM sets BRAM-based Single-cycle patterns Multi-unpredictable-cycle Cascadable Scalable Deep Wide Intra-set Match Indicators 17/17

  40. Thank You!

  41. Backup Slides

  42. Conclusions Hierarchical patterns s e t s II2D Brute-Force patterns addresses patterns s e t s Match Indicators BRAM-based Single-cycle Deep Wide 16

Related


More Related Content