Finite Field Inversion Implementation and Methods

implementation of n.w
1 / 43
Embed
Share

Learn about the implementation of finite field inversion, including the Itoh-Tsujii method, squaring operations, advantages of quad operations, generalization of algorithms, theorems, and more. Explore how to perform operations in fields like GF(2^4) with practical examples and detailed explanations.

  • Finite Field
  • Inversion
  • Methods
  • Algorithms
  • Implementation

Uploaded on | 1 Views


Download Presentation

Please find below an Image/Link to download the presentation.

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author. If you encounter any issues during the download, it is possible that the publisher has removed the file from their server.

You are allowed to download the files provided on this website for personal or commercial use, subject to the condition that they are used lawfully. All files are the property of their respective owners.

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author.

E N D

Presentation Transcript


  1. IMPLEMENTATION OF FINITE FIELD INVERSION Debdeep Mukhopadhyay Chester Rebeiro Dept. of Computer Science and Engineering Indian Institute of Technology Kharagpur INDIA

  2. Finite Field Inverse 23-27 May 2011 Anurag Labs, DRD0 2

  3. Itoh-Tsujii Method for Binary Fields 23-27 May 2011 Anurag Labs, DRD0 3

  4. The Steps 23-27 May 2011 Anurag Labs, DRD0 4

  5. How do we do a Squaring Consider (again) the field GF(24), with irreducible polynomial x4+x+1. What is (x3+x2+1)2 in this field ? 23-27 May 2011 Anurag Labs, DRD0 5

  6. Squaring Squaring can be represented in the form of a matrix multiplication T.a 23-27 May 2011 Anurag Labs, DRD0 6

  7. Quad Operation Quad operation can be done by two squaring operations. Quad operation can be written in the form T2.a 23-27 May 2011 Anurag Labs, DRD0 7

  8. Advantage of using Quad Operations Quad circuits have better LUT utilization compared to Squarer circuits 23-27 May 2011 Anurag Labs, DRD0 8

  9. Generalization of the Itoh-Tsujii Algorithm 23-27 May 2011 Anurag Labs, DRD0 9

  10. Theorem 1 23-27 May 2011 Anurag Labs, DRD0 10

  11. Theorem 2 23-27 May 2011 Anurag Labs, DRD0 11

  12. Quad Itoh-Tsujii Inversion Algorithm 23-27 May 2011 Anurag Labs, DRD0 12

  13. A Circuit for Inversion At every clock cycle, either the multiplier or the quadblock is active. The output of the multiplier is stored in mout register 23-27 May 2011 Anurag Labs, DRD0 13

  14. Finding the Inverse 23-27 May 2011 Anurag Labs, DRD0 14

  15. Finding the Inverse Step 2 23-27 May 2011 Anurag Labs, DRD0 15

  16. Finding the Inverse Step 2 23-27 May 2011 Anurag Labs, DRD0 16

  17. Control Signals for the Inverse 23-27 May 2011 Anurag Labs, DRD0 17

  18. Performance Charts 23-27 May 2011 Anurag Labs, DRD0 18

  19. Higher Powered Itoh-Tsujii We seen that Quad circuits utilize LUTs in a better way compared to squarer circuits. Also LUT size is increasing as silicon technology reduces We have seen 4-LUT become 6-LUT, and now 8-LUT This gives us a motivation to investigate using higher powers other than quad circuits 23-27 May 2011 Anurag Labs, DRD0 19

  20. Revisiting the Theorems 23-27 May 2011 Anurag Labs, DRD0 20

  21. 2n Itoh-Tsujii Inversion These are the overheads Higher Powered 23-27 May 2011 Anurag Labs, DRD0 21

  22. Overhead in 2n Itoh-Tsujii Computation of . Using addition chain for , can be computed in clock cycles, where is the length of addition chain for . Computation of , for Using addition chain for , that contains , can be computed during computation, because . 23-27 May 2011 Anurag Labs, DRD0 22

  23. 2n Itoh-Tsujii Design 23-27 May 2011 Anurag Labs, DRD0 23

  24. Building the Optimal Design For a given field and a given FPGA how do decide the optimal design ? Configurable Parameters Addition chain. Power circuit used in power block. Number of cascaded power circuits in the power block. These have an effect on Number of clock cycles. Critical path delay. 23-27 May 2011 Anurag Labs, DRD0 24

  25. Estimating AREA required on an FPGA A k input LUT (k-LUT) can implement any functionality of maximum k input variables. Total number of k-LUTs to implement a function with variables can be expressed as 23-27 May 2011 Anurag Labs, DRD0 25

  26. Estimating Delay of a Design in an FPGA Delay in FPGAs comprise of LUT delay and routing delay.. For this ITA architecture, we have experimentally found, total delay is proportional to number of LUTs in critical path. We denote number of LUTs in a delay path as maxlutpath. In k-LUT, maxlutpath of an variable function is 23-27 May 2011 Anurag Labs, DRD0 26

  27. Recap : Karatsuba Multiplier 23-27 May 2011 Anurag Labs, DRD0 27

  28. Hybrid Karatsuba Multiplier for GF(2233) Note that the school book multiplier has replaced the general Karatsuba Multiplier School Book Multiplier 23-27 May 2011 Anurag Labs, DRD0 28

  29. Estimating LUT Requirement for Hybrid Karatsuba Multiplier The field multiplier is a hybrid Karatsuba multiplier. A bit hybrid Karatsuba multiplier consists of two bit and one bit multipliers. This happens in recursive manner. In threshold ( ) level, School-Book multiplier is invoked. Total area of bit hybrid Karatsuba multiplier is given by Total area for the School-Book multiplier is 23-27 May 2011 Anurag Labs, DRD0 29

  30. Estimating Delay of Hybrid Karatsuba Multiplier The hybrid Karatsuba multiplier is distributed in smaller multipliers like a tree. Height of the tree is Each level of the Simple Karatsuba tree introduces one LUT delay. In threshold ( ) level, School-Book multiplier delay is added. Delay of School-Book multiplier is Delay of the entire multiplier in LUTs is given by 23-27 May 2011 Anurag Labs, DRD0 30

  31. Estimating Area & Delay for Modular Reduction For fields generated by trinomials, area of modular reduction is almost equal to field size and delay is one LUT considering LUT size . For fields generated by pentanomials, and 2 LUT for . and 2 LUT for . 23-27 May 2011 Anurag Labs, DRD0 31

  32. Area & Delay Estimates for 2n Circuit The output of a 2n circuit, which raises an input can be expressed as , where is binary field matrix and , LUT requirement per output bit is Total LUT requirement for the 2n circuit is LUT delay per output bit is Since all bits are in parallel, delay of 2n circuit is 23-27 May 2011 Anurag Labs, DRD0 32

  33. Area & Delay Estimates for Multiplexer For a 2s : 1 MUX, there are s selection lines and thus the output is a function of 2s + s variables. For a MUX in , each of the 2s input lines is of width m bits. Total LUT requirement is Total LUT delay of the MUX is When number of inputs to MUX , the above gives a close upper bound 23-27 May 2011 Anurag Labs, DRD0 33

  34. Area & Delay of PowerBlock Let the Powerblock contains us number of cascaded 2n circuits. The has selection lines, where LUT requirement for is Total LUT requirement for Powerblock is Delay of is Total LUT delay of Powerblock in 23-27 May 2011 Anurag Labs, DRD0 34

  35. Area & Delay for the Entire Architecture LUT estimate for the entire architecture is There are two parallel delay paths. LUT delay of first path is LUT delay of second path is LUT delay of entire architecture is 23-27 May 2011 Anurag Labs, DRD0 35

  36. Optimal Number of Cascades For a given field and based FPGA, Powerblock can be configured with different power circuits and cascades . Increase in reduces clock cycles, but increases delay of Powerblock. is fixed, but depends on and . is minimum when Minimum delay of the ITA architecture is thus 23-27 May 2011 Anurag Labs, DRD0 36

  37. Power Circuit Selection to achieve Minimum Clock Cycles Number of clock cycles for the inversion can be approximated as Number of clock cycles for increases linearly with . The term reduces with increase in . When is small, the reduction in is significant for increase in . But, for large values of n, the increase in dominates over the decrease in So, increases with increase in for large values of . 23-27 May 2011 Anurag Labs, DRD0 37

  38. Tuning Design for Optimality The performance metric is Minimization of without increasing gives best performance. Area remains almost same. The following steps are performed to achieve optimal performance The optimal architecture is given by 23-27 May 2011 Anurag Labs, DRD0 38

  39. Validation of Theoretical Estimates Our estimation model uses maxlutpath to find LUT delay. Routing delay is difficult to model in FPGAs. To get overall delay, we have used experimental results for a reference ITA architecture. Total delay of reference architecture is the Let LUT delay of reference architecture is Total delay of any other ITA architecture in the same field is approximately Here is a constant and depends on FPGA technology. In 4-LUT based and 6-LUT based Xilinx FPGAs, has values 0.2 and 0.1 respectively. 23-27 May 2011 Anurag Labs, DRD0 39

  40. Validation on 4-input LUT FPGAs 23-27 May 2011 Anurag Labs, DRD0 40

  41. Validation on 6-input LUT FPGAs 23-27 May 2011 Anurag Labs, DRD0 41

  42. Experimental Results 23-27 May 2011 Anurag Labs, DRD0 42

  43. Comparison Charts 23-27 May 2011 Anurag Labs, DRD0 43

Related


More Related Content