Self-Tuning Compression Algorithm - gFPC

gfpc a self tuning compression algorithm n.w
1 / 14
Embed
Share

Explore gFPC, a self-tuning compression algorithm optimizing parameters for IEEE 754 double-precision data streams. Automatic, online, and genetic-algorithm-based approach for achieving higher compression ratios. Learn about the unique contribution and the innovative hash function parameters used in this algorithm.

  • Compression
  • Algorithm
  • gFPC
  • Optimization
  • Parameters

Uploaded on | 0 Views


Download Presentation

Please find below an Image/Link to download the presentation.

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author. If you encounter any issues during the download, it is possible that the publisher has removed the file from their server.

You are allowed to download the files provided on this website for personal or commercial use, subject to the condition that they are used lawfully. All files are the property of their respective owners.

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author.

E N D

Presentation Transcript


  1. gFPC: A Self-Tuning Compression Algorithm Martin Burtscher1and Paruj Ratanaworabhan2 1The University of Texas at Austin 2Kasetsart University

  2. Introduction Many compression algorithms are parameterizable Some parameters allow straightforward trade-offs E.g., compression ratio vs. speed Controlled via command line Other parameters provide no obvious trade-off Best value is input dependent and changes dynamically E.g., hash function in a predictor Typically hardcoded gFPC: A Self-Tuning Compression Algorithm 2

  3. Contribution Self-tuning approach to optimize parameters Automatic, on-line, and genetic-algorithm-based Slower compression but higher compression ratio gFPC algorithm for IEEE 754 double-precision data Compresses linear streams of FP values Lossless single-pass algorithm Repeatedly self-tunes 4 hash-table parameters gFPC: A Self-Tuning Compression Algorithm 3

  4. FPC Algorithm [DCC07] uncompressed 1D stream of doubles double . . . . . . 3f82 3b1e 0e32 f39d 64 Make two predictions Select closer value XOR with true value Count leading zero bytes Encode value Update predictors FCM DFCM 64 64 3f82 4 3f51 9 compare compare selector predictor code 1 closer value 64 XOR leading zero byte counter encoder 1+3 0 to 8 bytes remainderb 7129 889b 0e5d bita cnta x y bitb cntb 0 2 remaindera z compressed stream . . . . . . gFPC: A Self-Tuning Compression Algorithm 4

  5. Hash Function Parameters Two predictors FCM predicts values, DFCM predicts differences fcm_prediction = fcm[fcm_hash]; // prediction: read hash table entry fcm[fcm_hash] = true_value; // update: write hash table entry fcm_hash = ((fcm_hash << lshift) ^ (true_value >> rshift)) & (table_size 1); Two parameters each lshift for aging rshift for eliminating random bits 802,816 possibilities with 256 kB table_size gFPC: A Self-Tuning Compression Algorithm 5

  6. Genetic Self-Tuning Compress blocks with several sets of parameters Start with FPC and otherwise random sets population compressed block set1 data block set2 output shortest . . . . . . block size setn population size parent 1 parent 2 Create new sets for next data block Keep best set of parameters Evolve remaining sets cross over mutation child gFPC: A Self-Tuning Compression Algorithm 6

  7. Related Work Genetic algorithms (GAs) for evolving programs Program output approximates original data GAs for evolving compressor parameters off-line Rate distortion Vector quantization Fractal codes Dictionary n-grams Best compressor for each block We use on-line GA: faster, adapts dynamically gFPC: A Self-Tuning Compression Algorithm 7

  8. Evaluation Method System Sun Fire X2270 Server, Ubuntu Linux 8.06 2.93 GHz 64-bit Intel Xeon 5570 (Nehalem) processor Datasets Linear streams of real-world data (18 277 MB) 4 observations: error, info, spitzer, temp 4 simulations: brain, comet, control, plasma 5 MPI messages: bt, lu, sp, sppm, sweep3d gFPC: A Self-Tuning Compression Algorithm 8

  9. Population Size Affects Compression speed Compression ratio 1.55 h-mean compression ratio 1.50 8 kB 256 kB 8 MB 1.45 1.40 Result Population size of 4 performs within .5% of maximum (P. size = 1 FPC) 1.35 1.30 1.25 1.20 1 3 5 7 9 11 13 15 17 19 population size gFPC: A Self-Tuning Compression Algorithm 9

  10. Block Size Affects Reconfiguration frequency Compression ratio 1.55 h-mean compression ratio 1.50 1.45 8 kB 256 kB 8 MB 1.40 1.35 1.30 Result 512 kB blocks good Medium sizes best Warm-up versus adaptivity tradeoff 1.25 1.20 4 kB 16 64 kB 256 kB 1 4 16 MB 64 MB 256 MB kB MB MB block size gFPC: A Self-Tuning Compression Algorithm 10

  11. Compression Ratio Comparison FPCsize and FPCall Use off-line GA an LS to find best parameters for each size (and input) Results FPC is 5% worse FPCsize no input adaptivity FPCall (mostly) better gFPC is retroactive (but can adapt on-the-fly) gFPC is 317 times faster 1.55 h-mean compression ratio FPC gFPC FPCsize FPCall 1.50 1.45 1.40 1.35 1.30 1.25 1.20 1 MB 2 MB 4 MB 8 MB 128 kB 256 kB 512 kB 64 kB 16 kB 32 kB 8 kB predictor size gFPC: A Self-Tuning Compression Algorithm 11

  12. Self-Tuning Benefit Rarely worse, mostly better (up to 72%) Relative to FPC, which was tuned for these inputs Benefit is likely higher on other inputs 1.20 compression ratio improvement 1.15 8 kB 32 kB 1.10 128 kB 1.05 512 kB 1.00 2 MB 8 MB 0.95 0.90 sweep3d temp spitzer plasma lu sp error comet sppm control bt info brain gFPC: A Self-Tuning Compression Algorithm 12

  13. Throughput on Xeon System Compression is slower with larger population size Small compression overhead due to self tuning Decompression is faster due to better compression 8 h-mean throughput (Gb/s) 8 compression decompression h-mean throughput (Gb/s) FPC gFPC4 gFPC1 7 7 6 6 5 5 FPC gFPC4 gFPC1 4 4 3 3 2 2 1 1 0 0 8 kB 1 MB 4 MB 8 MB 2 MB 128 kB 256 kB 512 kB 16 kB 64 kB 32 kB 8 kB 1 MB 4 MB 8 MB 2 MB 128 kB 256 kB 512 kB 16 kB 64 kB 32 kB predictor size predictor size gFPC: A Self-Tuning Compression Algorithm 13

  14. Summary Self-tuning approach Based on on-line genetic algorithm Repeatedly tunes 4 hash-table parameters in gFPC Applicable to other compressors Results Higher compression ratio, lower compression speed gFPC compresses at 1 Gb/s, decompresses at 7 Gb/s C source code of gFPC is freely available http://users.ices.utexas.edu/~burtscher/research/gFPC/ gFPC: A Self-Tuning Compression Algorithm 14

Related


More Related Content