Exploring Two Erasure Codes in HDFS

"Discover how HACFS optimizes for recovery and storage overhead with Fast and Compact Erasure Codes, improving performance in HDFS. Learn about its dynamic adaptation, pros and cons, and potential challenges for large-scale object stores in the cloud."

  • HDFS
  • Erasure Codes
  • HACFS
  • Data Storage
  • Cloud Computing


Presentation Transcript


  1. Scribe: A Tale of Two Erasure Codes in HDFS By Ashutosh Bhardwaj

  2. Summary Single-code erasure systems optimize either for recovery performance or for storage overhead, but not both. HACFS improves recovery performance by using two different erasure codes, a fast code and a compact code, and converts files between them with upcoding and downcoding. It is implemented for two code families, Product codes and LRC codes, and improves performance over single-code systems.
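A minimal sketch of that adaptive two-code idea, assuming a hypothetical read-count threshold and helper names (not the actual HACFS implementation):

```python
# Minimal sketch of the adaptive two-code policy (hypothetical threshold and
# code labels; not the actual HACFS implementation).

FAST_CODE = "fast"        # low recovery cost, higher storage overhead
COMPACT_CODE = "compact"  # higher recovery cost, lower storage overhead
HOT_READ_THRESHOLD = 100  # assumed read-count cutoff for "hot" files

def choose_code(read_count: int) -> str:
    """Hot files stay in the fast code; cold files go to the compact code."""
    return FAST_CODE if read_count >= HOT_READ_THRESHOLD else COMPACT_CODE

def adapt(file_state: dict) -> None:
    """Upcode files that have cooled down (fast -> compact) and downcode
    files that have become hot again (compact -> fast)."""
    target = choose_code(file_state["read_count"])
    if target != file_state["code"]:
        file_state["code"] = target  # conversion is done by transforming parities
```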

  3. Pros Dynamically adapts to workload changes. Fast degraded reads reduce read latency when blocks are unavailable. Low reconstruction time shortens recovery from failed nodes. Storage overhead stays low and is bounded under practical system constraints. Exploits the data-access skew of HDFS workloads.
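To make the recovery advantage concrete, here is a back-of-the-envelope comparison of how many surviving blocks must be read to rebuild one lost data block; the LRC parameters are illustrative examples, not necessarily the exact ones used in the paper:

```python
# Back-of-the-envelope degraded-read cost: blocks read to rebuild one lost
# data block (LRC parameters are illustrative).

def lrc_repair_cost(data_blocks: int, local_parities: int) -> int:
    # One local parity per group: a single failure is repaired from the
    # (group - 1) surviving data blocks plus the group's local parity.
    group = data_blocks // local_parities
    return group

def rs_repair_cost(data_blocks: int) -> int:
    # Reed-Solomon repairs any single failure from k surviving blocks.
    return data_blocks

print("LRC fast    (12 data, 6 local parities):", lrc_repair_cost(12, 6), "blocks")
print("LRC compact (12 data, 2 local parities):", lrc_repair_cost(12, 2), "blocks")
print("RS(6,3):                                ", rs_repair_cost(6), "blocks")
```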

  4. Cons HACFS's benefits rest on the assumption that the workload is skewed. HACFS with LRC codes is generally worse across workloads than RS(6,3). The evaluation used a cluster of only 11 nodes, so scalability is unproven. Dynamically re-encoding hot files is expensive.
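To put the RS(6,3) comparison in perspective, a rough overhead calculation, using assumed code parameters and an assumed hot-data fraction, shows how a fast/compact mix can still stay below the RS(6,3) baseline:

```python
# Rough storage-overhead comparison (code parameters and hot-data fraction
# are assumptions for illustration).

def overhead(data_blocks: int, parity_blocks: int) -> float:
    """Storage overhead = total blocks stored / data blocks."""
    return (data_blocks + parity_blocks) / data_blocks

fast = overhead(12, 8)     # e.g. an LRC with 6 local + 2 global parities -> ~1.67x
compact = overhead(12, 4)  # e.g. an LRC with 2 local + 2 global parities -> ~1.33x
rs_6_3 = overhead(6, 3)    # RS(6,3) baseline -> 1.5x

hot_fraction = 0.2         # assumed share of data kept in the fast code
blended = hot_fraction * fast + (1 - hot_fraction) * compact
print(f"fast={fast:.2f}x  compact={compact:.2f}x  blended={blended:.2f}x  "
      f"RS(6,3)={rs_6_3:.2f}x")
```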

  5. My Opinion The paper claims the system is 3-failure tolerant, but reliability decreases as the stripe (encoding matrix) grows larger.
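As a rough illustration of that reliability concern, the probability that a stripe loses more blocks than the code tolerates grows with the stripe width, assuming independent block failures with probability p (the numbers below are purely illustrative):

```python
# Rough illustration: probability that more than `tolerance` blocks in a
# stripe of width n fail, assuming independent failures with probability p.
from math import comb

def p_data_loss(n: int, p: float = 0.01, tolerance: int = 3) -> float:
    return sum(comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(tolerance + 1, n + 1))

for n in (9, 16, 20):  # e.g. an RS(6,3) stripe vs. wider LRC/Product-code stripes
    print(f"stripe width {n:2d}: P(data loss) ~ {p_data_loss(n):.2e}")
```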

  6. Questions Is storage really a problem? Can the system adapt to large-scale object stores in the cloud? What if the workload pattern changes frequently between fast and compact codes? How does the system handle the failure of a machine storing cold data encoded with the compact code? How does it handle a data-center failure?
