
Enhancing Data-intensive Applications through Resource Utilization Optimization
Explore how underutilized resources are harnessed to enhance responsiveness and fault tolerance in data-intensive applications. The presentation covers redundancy to tolerate crash faults and silent data errors, data replication in Hadoop environments, interleaved data execution, a Hadoop prototype, and experimental results.
Presentation Transcript
Harvesting Underutilized Resources to Improve Responsiveness and Tolerance to Crash and Silent Faults for Data-intensive Applications Debashis Ganguly, Mohammad H. Mofrad, Taieb Znati, Rami Melhem, John R. Lange Department of Computer Science, University of Pittsburgh
Outline
- Background and motivation
- Redundancy to tolerate crash faults and silent data errors
- Data replication in Hadoop environments
- Interweaved data replication
- Prototype implementation in Hadoop
- Experimental results
- Conclusion and future work
Motivation
- Emergence of new time-sensitive applications, data-intensive real-time analytics, and new technology trends such as the Internet of Things: low latency is critical.
- Fundamental need for timely detection and correction of errors.
- Multifold increase in the complexity of the computing infrastructure: crash faults are more common, and undetectable silent data corruption (soft errors) is becoming more prevalent.
Limitations of Existing Approaches
- Inability to deal with crash faults and silent data errors uniformly.
- Mostly rely on time redundancy to deal with crash faults: speculatively launch back-up replicas to mitigate stragglers and/or recover from crash faults.
- Cannot guarantee the agreed-upon response latency in the presence of faults.
HDFS Cluster: 3-Way Data Replication (5 Nodes)
[Figure: files A, B, C, and D are split into blocks, and each block is replicated three times across nodes 0 through 4. A locator table records which nodes hold each file, so processing can be replicated on the nodes where the file replicas already reside.]
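To make the locator-table idea concrete, here is a minimal, hypothetical sketch (not the actual HDFS NameNode interface, which exposes block locations through its own API): a table mapping each file to the nodes holding its replicas, which a scheduler could consult to place task replicas where the data already resides. The placement in main() is illustrative only, since the exact layout in the figure is not fully recoverable.

```java
import java.util.*;

// Hypothetical sketch of the locator table from the figure above: for each
// file, record which nodes hold its replicas, so task replicas can be
// scheduled on nodes that already store the data.
public class LocatorTable {
    private final Map<String, Set<Integer>> fileToNodes = new HashMap<>();

    public void addReplica(String file, int node) {
        fileToNodes.computeIfAbsent(file, f -> new TreeSet<>()).add(node);
    }

    /** Nodes eligible to run a task replica over the given file. */
    public Set<Integer> nodesFor(String file) {
        return fileToNodes.getOrDefault(file, Collections.emptySet());
    }

    public static void main(String[] args) {
        LocatorTable t = new LocatorTable();
        // Hypothetical 3-way placement over 5 nodes, loosely following the figure.
        t.addReplica("A", 0); t.addReplica("A", 1); t.addReplica("A", 3);
        t.addReplica("B", 0); t.addReplica("B", 2); t.addReplica("B", 4);
        t.addReplica("C", 0); t.addReplica("C", 1); t.addReplica("C", 3);
        t.addReplica("D", 2); t.addReplica("D", 3); t.addReplica("D", 4);
        System.out.println("Replicas of a task over file C can run on nodes " + t.nodesFor("C"));
    }
}
```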
Task Replication
- Replicate the task, taking advantage of the file replicas.
- Can tolerate a crash fault.
- Can correct a silent error with three replicas voting.
Data-interweaved Replication
R: number of replicas per task
R = C_f + 2 S_f + 1
where C_f is the number of crash faults and S_f is the number of silent data errors to be tolerated.
Data-interweaved Replication
- Do not duplicate work unless a fault is detected.
- In case of a crash fault: two phases of execution; finish after the first phase in the absence of faults.
Tolerating Two Crash Faults
- May triplicate execution to tolerate up to two crash faults.
Tolerating Silent Errors
- Need to triplicate execution to tolerate up to one silent error.
- Compare the results (via some signature): if they match, the task is done in 2/3 of the time (two phases); otherwise, continue execution (phase 3).
Data-interweaved Replication
R = C_f + 2 S_f + 1
where C_f is the number of crash faults, S_f is the number of silent data errors, and R is the replication factor.
R is also:
- the number of logical phases in the life-cycle of each replica to consume the data split,
- the number of logical sub-partitions of the input data split, and
- the replication factor of the underlying distributed file system.
This guarantees a distinct data location for every replica, so all replicas can run in parallel from the beginning of the processing.
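As a quick worked check of the formula above, here is a tiny hypothetical helper (not part of the prototype) that computes R for the scenarios discussed on the surrounding slides.

```java
// Illustrative helper for R = C_f + 2*S_f + 1 from the slide above; R is
// simultaneously the number of replicas per task, the number of logical
// phases, and the number of sub-partitions of the input split.
public class ReplicationFactor {
    static int replicas(int crashFaults, int silentErrors) {
        return crashFaults + 2 * silentErrors + 1;
    }

    public static void main(String[] args) {
        System.out.println(replicas(1, 0)); // 2: one crash fault -> two replicas, two phases
        System.out.println(replicas(2, 0)); // 3: up to two crash faults -> triplicate execution
        System.out.println(replicas(0, 1)); // 3: one silent error -> three replicas voting
    }
}
```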
Interleaved Data Execution to Tolerate Crash Faults
[Figure: Task m runs as Replica 1 on node i and Replica 2 on node j. In Phase 1, Replica 1 consumes sub-split 1 (Commit_1_1) while Replica 2 consumes sub-split 2 (Commit_2_2). If neither replica faults, the partial outputs are combined and the task completes, reducing response time by 50%. If a replica faults (e.g., Replica 2), the surviving replica continues with Phase 2 and produces Commit_1_2.]
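The sub-split rotation implied by the commit labels can be sketched as follows. The indexing formula is an assumption inferred from the Commit_replica_subsplit labels in the diagrams above, not code taken from the prototype: with R replicas, R phases, and R sub-splits, every sub-split is covered exactly once after the first phase.

```java
// Sketch of the interleaved schedule suggested by the diagrams: replica r
// processes sub-split ((r - 1 + p - 1) mod R) + 1 in phase p (assumed
// indexing), so the replicas jointly cover all sub-splits in each phase.
public class InterleavedSchedule {
    static int subSplitFor(int replica, int phase, int r) {
        return ((replica - 1 + phase - 1) % r) + 1;
    }

    public static void main(String[] args) {
        int r = 3; // e.g., tolerating one silent error
        for (int phase = 1; phase <= r; phase++) {
            StringBuilder row = new StringBuilder("Phase " + phase + ": ");
            for (int replica = 1; replica <= r; replica++) {
                row.append("Commit_").append(replica).append("_")
                   .append(subSplitFor(replica, phase, r)).append("  ");
            }
            System.out.println(row); // matches the commit labels in the diagrams
        }
    }
}
```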
Interleaved Data Execution to Tolerate Silent Data Errors
[Figure: Task m runs as Replicas 1, 2, and 3 on nodes i, j, and k; the input split is divided into three sub-splits. In Phase 1 the replicas produce Commit_1_1, Commit_2_2, and Commit_3_3 and exchange hashes of these commits; in Phase 2 they produce Commit_1_2, Commit_2_3, and Commit_3_1, and the voting system performs voting on the received hashes. If majority consensus is reached, one copy of the output per sub-split is chosen and the task completes, reducing response time by 33%; otherwise execution continues with Phase 3 (Commit_1_3, Commit_2_1, Commit_3_2).]
Prototype
- Extension of Hadoop v1.
- JobTracker: maintains information about a running job as a JobInProgress object (unmodified).
- JobInProgress: modified to initialize and schedule task replicas based on the configurable parameters.
  - ReplicaId field: uniquely identifies a replica of the same task.
  - VotingSystem class: private member of JobInProgress.
- Heartbeat: modified to carry SHA-256 signatures generated on partial outputs, used to compare the results produced by different replicas and detect silent data corruption. Only the partial outputs, not the signatures, are stored on disk.
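A minimal standalone sketch of the signature-and-voting idea described above: it is not the prototype's JobInProgress/VotingSystem code, only an illustration of hashing a partial output with SHA-256 and taking a majority over the digests received from the replicas.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.*;

// Hypothetical illustration of heartbeat-signature voting; the real
// VotingSystem is a private member of JobInProgress in the prototype.
public class VotingSketch {
    /** SHA-256 signature a replica would attach to its heartbeat. */
    static String signature(byte[] partialOutput) throws Exception {
        byte[] digest = MessageDigest.getInstance("SHA-256").digest(partialOutput);
        StringBuilder hex = new StringBuilder();
        for (byte b : digest) hex.append(String.format("%02x", b & 0xff));
        return hex.toString();
    }

    /** Returns the signature agreed on by a majority of replicas, or null. */
    static String majority(List<String> signatures) {
        Map<String, Integer> counts = new HashMap<>();
        for (String s : signatures) counts.merge(s, 1, Integer::sum);
        for (Map.Entry<String, Integer> e : counts.entrySet())
            if (e.getValue() > signatures.size() / 2) return e.getKey();
        return null; // no consensus: continue with the next phase
    }

    public static void main(String[] args) throws Exception {
        byte[] good = "partial output of sub-split 1".getBytes(StandardCharsets.UTF_8);
        byte[] corrupted = "partial output of sub-split 1 (bit flip)".getBytes(StandardCharsets.UTF_8);
        List<String> sigs = Arrays.asList(signature(good), signature(good), signature(corrupted));
        System.out.println(majority(sigs) != null ? "consensus reached" : "no consensus");
    }
}
```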
Fault Injector
- Emulates faults deterministically.
- Crash faults: emulated by injecting exceptions at the end of task completion.
- Silent data corruption: emulated by tampering with the signatures sent along with the heartbeat to the centralized voting system.
Fault Injector
Configuration parameters specified in mapred-site.xml:
- mapred.map.tasks.fault.tolerance: an integer specifying the number of faults that can be tolerated without aborting the job.
- mapred.map.tasks.fault.nature: an integer determining the nature of the faults (crash faults or silent data errors).
Parameters for the fault-injection emulator:
- mapred.map.tasks.fault.inject: a Boolean, set when the framework emulates fault injection.
- mapred.map.tasks.fault.percent: an integer, the percentage of tasks to be affected by fault emulation.
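These parameters could be read with Hadoop's standard Configuration API as sketched below. The property names are taken from the slide; the surrounding code and the assumed integer encoding of the fault nature are illustrative, not part of the actual patch.

```java
import org.apache.hadoop.conf.Configuration;

// Illustrative sketch only: looks up the fault-tolerance parameters listed
// above via Hadoop's Configuration API. The prototype wires these into
// JobInProgress; this merely shows the lookups.
public class FaultToleranceConfig {
    public static void main(String[] args) {
        Configuration conf = new Configuration();
        conf.addResource("mapred-site.xml");

        int tolerated  = conf.getInt("mapred.map.tasks.fault.tolerance", 0);
        int nature     = conf.getInt("mapred.map.tasks.fault.nature", 0);   // e.g., 0 = crash, 1 = silent (assumed encoding)
        boolean inject = conf.getBoolean("mapred.map.tasks.fault.inject", false);
        int percent    = conf.getInt("mapred.map.tasks.fault.percent", 0);

        System.out.printf("tolerate=%d nature=%d inject=%b percent=%d%n",
                tolerated, nature, inject, percent);
    }
}
```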
Experimental Framework
- Testbed: BRIDGES at the Pittsburgh Supercomputing Center (PSC).
- Each node has 28 Intel Xeon E5-2695 v3 cores @ 2.30 GHz and 128 GB RAM.
- 32 KB L1 instruction and data caches and 256 KB L2 cache per core; 32 MB L3 cache per NUMA socket.
- Communication over Gigabit Ethernet.
- OS: CentOS 7.2.1511 with Linux kernel 3.10.0.
Benchmarks
- GridMix2: a combination of synthetic jobs that model cloud workloads.
- Selected benchmarks use uncompressed data: StreamSort, Combiner, and JavaSort.
- Results are consolidated across different runs.
- Split size = 128 MB.
Compared our data-interleaved replication (DIRFT) with the following:
- Speculative Execution (SE) of MapReduce: runs a back-up replica on a separate node upon detecting/speculating a fault; does not tolerate silent errors.
- Byzantine Fault Tolerance (BFT) scheme [1]: launches 2x replicas per task in the 1st phase and, upon detecting a silent data error, launches 1x additional replica per faulty task in the 2nd phase.
- Full Replication (FRFT) scheme: does not interweave the consumption of the input data from different offsets.

[1] P. Costa et al., "Byzantine Fault-Tolerant MapReduce: Faults Are Not Just Crashes," IEEE International Conference on Cloud Computing Technology and Science (CloudCom), 2011.
Overhead in the absence of (crash) faults
[Charts: JavaSort, StreamSort, and Combiner.]
Overhead in the absence of (silent) faults
[Charts: JavaSort, StreamSort, and Combiner.]
Overhead in the absence of (silent) faults
[Chart: StreamSort, detail.]
Makespan vs. % of tasks experiencing crash faults (StreamSort)
[Plots: makespan (s) vs. percentage of tasks affected by fault emulation (0 to 100%) for SE, FRFT, and DIRFT, with 100, 300, and 500 input splits.]
Makespan vs. % of tasks experiencing crash faults (StreamSort)
[Plot: 300 input splits, detail.]
Makespan vs. % of tasks experiencing silent faults (StreamSort)
[Plots: makespan (s) vs. percentage of tasks affected by fault emulation (0 to 100%) for BFT, FRFT, and DIRFT, with 100, 300, and 500 input splits.]
Makespan vs. % of tasks experiencing silent faults (StreamSort)
[Plot: 300 input splits, detail; makespan (s) vs. percentage of tasks affected by fault emulation for BFT, FRFT, and DIRFT.]
Conclusion
- Data-interweaved replication:
  - increases the degree of parallelism in the absence of failures,
  - does redundant work only in case of failure, and
  - detects silent errors using two copies, executing a third copy only if an error is detected.
- Although prototyped on MapReduce, the scheme is not limited to a specific implementation and is applicable to other frameworks, such as Spark and Flink.
Thanks! Q&A