
Systems Design for Big Data Management
"Explore the challenges and strategies in managing big data, research methodologies, Hadoop ecosystem, and system design concepts. Learn from Dr. Weikuan Yu, Associate Professor at Florida State University, as he shares insights on graduate student life, data explosion, research strategies, and Hadoop MapReduce. Dive into the world of big data ecosystem and discover the intricacies of data insight, complexity, and technologies shaping the field."
Presentation Transcript
Big Data Management and Systems Design
Dr. Weikuan Yu, Associate Professor, Department of Computer Science, Florida State University
CS5935, Oct 15, 2015
Life as a Graduate Student
- Course work: degrees vs. skill sets
- Research: goals vs. means
- Career development: presentation and teaching skills; communication and social skills; internships; networking
Big Data Challenge
From the dawn of civilization through 2003, humanity produced about 5 exabytes of data in total; by 2015, we produce 5 exabytes every 2 days, for an estimated 7,910 exabytes overall.
Where is the data coming from?
- 3.0 billion Internet users
- 1.35 billion Facebook users
- 550 million tweets per day
- 4.7 billion Google searches per day
- 72 hours of video uploaded per minute
- and so many others
Sources: 1: http://www.infobarrel.com/Evolution 2: http://visual.ly/big-data-explosion?utm_source=visually_embed
Big Data Ecosystem
A layered stack, with data insight growing toward the top and complexity toward the bottom:
- Applications: Mahout, Giraph, Pregel, R
- Services: Hive, Pig, Shark, Flume
- Run-Time: RamCloud, Storm, Spark, MR/Hadoop, HBase, HDFS, MemCached
- OS: VMs, containers, public and private clouds
- Hardware/Infrastructure: processors (accelerators), storage, networks, disks
Research Strategies?
- Descriptive research: examine and collect facts to document patterns.
- Discovery-oriented research: inductive reasoning from patterns to general discoveries.
- Engineering-based research: use existing techniques and theories to create a technology or tool.
- Hypothesis-driven research: make a hypothesis, then test it using deductive reasoning.
Overview of Hadoop MapReduce
[Figure: MapReduce dataflow in three stages (map, shuffle/merge, reduce). Each MapTask reads an input split from the DFS and produces a Map Output File (MOF); each ReduceTask fetches its MOF segments into memory, merges them, and writes reduced output back to the DFS. The JobTracker assigns MapTasks and ReduceTasks across the cluster.]
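For concreteness, here is a minimal word-count job written against the standard org.apache.hadoop.mapreduce API. It is a generic illustration of the map/shuffle/reduce flow sketched above, not code from this talk.

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {
  // Each MapTask processes one input split and emits intermediate <word, 1>
  // pairs, which the framework writes out as a Map Output File (MOF).
  public static class TokenizerMapper
      extends Mapper<LongWritable, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    protected void map(LongWritable key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, ONE);
      }
    }
  }

  // Each ReduceTask fetches and merges its MOF segments (shuffle/merge), then
  // reduces all values sharing a key and writes the result back to the DFS.
  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable v : values) sum += v.get();
      context.write(key, new IntWritable(sum));
    }
  }

  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```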
Small Job Starvation within Hadoop
[Figure: standalone execution time (sec) and normalized execution time (slowdown) for 7 groups of jobs ordered by size; under contention, the smallest jobs are slowed down by as much as 52x, while the largest see only about a 1.5x slowdown.]
Hadoop Fair Scheduler
- The most widely used Hadoop scheduler, designed to provide fairness among concurrently running jobs.
- Tasks occupy slots until completion or failure.
[Figure: slot occupancy over job arrival time for jobs J-1, J-2, and J-3 on 5 map slots (Slot-M1 to Slot-M5) and 3 reduce slots (Slot-R1 to Slot-R3); a reduce slot, once taken for shuffle and reduce, is held until its job finishes.]
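The fairness policy at work here is max-min fair sharing of slots. The sketch below is illustrative only, not Hadoop's scheduler code: it allocates a fixed pool of slots across jobs so that each job receives an equal share capped at its own demand, with leftover slots redistributed.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative only: a minimal max-min fair allocation of task slots among
// jobs, in the spirit of the policy the Hadoop Fair Scheduler applies to map
// and reduce slots. Not Hadoop's actual implementation.
public class MaxMinFairShare {

  // demands: job name -> number of runnable tasks; totalSlots: available slots.
  static Map<String, Integer> allocate(Map<String, Integer> demands, int totalSlots) {
    Map<String, Integer> alloc = new LinkedHashMap<>();
    demands.keySet().forEach(j -> alloc.put(j, 0));
    int remaining = totalSlots;
    boolean progress = true;
    while (remaining > 0 && progress) {
      // Jobs whose demand is not yet satisfied share the leftover slots.
      long unsatisfied = demands.keySet().stream()
          .filter(j -> alloc.get(j) < demands.get(j)).count();
      if (unsatisfied == 0) break;
      int share = Math.max(1, remaining / (int) unsatisfied);
      progress = false;
      for (String j : demands.keySet()) {
        if (remaining == 0) break;
        int want = demands.get(j) - alloc.get(j);
        int give = Math.min(Math.min(share, want), remaining);
        if (give > 0) {
          alloc.put(j, alloc.get(j) + give);
          remaining -= give;
          progress = true;
        }
      }
    }
    return alloc;
  }

  public static void main(String[] args) {
    Map<String, Integer> demands = new LinkedHashMap<>();
    demands.put("J-1", 2);   // small job: only 2 runnable tasks
    demands.put("J-2", 10);  // larger job
    demands.put("J-3", 10);  // larger job
    // With 5 map slots, max-min fairness satisfies J-1's full demand
    // and splits the remainder between J-2 and J-3.
    System.out.println(allocate(demands, 5)); // {J-1=2, J-2=2, J-3=1}
  }
}
```

Note what this policy does not capture: once a reduce task holds a slot, the allocation cannot be rebalanced, which is exactly the problem the next slides address.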
Objective: How to Achieve Both Efficiency and Fairness?
- How to correct the non-preemptive nature of reduce tasks, for flexible and dynamic allocation of reduce slots? Existing schedulers are not aware of this behavior: once a reduce task is launched, it stays with its reduce slot until the end.
- How to better schedule two different types of tasks? Hadoop schedules both map and reduce tasks with a similar max-min fair sharing policy, without paying attention to their relationship within a job. Map and reduce slots need to be shared dynamically and proportionally.
Preemptive ReduceTasks
- Different from suspending a process with the Linux command kill -STOP $PID.
- A lightweight, work-conserving preemption mechanism.
- Provides any-time preemption with negligible performance impact.
- Allows a reduce task to resume from where it was preempted, preserving previous computation and I/O.
Preemption During Shuffle Phase
- Merge only the in-memory segments, leaving on-disk segments untouched.
[Figure: before preemption, reduce task R1 holds fetched segments on its heap; at preemption they are merged into an on-disk segment and an index is recorded with the TaskTracker; after resumption, R1 retrieves the index and continues shuffling.]
Preemption During Reduce Phase
- Preemption occurs at the boundary of intermediate <key, val> pairs.
- Record the current offset of each segment and the state of the minimum priority queue (MPQ) used for the merge.
[Figure: before preemption, R1's MPQ merges segments from the DFS; per-segment offsets are saved in an index with the TaskTracker; after resumption, the MPQ is rebuilt from those offsets.]
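To make the resume path concrete, here is a sketch of the kind of per-segment state such a preemption could checkpoint. All class, field, and method names are hypothetical, invented for illustration; this is not the implementation behind these slides.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: state a preemptive ReduceTask could checkpoint so it can
// resume from where it stopped. Names are hypothetical, not from the real patch.
public class ReduceCheckpoint {

  // During the reduce phase, preemption happens at a <key, value> boundary,
  // so one offset per sorted segment suffices to rebuild the minimum
  // priority queue (MPQ) that drives the merge.
  static class SegmentState {
    final String path;   // on-disk location of the segment
    final long offset;   // byte offset of the next unconsumed <key, value> pair
    SegmentState(String path, long offset) { this.path = path; this.offset = offset; }
  }

  private final List<SegmentState> segments = new ArrayList<>();

  // Called at a record boundary when the scheduler requests preemption.
  void saveOnPreempt(List<String> segmentPaths, long[] currentOffsets) {
    segments.clear();
    for (int i = 0; i < segmentPaths.size(); i++) {
      segments.add(new SegmentState(segmentPaths.get(i), currentOffsets[i]));
    }
    // In the real system, this index would be handed to the TaskTracker so the
    // slot can be reassigned while the checkpoint survives.
  }

  // Called when the task resumes: reopen each segment at its saved offset and
  // reinsert its next record into the merge queue.
  void restoreOnResume() {
    for (SegmentState s : segments) {
      System.out.printf("reopen %s at offset %d and refill MPQ%n", s.path, s.offset);
    }
  }
}
```

Because only offsets and an index are persisted, all shuffle I/O and merge work completed before the preemption is preserved, which is what makes the mechanism work-conserving.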
Evaluation of Preemptive ReduceTask
[Figure: job execution time (sec) under work-conserving preemption, killing preemption, and no preemption (baseline), as the completion ratio of the preempted ReduceTask varies from 10% to 90%; work-conserving preemption tracks the baseline with negligible overhead.]
Fast Completion Scheduler
Strategy: find a reduce task to preempt, select another to launch, and balance the utilization of reduce slots.
Decisions to make:
- Which reduce task to preempt, and how many times?
- Which one to launch, and on which slot/node?
- How to avoid starvation and achieve locality?
New progress metrics of a job:
- Remaining shuffle time
- Remaining data (<k, v> pairs) to be reduced
FCS Algorithms
Preemption algorithm:
- Select a reduce task from a job with a longer remaining time and more remaining data.
- Task slackness: the number of times a task has been preempted.
- Avoid starvation: do not preempt a reduce task that already has a large slackness.
Launching algorithm:
- Select another reduce task from the job with the least remaining time and remaining data.
- Delay a reduce task to maximize data locality; avoid aggressive delay by setting a threshold based on the cluster size.
A sketch of this selection logic follows.
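The sketch below mirrors the two selection rules under the assumptions stated on the slide. The scoring function, field names, and the slackness threshold are hypothetical; the real FCS implementation differs in detail.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

// Illustrative only: FCS-style victim/launch selection over reduce tasks.
public class FcsSelection {

  static class ReduceTaskInfo {
    String taskId;
    double remainingShuffleSec;  // estimated remaining shuffle time
    long remainingPairs;         // remaining <k, v> pairs to be reduced
    int slackness;               // times this task has already been preempted

    // A job's remaining work combines remaining shuffle time and remaining
    // data; the 1:1 weighting here is a hypothetical placeholder.
    double remainingWork() {
      return remainingShuffleSec + remainingPairs;
    }
  }

  static final int MAX_SLACKNESS = 3; // starvation guard (hypothetical value)

  // Preemption rule: pick the task with the most remaining work, skipping any
  // task that has already been preempted too many times (starvation guard).
  static Optional<ReduceTaskInfo> pickVictim(List<ReduceTaskInfo> running) {
    return running.stream()
        .filter(t -> t.slackness < MAX_SLACKNESS)
        .max(Comparator.comparingDouble(ReduceTaskInfo::remainingWork));
  }

  // Launching rule: pick the pending task with the least remaining work, so
  // small jobs finish fast and release their reduce slots quickly.
  static Optional<ReduceTaskInfo> pickLaunch(List<ReduceTaskInfo> pending) {
    return pending.stream()
        .min(Comparator.comparingDouble(ReduceTaskInfo::remainingWork));
  }

  // Swap a slot only when the launch candidate is clearly smaller than the
  // victim; locality-aware delay (not shown) would gate the final placement.
  static boolean shouldSwap(ReduceTaskInfo victim, ReduceTaskInfo candidate) {
    return candidate.remainingWork() < victim.remainingWork();
  }
}
```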
Results for Map-heavy Workload
- FCS reduces average execution time by 31% across 171 jobs.
- It significantly speeds up small jobs at a small cost to big jobs.
[Figure: average execution time (sec, log scale) for FCS vs. HFS over 10 groups of jobs; per-group annotations show speedups of roughly 1.6x to 2.4x for smaller jobs and 0.79x to 1.1x (a slight slowdown) for the largest.]
Average ReduceTask Wait Time
- Small jobs benefit from significantly shortened reduce wait times.
- Wait times are reduced by about 22x for the jobs in the first 6 groups.
[Figure: average ReduceTask wait time (sec, log scale) for FCS vs. HFS over 10 groups of jobs; the smaller groups are annotated with reductions of 27.2x, 12.4x, 32.2x, 21x, 19.5x, and 22x, while the remaining groups show factors below 5x.]
Fairness Evaluation: Maximum Slowdown
- Nearly uniform maximum slowdown across all groups of jobs.
- FCS improves fairness by 66.7% on average.
[Figure: maximum slowdown (0 to 20) for the Fair Completion Scheduler vs. the Hadoop Fair Scheduler over 10 groups of jobs.]
Summary of Coordinated Scheduling
- Identified fairness and efficiency issues caused by the lack of scheduling coordination.
- Introduced Preemptive ReduceTasks for efficient preemption of reduce tasks from long-running jobs.
- Designed and implemented the Fast Completion Scheduler for fast execution of small jobs and better job fairness.
Broad Research Interests
- High Performance Computing: parallel computing models; scalable I/O & communication; computation & I/O optimization
- Data Analytics: k-mer indexing for sequence fingerprinting and alignment; scalable image processing; fast community detection
- Big Data System Design and Management: fast data movement; efficient job management; multi-purpose frameworks
- Security and Reliability: analytics logging and recovery; cloud security; storage security
Resources and Capabilities
- Software: Unstructured Data Accelerator (UDA), an accelerator for big data analytics, transferred to Mellanox.
- In-house big data platform: 22 nodes with InfiniBand and 10GigE; SSDs and GPGPUs (Phi and Kepler); donations from Mellanox, Solarflare, and NVIDIA.
Sponsors, Contractors, and Collaborators
Current sponsors and contractors:
- NSF: two active grants on big data analytics, storage, and network systems
- LLNL: burst-buffer based storage systems
Past sponsors and contractors:
- NASA: one grant for I/O optimization of climate applications
- DOE labs: many contracts for high-performance computing
- Industry: contracts from Intel, Mellanox, NVIDIA, and Scitor
- Alabama Innovation Award; Auburn IGP for TigerCloud
Collaborators:
- IBM, Intel, Mellanox, Scitor, Solarflare, AMD
- LBNL, ORNL, LLNL, SNL, GSFC, LPS
- Illinois Tech, Clemson, College of William & Mary, NJIT, Georgia Tech, Auburn (OIT, Physics, Biology, SFWS)
Research Directions
Main thrusts:
- Big data analytics and systems design
- Network and data privacy and security
- Interdisciplinary data-driven computational research
Key: solve challenging problems with novel strategies.
Collaborations:
- Students (systems/network oriented, interdisciplinary)
- Faculty (on and off campus)
- National laboratories
- Industry: IBM, Intel, and many more
Student Cultivation and Team Building
Numerous internships: ORNL, Sandia, LANL, LLNL, IBM.
Awards and honors:
- First Prize at the ACM Grand Finals
- SC11 Fellowship ($5,000)
- Outstanding students: 2011-2014
Alumni (Ph.D.s listed):
- Yuan Tian: ORNL
- Xinyu Que: IBM T.J. Watson
- Yandong Wang: IBM Watson
- Zhuo Liu: Yahoo!
- Cong Xu: Intel
- Bin Wang: Arista Networks
Current team: 4 Ph.D. students