
Challenges and Solutions in Performance Assurance for Large-Scale Systems
Explore the challenges faced in ensuring performance for large-scale systems and discover solutions such as proactive tuning and predictive models. Learn about the importance of meeting guaranteed job latency for optimal system utilization and user satisfaction.
Presentation Transcript
Outline: Define Performance & Large Scale System; Motivation for Performance Assurance; Challenges in Assuring Performance; Solutions for Assuring Performance; Future Directions.
Performance: throughput (number of jobs per second) and latency (time from job submission to job output). Source: www.resultgroupinc.com
Large-scale system: machine and human data sources; distributed systems on WAN; data size; workload size; cluster size; system capacity. Source: www.pintrest.com
Given: the underlying system and application growth. Ensure: a job finishes within its guaranteed latency.
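The definitions above can be made concrete with a small sketch. It assumes each job is recorded with a submit time and a finish time; the `Job` class, its field names, and the `violations` helper are hypothetical names chosen for illustration, not part of the presentation.

```python
from dataclasses import dataclass


@dataclass
class Job:
    # Hypothetical record layout: both times are in seconds on the same clock.
    submit_time: float
    finish_time: float


def latency(job: Job) -> float:
    """Latency of one job: time from job submission to job output."""
    return job.finish_time - job.submit_time


def throughput(jobs: list[Job]) -> float:
    """Jobs per second over the window from first submit to last finish."""
    if not jobs:
        raise ValueError("need at least one job")
    window = max(j.finish_time for j in jobs) - min(j.submit_time for j in jobs)
    if window <= 0:
        raise ValueError("observation window must be positive")
    return len(jobs) / window


def violations(jobs: list[Job], guaranteed_latency: float) -> list[Job]:
    """Jobs that did not finish within the guaranteed latency."""
    return [j for j in jobs if latency(j) > guaranteed_latency]
```

Performance assurance, in these terms, means keeping the list returned by `violations` empty as data size, workload size, and cluster size grow.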
Faster growth in analytic application data; rapid increase in workload; newer developments in parallel data processing platforms; migration to new platforms; online analytics; maximal utilization of the system.
Faster growth in application data; rapid increase in workload; newer developments in parallel data processing platforms; migration to new platforms; maximal utilization of the system. Consequences: loss of users, who cannot tolerate delays in their job execution; heavy cost to application owners on SLA violations.
In the development environment: testing stage; design phase; capacity planning phase. In the production environment: alerts before performance violations; proactive tuning of the application or system.
Performance profiling and tuning in production: reactive; cost and downtime. Volume testing: proactive; deployment time and cost; volume. Performance prediction models: proactive; deployment time and cost.
Generate large data sizes; emulate a large number of users. What we have: efficient data generation for RDBMS; parallel data generator on HDFS. Reference: "Efficient Synthetic Data Generator for structured Data", Chetan Phalak, Rekha Singhal, CMG USA, San Diego, November 2016.
Generating large data sizes and emulating a large number of users, even with efficient data generation for RDBMS and a parallel data generator on HDFS, require large resources and time.
Eliminate volume testing for performance assurance; prospective capacity planning; SQL query tuning and scheduling; reduce application benchmarking cycle time*.
* "Reducing Structure Big Data Benchmark Cycle time using Query Performance Prediction Model", IEEE International Conference on Computing, Communication Systems (ICCCS) 2015, Mauritius, December 2015.
Limited availability of the production system; unavailability of a DB system at the projected data size; estimation of complex query output size; transparency to the underlying hardware subsystem; transparency to the underlying data management server: DB servers (Oracle, Postgres) and big data architectures.
Emulate query processing on the underlying system at large data size. Use the optimizer cost. Obtain the complex query's processing steps in the form of elementary operators and data access steps. Identify the elementary operators in the system and build a prediction model for each of them as a function of data size (RDBMS: sort, hash join; MR: map, reduce, shuffle; Hive: map join, reduce join). Identify the different modes of data access and build a model for each of them (index, full table).
[Figure: example query plans for SQL and HiveQL. Labels: Select; Stage 1: Fetch; Hash Join; Stage 2: Map; Table T2 (full access); Table T1 (index access); Stage 3: MR.]
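The operator-level approach described above can be sketched as follows. This is only an illustration under stated assumptions, not the authors' models: it assumes each elementary operator's time is linear in some feature of the data size (for example n for a full table scan, n log n for a sort), fits that line by ordinary least squares on small-scale measurements, and sums the per-operator predictions for a query plan. `OperatorModel`, `fit_line`, and `predict_query_time` are hypothetical names.

```python
import math


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("need measurements at two or more distinct sizes")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


class OperatorModel:
    """Time model for one elementary operator or data access mode."""

    def __init__(self, name, feature):
        self.name = name
        self.feature = feature  # maps data size to the regressor (assumption)
        self.intercept = 0.0
        self.slope = 0.0

    def fit(self, sizes, times):
        # sizes and times come from measurements on small data
        xs = [self.feature(s) for s in sizes]
        self.intercept, self.slope = fit_line(xs, times)
        return self

    def predict(self, size):
        return self.intercept + self.slope * self.feature(size)


def predict_query_time(plan, size):
    """Sum the predicted times of every operator model in the plan."""
    return sum(model.predict(size) for model in plan)


# Possible feature choices (assumptions, not taken from the presentation):
linear = lambda n: n
n_log_n = lambda n: n * math.log2(n)
```

A model for, say, a full table scan would be built with `OperatorModel("full_table_scan", linear).fit(sizes, times)` using measured sizes and times, and a plan is simply a list of fitted models. Treating every operator as reading the same data size is a simplification; a join, for instance, depends on the sizes of both inputs.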
Predict SQL query execution time for large data volume on RDBMS:
R. Singhal and Manoj Nambiar, "Predicting SQL Query Execution Time for Large Data Volume", in Proceedings of IDEAS, Montreal, Canada, July 2016.
Chetan Phalak, Rekha Singhal and Tanmay Jhunjhunwala, "Database Buffer Cache Simulator to Study and Predict Cache Behavior for Query Execution", in Proceedings of the conference DATA, Portugal, July 2016.
Rekha Singhal and Manoj Nambiar, "Measurement based model to study the affect of increase in data size on query response time", Performance and Capacity CMG 2013, La Jolla, California, November 2013.
Rekha Singhal and Manoj Nambiar, "Extrapolation of SQL Query Elapsed Response Time at Application Development Stage", in Proceedings of INDICON 2012, Kochi, India, December 2012.
Predict HiveQL job execution time for large data volume and cluster size in a homogeneous environment:
A. Sangroya and R. Singhal, "Performance Assurance Model for HiveQL on Large Data Volume", in Proceedings of the International Workshop on Foundations of Big Data Computing, in conjunction with the 22nd IEEE International Conference on High Performance Computing, HiPC 15, December 2015.
Predict HiveQL job execution time for large data volume and cluster size in a heterogeneous environment using an MR simulator:
R. Singhal and Abhishek Verma, "Predicting Job Completion Time in Heterogeneous MapReduce Environments", in Proceedings of IPDPS: Heterogeneous Computing Workshop, IPDPS, May 2016.
Performance assurance. Auto tuning: a performance cost model as a function of data processing engine performance parameters, such as the number of map slots and the size of map memory. Bottleneck analysis: job log parsers; correlation with system utilization logs.
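As a rough illustration of a cost model driven by an engine parameter, the sketch below uses a common simplification that is an assumption here, not the model from the presentation: map tasks run in "waves", so the map phase takes about ceil(tasks / slots) times the average task time. Auto tuning is then reduced to finding the smallest number of map slots that meets a target time. The function names and parameters are hypothetical.

```python
def map_phase_time(num_map_tasks: int, map_slots: int, avg_task_seconds: float) -> float:
    """Estimated map phase time under the wave assumption."""
    if map_slots <= 0:
        raise ValueError("map_slots must be positive")
    waves = (num_map_tasks + map_slots - 1) // map_slots  # integer ceiling
    return waves * avg_task_seconds


def smallest_slots_meeting(target_seconds: float, num_map_tasks: int,
                           avg_task_seconds: float, max_slots: int):
    """Smallest slot count whose estimate meets the target, or None."""
    for slots in range(1, max_slots + 1):
        if map_phase_time(num_map_tasks, slots, avg_task_seconds) <= target_seconds:
            return slots
    return None
```

A fuller cost model would also account for the reduce and shuffle phases and for parameters such as map memory size; this sketch covers only the map slot dimension.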
[Figure: a measurement environment and a prediction environment, each running the application on an MR framework over a cluster of Machine A and Machine B nodes; measurements use small data, predictions target large data.]
Performance extrapolation across different deployments (CPU cores, RAM size, storage subsystem); performance extrapolation for larger concurrent workloads; performance extrapolation for different mixes of concurrent SQL query workloads.
Motivation for performance models in large-scale systems; volume testing for performance assurance; performance prediction models for large-scale systems (RDBMS, Hive+Hadoop); extension of these models for auto tuning and capacity sizing.