
New Techniques to Curtail Tail Latency in Stream Processing Systems
Stream processing systems require low tail latency for critical applications, such as interactive web services and security-related tasks. This work presents three techniques to reduce tail latency: an adaptive timeout strategy, an improved concurrency model for worker processes, and latency feedback-based load balancing.
NEW TECHNIQUES TO CURTAIL THE TAIL LATENCY IN STREAM PROCESSING SYSTEMS
Guangxiang Du*, Indranil Gupta
Department of Computer Science, University of Illinois, Urbana-Champaign
*Google (work done at UIUC)
DPRG@UIUC: http://dprg.cs.uiuc.edu
Motivation
For stream processing systems, latency is critical. Most existing work (e.g., traffic-aware scheduling, elastic scaling) focuses on lowering the average latency. However, some applications, such as interactive web services and security-related applications, require low tail latency.
Contributions of Our Work
We propose three techniques to lower tail latency in stream processing systems:
- Adaptive Timeout Strategy
- Improved Concurrency Model for worker processes
- Latency feedback-based Load Balancing
We implement them in Apache Storm and evaluate them on micro-benchmarks and real topologies.
System Model
[Figure: example dataflow topology of five operators, op1 through op5]
Adaptive Timeout Strategy
Storm has a built-in mechanism to guarantee message processing: if a tuple has not been completely processed within a timeout, it is replayed. However, this timeout is fixed and specified by the user.
[Figure: four-operator pipeline, Op1 (source) through Op4 (sink), processing tuples t1 through t8]
We propose adjusting the timeout adaptively, to catch and replay straggler tuples promptly.
Adaptive Timeout Strategy Contd
At moment T_i, set the timeout value for period P_i based on statistics of tuple latency in period P_{i-1}.
Intuition: continuously collect statistics of tuple latency, and periodically adjust the timeout value based on the latency distribution of recently issued tuples. Based on how long the tail was in the last period, decide how aggressively to set the timeout value, using heuristic rules. For example:
if (99th latency)_{P_{i-1}} > 2 * (90th latency)_{P_{i-1}}
then Timeout_{P_i} = (90th latency)_{P_{i-1}}
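The heuristic above can be sketched as follows. This is a minimal illustration, not the paper's Storm implementation: the `percentile` helper and the conservative fallback to a user-supplied default timeout are assumptions made for the example.

```python
import math

def percentile(latencies, p):
    """Return the p-th percentile (0 < p <= 100) of a latency sample."""
    sample = sorted(latencies)
    if not sample:
        raise ValueError("empty latency sample")
    idx = max(0, math.ceil(p / 100 * len(sample)) - 1)
    return sample[idx]

def next_timeout(latencies_prev_period, default_timeout):
    """Choose the timeout for period P_i from period P_{i-1}'s latencies.

    Rule from the slides: if the 99th-percentile latency exceeds twice the
    90th-percentile latency (a long tail), set the timeout aggressively to
    the 90th percentile so straggler tuples are replayed promptly.
    Otherwise keep the default timeout (assumed fallback).
    """
    p90 = percentile(latencies_prev_period, 90)
    p99 = percentile(latencies_prev_period, 99)
    if p99 > 2 * p90:
        return p90
    return default_timeout
```

With a sample of 90 tuples at 1 ms and 10 stragglers at 10 ms, the rule fires and the timeout drops to the 90th percentile (1 ms), so the stragglers get replayed early instead of waiting out a long fixed timeout.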
Improved Concurrency Model For Worker Process
[Figure: the same four-operator pipeline, with operators hosted in worker processes W1 through W3]
Improved Concurrency Model For Worker Process Contd
In an M/M/c queue model:
λ: queue's input rate
μ: server's service rate
c: the number of servers for the queue
ρ = λ / (c μ): the utilization of the queue
W_q: average queueing time
It shows that for a given queue utilization, increasing the number of servers for a queue leads to lower queueing delay.
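The claim can be checked numerically with the standard Erlang C formula for M/M/c queues; the code below is a textbook computation, not from the paper. At the same utilization ρ = 0.8, a 4-server queue has a much smaller average queueing time than a single-server queue.

```python
from math import factorial

def erlang_c(c, a):
    """Erlang C: probability an arrival must queue in an M/M/c system.
    a = lambda / mu is the offered load; requires a < c for stability."""
    rho = a / c
    top = (a ** c / factorial(c)) / (1 - rho)
    bottom = sum(a ** k / factorial(k) for k in range(c)) + top
    return top / bottom

def mmc_wait(lam, mu, c):
    """Average queueing time W_q for an M/M/c queue."""
    a = lam / mu
    if a >= c:
        raise ValueError("queue is unstable (utilization >= 1)")
    return erlang_c(c, a) / (c * mu - lam)

# Same utilization rho = 0.8, different server counts:
w1 = mmc_wait(lam=8, mu=10, c=1)    # single server: W_q = 0.4
w4 = mmc_wait(lam=32, mu=10, c=4)   # four servers: W_q ~ 0.075
```

This is the intuition behind the improved concurrency model: pooling more executor threads (servers) behind one queue lowers queueing delay even when per-server utilization is unchanged.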
Latency-based Load Balancing
Many stream processing systems run under heterogeneous conditions. For example: the machines (or VMs) may be heterogeneous, the task assignment may be heterogeneous (machines host different numbers of tasks), etc.
[Figure: pipeline with workers W1 through W3, each initially receiving 33.33% of the traffic]
Latency-based Load Balancing Contd
Goal: faster tasks process more work and slower tasks process less, so that all tasks have roughly the same latency.
[Figure: traffic shares rebalanced over successive rounds, shifting from 33.33% each toward the fastest worker]
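One way to realize this goal is a periodic feedback loop that shifts routing weight from slow tasks to fast ones. The sketch below is an assumed proportional rule for illustration; the step size and update formula are not the paper's exact Storm mechanism.

```python
def rebalance(weights, latencies, step=0.02):
    """One round of latency-feedback load balancing (illustrative sketch).

    Tasks whose recent latency is below the weighted-mean latency gain
    routing weight; tasks above it lose weight. The result stays a
    probability distribution over tasks.
    """
    # Weighted mean latency under the current traffic split.
    mean = sum(w * l for w, l in zip(weights, latencies))
    # Move each weight proportionally to how far the task is from the mean.
    new = [max(0.0, w + step * (mean - l) / mean)
           for w, l in zip(weights, latencies)]
    total = sum(new)
    return [w / total for w in new]
```

Starting from an even 1/3 split with per-task latencies of 1 ms, 1 ms, and 4 ms, repeated rounds steer traffic away from the slow task until the latencies converge.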
Evaluation
Experimental Setup (Google Compute Engine). We implement our techniques in Apache Storm and evaluate them: 1 VM for Nimbus & Zookeeper, 5 VMs for worker nodes. By default, each worker node runs one worker process.

VM Node | Machine configuration                    | Role
1 VM    | n1-standard-1 (1 vCPU, 3.75 GB memory)   | Zookeeper & Nimbus
5 VMs   | n1-standard-2 (2 vCPUs, 7.5 GB memory)   | Worker Node
Evaluation: Adaptive Timeout Strategy
4-operator Exclamation Topology from Storm examples. Comparison between the adaptive timeout strategy and different levels of replication.

Approach         | 99th latency (ms) | 99.9th latency (ms) | Cost
default          | 29.2              | 76.6                | -------
adaptive timeout | 24.1              | 66.4                | 2.92%
20% replication  | 25.5              | 87.8                | 20%
50% replication  | 22.1              | 107.7               | 50%
100% replication | 17.9              | 78.1                | 100%
Evaluation: Improved Concurrency For Worker Process
Micro-topology where a spout connects to a bolt through a shuffle-grouping stream. The bolt has 20 tasks; each worker has 4 tasks. Average queueing delay drops from 2.07 ms to 0.516 ms. The 90th, 99th, and 99.9th percentile latencies improve by 3.49 ms (35.5%), 3.94 ms (24.9%), and 30.1 ms (36.2%), respectively.
Evaluation: Latency-based Load Balancing
Three kinds of heterogeneous scenarios:
- Different Storm workers are assigned different numbers of tasks.
- A subset of Storm workers compete for resources with external processes.
- Storm workers are deployed on a cluster of heterogeneous VMs.
Evaluation: Latency-based Load Balancing Contd
Improvement:
90th latency:   2.2% - 56%
99th latency:   21.4% - 60.8%
99.9th latency: 25% - 72.9%
Qualitative Conditions for the Techniques
Given a topology, vary the tasks' input queue utilization and observe its effect on the adaptive timeout strategy and the improved concurrency model.
Qualitative Conditions for the Techniques Contd
Vary the system workload and observe its effect on the latency-based load balancing and the adaptive timeout strategy.
Real-world Topologies Evaluation
Yahoo PageLoad Topology
Yahoo Processing Topology
Real-world Topologies Experimental Results

Name                       | 90th latency | 99th latency | 99.9th latency
Adaptive Timeout Strategy  | ------       | 28%-40%      | 24%-26%
Improved Concurrency Model | 16%-19%      | 36%-42%      | 20%-32%
Latency-based Load Balance | 22%-48%      | 50%-57%      | 21%-50%
Summary
We propose three novel techniques for reducing tail latency, based on a common system model of stream processing systems such as Storm:
- Adaptive Timeout Strategy
- Improved Concurrency Model for Worker Process
- Latency-based Load Balancing
We provide guidelines for when to use which technique, and achieve tail-latency improvements of up to 72.9% over Storm's default implementation.
DPRG: http://dprg.cs.uiuc.edu
gdu3@illinois.edu