
Optimistic Replication in Computer Systems
Explore the concept of optimistic replication in computer systems, focusing on the benefits, challenges, and reconciliation processes involved. Learn about the motivation behind optimistic peer replication, background information on high availability and concurrent updates, conflicts in replication, and examples of optimistic replication scenarios. Discover the goals, services, and outcomes of the reconciliation process in the context of Rumor performance evaluation.
Download Presentation

Please find below an Image/Link to download the presentation.
The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author. If you encounter any issues during the download, it is possible that the publisher has removed the file from their server.
You are allowed to download the files provided on this website for personal or commercial use, subject to the condition that they are used lawfully. All files are the property of their respective owners.
The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author.
E N D
Presentation Transcript
Example: Rumor Performance Evaluation Andy Wang CIS 5930 Computer Systems Performance Analysis
Motivation Optimistic peer replication is popular Intermittent connectivity Availability of replicas for concurrent updates Convergence and correctness for updates Example: Rumor, Coda, Ficus, Calendar, Github 2
Background Replication provides high availability Optimistic replication allows immediate access to any replicated item, at the risk of permitting concurrent updates Reconciliation process makes replicas consistent (i.e., two replicas for peer-to- peer) 3
Background Continued Conflicts occur when different replicas of the same file are updated subsequent to the previous reconciliation 4
Optimistic Replication Example Log on Portable 10:00 10:25 Log on Desktop 10:00 10:25 connected Update Update Update Update Log on Desktop 10:00 10:25 10:40 Log on Portable 10:00 10:25 10:51 disconnected Update Update Update Update Update Update 5
Example Continued Log on Desktop 10:00 10:25 10:40 Log on Portable 10:00 10:25 10:51 disconnected Update Update Update Update Update Update Log on Desktop 10:00 10:25 10:40 10:51 Log on Portable 10:00 10:25 10:40 10:51 connected Update Update Update Update Update Update Update Update Run reconciliation Detect a conflict Propagate updates 6
Goal Understand the cost characteristics of the reconciliation process for Rumor 7
Services Reconciliation Exchange file system states Detect new and conflicting versions If possible, automatically resolve conflicts Else, prompt user to resolve conflicts Propagate updates 8
Outcomes Two reconciled replicas become consistent for all files and directories Some files remain inconsistent and require user to resolve conflicts 9
Metrics Time Elapsed time From the beginning to the completion of a reconciliation request User time (time spent using CPU) System time (time spent in the kernel) Failure rate Number of incomplete reconciliations and infinite loops (none observed) 10
Metrics not Measured Disk access time Require complex instrumentations E.g., buffering, logging, etc. Network and memory resources Not heavily used Correctness Difficult to evaluate 11
Monitor Implementation Reconciliation Process Perl library Spool-to-dump Recon Spool-to-dump C++ Scanner Rfindstored Rrecon Server Top-level Perl time command 12
Parameters System parameters CPU (speed of local and remote servers) Disk (bandwidth, fragmentation level) Network (type, bandwidth, reliability) Memory (size, caching effects, speed) Operating system (type, version, VM management, etc.) 13
Parameters (Continued) Workload parameters Number of replicas Number of files and directories Number of conflicts and updates Size of volumes (file size) 14
Workloads Update characteristics extracted from Geoff Kuenning s traces File access Read-write access Read- only access Nonshared access Read access Shared access Write access 2-way sharing Read access 3+way sharing Read access Write access Write access 15
Experimental Settings Machine model: Dell Latitude XP CPU: x486 100 MHz RAM: 36MB Ethernet: 10Mb Operating system: Linux 2.0.x File system: ext3 16
Experimental Settings Should have documented the following as well CPU: L1 and L2 cache sizes RAM: Brand and type Disk: brand, model, capacity, RPM, and the size of on-disk cache File system version 17
Experimental Design 255 full factorial design Linear regression or multivariate linear regression to model major factors Target: 95% confidence interval 18
255 Full Factorial Design Number of replicas: 2 and 6 Number of files: 10 and 1,000 File size: 100 and 22,000 bytes Number of directories: 10 and 100 Number of updates: 10 and 450 Capped at 10 updates for 10 files Number of conflicts: 0 /* typical */ 19
255 Full Factorial Analysis Elapsed time 150 Experiment errors < 3% 100 Time (seconds) 50 0 0 10 Experimental number 20 30 40 measured time predicted time System time User time 6 40 5 30 4 Time (seconds) Time (seconds) 3 20 2 10 1 0 0 0 10 20 30 40 0 10 Experimental number 20 30 40 Experimental number measured time predicted time measured time predicted time 20
Variation of Effects Top 5 effects for elapsed time % Variation 100 All major effects significant at 95% confidence interval 90 80 70 60 50 40 30 20 10 0 # files # dirs fileSize x #files Factor fileSize # updates Top 5 effects for user time Top 5 effects for system time % Variation % Variation 100 100 90 90 80 80 70 70 60 60 50 50 40 40 30 30 20 20 10 10 0 0 # files # replicas # dirs #files x #updates # updates # files # updates #files x #updates Factor fileSize fileSize x #files 21 Factor
Residuals vs. Predicted Time Clusters caused by dominating effects of files Elapsed time 20 15 10 5 Residuals (seconds) 0 0 50 100 150 -5 -10 -15 -20 Predicted time (seconds) System time User time 0.6 0.6 0.4 0.4 0.2 0.2 Residuals (seconds) Residuals (seconds) 0 0 0 1 2 3 4 5 0 10 20 30 40 -0.2 -0.2 -0.4 -0.4 -0.6 -0.6 Predicted time (seconds) Predicted time (seconds) 22
Residuals vs. Experiment Numbers Residuals show homoscedasticity, almost Elapsed time 20 15 10 5 0 residuals 0 50 100 150 200 -5 -10 -15 -20 Experimental number System time User time 0.6 0.6 0.4 0.4 0.2 0.2 0 0 residuals residuals 0 50 100 150 200 0 50 100 150 200 -0.2 -0.2 -0.4 -0.4 -0.6 -0.6 Experimental number Experimental number 23
Quantile-Quantile Plot Elapsed time 20 Residuals are normally distributed, almost y = 5.6125x + 2E-15 R = 0.9757 10 15 5 Residual quantiles 0 -4 -2 0 2 4 -5 -10 -15 -20 Normal quantiles System time User time 0.6 0.6 0.4 0.4 y = 0.1242x - 3E-16 R = 0.9524 0.2 y = 0.1125x - 2E-18 R = 0.9863 0.2 Residual quantiles Residual quantiles 0 0 -4 -2 0 2 4 -4 -2 0 2 4 -0.2 -0.2 -0.4 -0.4 -0.6 -0.6 Normal quantiles Normal quantiles 24
Multivariate Regression Number of replicas: 2 Number of files: 4 levels, 10-600 File size: 22,000 bytes Number of directories: 4 levels, 10-60 Number of updates: 0 Number of conflicts: 0 /* typical */ Number of repetitions: 5 per data point 25
Multivariate Regression Elapsed time 150 Experiment errors < 7% All coefficients are significant 100 Time (seconds) 50 0 0 20 40 60 80 100 Experiment number measured time predicted time System time User time 3.5 40 3 30 2.5 2 Time (seconds) Time (seconds) 20 1.5 1 10 0.5 0 0 0 20 40 60 80 100 0 20 40 60 80 100 Experiment number Experiment number measured time predicted time measured time predicted time 26
Residuals vs. Predicted Time Elapsed time shows a bi-model trend User time shows an exponential trend Elapsed time 15 10 5 Residuals (seconds) 0 0 20 40 60 80 100 120 -5 -10 -15 Predicted time (seconds) System time User time 0.3 1 0.2 0.1 0.5 0 Residuals (seconds) Residuals (seconds) 0 0.5 1 1.5 2 2.5 3 -0.1 0 0 10 20 30 40 -0.2 -0.3 -0.5 -0.4 -0.5 -1 Predicted time (seconds) Predicted time (seconds) 27
Residuals vs. Experiment Numbers Not so good for elapsed time and user time Elapsed time 15 10 5 0 Residuals 0 20 40 60 80 100 -5 -10 -15 Experiment number User time System time 1 0.3 0.2 0.5 0.1 0 0 20 40 60 80 100 0 -0.1 residuals residuals 0 20 40 60 80 100 -0.2 -0.5 -0.3 -0.4 -0.5 -1 Experiment number Experiment number 28
Quantile-Quantile Plot Elapsed time 20 Residuals are not normally distributed for elapsed time and user time y = 5.6775x - 4E-14 R = 0.8407 15 10 5 Residual quantiles 0 -3 -2 -1 0 1 2 3 -5 -10 -15 -20 Normal quantiles System time User time 0.4 1.5 y = 0.1321x - 2E-15 R = 0.9789 y = 0.4811x - 2E-15 R = 0.9243 0.3 1 0.2 0.1 0.5 0 Residual quantiles Residual quantiles 0 -3 -2 -1 0 1 2 3 -0.1 -3 -2 -1 0 1 2 3 -0.2 -0.5 -0.3 -1 -0.4 -0.5 -1.5 Normal quantiles Normal quantiles 29
Log Transform (User Time) User time 0.04 ANOVA tests failed miserably 0.02 0 Residuals (seconds) 0 0.5 1 1.5 2 -0.02 -0.04 -0.06 Predicted time (seconds) User Time User time 0.04 0.08 y = 0.0222x - 1E-15 R = 0.8709 0.06 0.02 0.04 0.02 0 Residual quantiles 0 residuals 0 20 40 60 80 100 -0.02 -3 -2 -1 0 1 2 3 -0.02 -0.04 -0.04 -0.06 -0.06 -0.08 Normal quantiles Experiment number 30
Residual Analyses (User Time) No indications that transforms can help 0.25 0.2 0.15 Standard deviation of residuals 0.1 0.05 0 0 10 20 30 40 Mean user time 0.06 stdev errors 0.05 0.25 0.04 0.2 Variance of residuals Standard deviation of residuals 0.03 0.15 0.02 0.1 0.01 0.05 0 0 0 10 20 30 40 0 500 1000 1500 Mean user time Mean user time squared 31
Possible Explanations i-node related factors Number of files per directory block Crossing block boundary may cause anomalies Caching effects Reboot needed across experiments 32
Linear Regression Number of files: 100, 150, 200, 250, 252, 253, 300, 350, 400, 450 Test for the boundary-crossing condition as the number of files exceeds one block Note that Rumor has hidden files Number of repetitions: 5 per data point Flush cache (reboot) before each run 33
Linear Regression Elapsed time 100 R2 > 80% All coefficients are significant 80 60 Time (seconds) 40 20 0 0 100 200 Number of files 300 400 500 measured time predicted time 95% confidence interval User time System time 30 3 20 Time (seconds) 2 Time (seconds) 10 1 0 0 0 100 200 Number of files 300 400 500 0 100 200 Number of files 300 400 500 measured time predicted time measured time predicted time 95% confidence interval 34 95% confidence interval
Residuals vs. Predicted Time Elapsed time shows a bi-model trend User time shows an exponential trend Elapsed time 15 10 5 Residuals (seconds) 0 0 20 40 60 80 100 -5 -10 -15 Predicted time (seconds) System time User time 0.3 0.6 0.2 0.4 0.1 0.2 Residuals (seconds) Residuals (seconds) 0 0 0 0.5 1 1.5 2 2.5 -0.1 0 5 10 15 20 25 -0.2 -0.2 -0.3 -0.4 Predicted time (seconds) Predicted time (seconds) 35
Residuals vs. Experiment Numbers Elapsed time shows a rising bi-modal trend Randomization of experiments may help System time Elapsed time 15 10 5 0 residuals 0 20 40 60 -5 -10 -15 Experiment number User time 0.3 0.6 0.2 0.4 0.1 0.2 0 residuals residuals 0 0 10 20 30 40 50 60 -0.1 0 10 20 30 40 50 60 -0.2 -0.2 -0.3 -0.4 Experiment number Experiment number 36
Quantile-Quantile Plot Elapsed time 15 Error residuals for elapsed time is not normal Perhaps piece-wise normal y = 5.8218x + 5E-15 R = 0.878 10 5 Residual quantils 0 -3 -2 -1 0 1 2 3 -5 -10 -15 Normal quantiles System time User time 0.3 0.6 y = 0.0976x - 4E-16 R = 0.9693 y = 0.2134x + 2E-15 R = 0.9709 0.2 0.4 0.1 0.2 Residual quantiles Residual quantiles 0 0 -3 -2 -1 0 1 2 3 -3 -2 -1 0 1 2 3 -0.1 -0.2 -0.2 -0.4 -0.3 -0.6 Normal quantiles Normal quantiles 37
Possible Explanations i-node related factors: No Caching effects: No Hidden factors: Maybe Bugs: Maybe 38
Conclusion Identified the number of files as the dominating factor for Rumor running time Observed the existence of an unknown factor in the Rumor performance model 39
White Slide 40