
Exploiting System Diversity in Peer-to-Peer Publish-Subscribe Systems
This presentation explores the significance of system diversity in peer-to-peer publish-subscribe systems, covering the forms system diversity takes and taxonomies of publish-subscribe systems in computing today. It examines the challenges and opportunities presented by interest (subscription) heterogeneity, network bandwidth, platform diversity, and more.
Presentation Transcript
1 EXPLOITING SYSTEM DIVERSITY IN PEER-TO-PEER PUBLISH-SUBSCRIBE SYSTEMS Final Exam of Jay A. Patel (April 1, 2009)
Outline 2 Introduction: Computing Today, System Diversity, Publish-Subscribe Systems, Thesis Statement. Contributions: Confluence, RANS Framework, Rappel. Conclusion: Concluding Remarks.
Computing Today 3 Continued growth of Internet services: search engines, webmail systems, social networks, blogging platforms, file sharing systems, etc. Many use the publish-subscribe paradigm: Facebook, Twitter, YouTube. Scaling via geographic distribution of infrastructure means moving around large sets of data: content distribution, log collection, archiving data.
System Diversity 4 Successful distributed systems must provide high reliability, availability, performance, and scalability. System diversity arises from variations in either the resources or characteristics available to end hosts (e.g., network diversity) or the requirements of individual end hosts themselves, usually arising from end user demands (e.g., interest diversity).
Many Forms of System Diversity 5 Interest: subscription heterogeneity. Network: bandwidth, location, connectivity, packet loss rates. Platform: hardware and software. Workload: time-of-day effects, flash crowds. Other: availability.
Taxonomies of Publish-Subscribe Systems 6 Subscription model: subject-based, e.g., RSS (contemporary delivery mechanism), Scribe [Castro02], FeedTree [Sandler05], Bayeux [Zhuang01], SpiderCast [Chockler07]; content-based, e.g., Gryphon [Bhola03], Siena [Carzaniga01], Sub-2-Sub [Voulgaris05]. Delivery mechanisms: unicast (1-to-1), e.g., RSS, e-mail listservs; single-source multicast (1-to-n), e.g., IP Multicast, ESM [Chu02], RMTP [Paul96], Gossip [Demers88], Bimodal Multicast [Birman99]; convergecast (n-to-1), e.g., Corona [Ramasubramanian06], Cobra [Rose07]; general-purpose multicast (m-to-n), e.g., Scribe [Castro02], SplitStream [Castro03], SpiderCast [Chockler07], AnySee [Liao06]. Architecture: client-server or peer-to-peer.
Thesis Focus 7 We focus on the convergecast and general-purpose multicast paradigms, as these remain the least explored with respect to system diversity: unicast is meant for simplicity of use, and multicast is well studied. These paradigms are also the most relevant in today's computing: multi-site cloud computing, grid computing, etc., and improved delivery for RSS.
Thesis Statement 8 We show that by directly addressing interest and network diversity as a first-class design principle, the scale and performance of peer-to-peer publish-subscribe systems can be improved.
Thesis Contributions 9 Confluence (n-to-1): a system for lossless collection of data from multiple sources to a single sink; exploits both spatial and temporal network diversity. RANS Framework: provides realistic and deterministic simulation at large scales; helps model network, interest, and availability diversity. Rappel (m-to-n): a system that leverages interest and network locality to improve fairness; exploits network and interest diversity.
Outline 10 Introduction: Computing Today, System Diversity, Publish-Subscribe Systems, Thesis Statement. Contributions: Confluence, RANS Framework, Rappel. Conclusion: Concluding Remarks.
Confluence 11 A System for Lossless Multi-Source Data Collection
Introduction 12 A newly emerging paradigm: fetching large files from multiple sources to a central clearing house, e.g., multiple data centers (cloud computing), PlanetLab servers, tele-immersive video gateways, grid computing. This is an n-to-1 publish-subscribe system: multiple sources = multiple publishers; single sink = single subscriber.
The Objective 13 To minimize the total time required to transfer the necessary files from the source nodes to the sink node. No known systems currently optimize for this goal; end users generally use the direct transfer (1-to-1) strategy to fetch files.
Key Observation 14 The diversity of connections amongst Internet hosts has been widely observed and falls into two categories. Spatial diversity refers to the fact that different links have different bandwidth availabilities. Temporal diversity refers to the variation over time of the available bandwidth at a single link.
Motivating Example 15 [Figure: hosts x and y each hold 5000 MB destined for host t; a 2 MBps direct link takes 2500 secs and a 1 MBps direct link takes 5000 secs, while a 5 MBps path via an intermediate node finishes in 1000 secs, exploiting natural parallelism.] The transfer process can be sped up by routing data via intermediate nodes: 37% of PlanetLab node pairs achieve better throughput by leveraging a third node.
System Assumptions 16 Files can be subdivided into blocks. All files (file blocks) are unique: no a priori replicas are present in the system. Source node failures do not occur: if a source node fails, Confluence provides no resiliency guarantees; this is acceptable, as Direct Transfer has the same problem.
Network Graph Model 17 The network graph G supports both asymmetric link capacities (c_xy need not equal c_yx) and asymmetric ISP connectivity limits (per-host ingress capacity C_x+ and egress capacity C_x-). [Figure: each host x is modeled as three nodes x+, x0, x-, with internal edges enforcing C_x+ and C_x-, and inter-host edges c_xy, c_yx, c_xt, c_yt leading toward the sink nodes t+, t0.]
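To make the split-node construction concrete, here is a minimal Python sketch (using networkx; the library choice and all names are illustrative assumptions, not the dissertation's code) showing how per-host ingress/egress limits and asymmetric link capacities all become ordinary edge capacities:

```python
import networkx as nx

def build_network_graph(hosts, links):
    """Split each host x into x+ -> x0 -> x-, so the ISP ingress limit
    C_x+ and egress limit C_x- become edge capacities, just like the
    per-link capacities c_xy (which may differ from c_yx)."""
    G = nx.DiGraph()
    for x, (c_in, c_out) in hosts.items():        # hosts: {x: (C_x+, C_x-)}
        G.add_edge(f"{x}+", f"{x}0", rate=c_in)   # ingress limit C_x+
        G.add_edge(f"{x}0", f"{x}-", rate=c_out)  # egress limit C_x-
    for (x, y), c_xy in links.items():            # directed: c_xy != c_yx
        G.add_edge(f"{x}-", f"{y}+", rate=c_xy)
    return G
```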
Theoretical Solution (Spatial Only) 18 Graph translation: G^T, where T is a duration; every capacity in G is scaled by T (e.g., C_x+ becomes T*C_x+, c_xy becomes T*c_xy). Add a super source s and connect it to each source node via a file edge whose capacity b_x equals the file size held there; the total capacity out of s is sum_i b_i = B. The maximum s -> t0 flow in G^T yields a flow f^T. Find the smallest time T* that can move B blocks from s to t0 via binary search on [0, Tmax); the corresponding flow is f^{T*}, and the optimal flow rates are f* = f^{T*}/T*. Time complexity: O(log(VBC) * VE log(V^2/E)).
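The binary search over T can be sketched as follows. This is an assumed reimplementation of the idea on the slide, with networkx's maximum_flow standing in for the max-flow algorithm whose complexity is quoted above; it is not the actual Confluence code:

```python
import networkx as nx

def optimal_transfer_time(G, sources, sink, total_blocks, t_max, eps=1e-3):
    """Find the smallest T such that the max s -> sink flow in G^T
    (all rates scaled by T) can carry all B blocks.
    sources: {storage node, e.g. "x0": number of blocks held there}."""
    def max_flow_for(T):
        GT = nx.DiGraph()
        for u, v, d in G.edges(data=True):
            GT.add_edge(u, v, capacity=d["rate"] * T)  # scale rates by duration T
        for x, b_x in sources.items():
            GT.add_edge("s", x, capacity=b_x)          # file edges from super source
        value, _ = nx.maximum_flow(GT, "s", sink)
        return value

    lo, hi = 0.0, t_max                # max flow grows monotonically with T,
    while hi - lo > eps:               # so binary search on [0, Tmax) is valid
        mid = (lo + hi) / 2
        if max_flow_for(mid) >= total_blocks:
            hi = mid
        else:
            lo = mid
    return hi  # ~T*; the optimal rates are f* = f^{T*} / T*
```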
System Design 19 Maintaining the network graph G via a coordinator. Link information staleness: large gap between probes (round robin); cost: must send data packets to measure. Our solution: k << n; each participating node has k peers chosen uniformly at random by the coordinator. Measuring c_xy and C_x-: PathChirp [Ribeiro2003]; PathBlast: brute force, needed only once, during bootstrapping.
Transfer Plan 20 The coordinator calculates the plan based on the theoretical solution and sends directives to nodes: how many blocks, and to whom. What about the optimal flow rate? First approach: Flow Control System [Mehra2003]; subpar performance with multiple flows, temporal diversity galore. Better approach: dynamic adaptation at the application level.
Dynamic Adaptation 21 Data is pushed from senders to all receivers simultaneously. Periodic recomputation: update the network graph G with new state information, including the number of residual file blocks b_x. Capacity update rule: if r_xy > f*_xy * (1 - slack) then c_xy = max(c_xy, r_xy), else c_xy = max(c_xy / 2, r_xy), where r_xy is the measured flow rate, f*_xy is the optimal flow rate, and the slack prevents hysteresis. The coordinator recalculates the transfer plan periodically. State inconsistency: final_computation flag.
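The capacity update rule is small enough to state directly in code; a sketch under the slide's definitions (r_xy measured, f*_xy optimal; the 10% slack default is an assumption):

```python
def update_capacity(c_xy, r_xy, f_star_xy, slack=0.1):
    # Within `slack` of the optimal rate: the link kept up, so trust
    # (and possibly raise) the current capacity estimate.
    if r_xy > f_star_xy * (1 - slack):
        return max(c_xy, r_xy)
    # Underperformed: halve the estimate, but never drop below the
    # rate that was actually observed.
    return max(c_xy / 2, r_xy)
```

The slack term keeps a flow that is merely jittering around its target from repeatedly triggering the halving branch, which is the hysteresis mentioned above.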
Exploiting Block Replication 22 Replication naturally occurs during the transfer process: e.g., node y holds some blocks originating at node x (under the purge-immediately policy). Tag blocks with origin node and unique ID. File replication edge: conservation of blocks. If the solution uses a file replication edge, a REPLICATED_BLOCKS directive lets the origin node retransmit. Note: threshold for replication. [Figure: translated graph with a file replication edge of capacity b_yx from the super source s through y, alongside the file edges b'_x and b'_y.]
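A minimal sketch of the block-tagging bookkeeping described above; the types and helper are hypothetical, and only the origin-plus-ID tagging and the replication-edge capacity b_yx come from the slide:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Block:
    origin: str    # node where the block's file originated
    block_id: int  # unique ID within that file

def replication_edge_capacity(held_by_y, origin_x):
    """Blocks node y holds that originated at x; their count b_yx is
    the capacity of the file replication edge in the flow graph."""
    return sum(1 for b in held_by_y if b.origin == origin_x)
```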
Experimental Methodology 23 Implemented using the ns2 network simulator, with TCP CUBIC as the transport layer protocol. PlanetLab topology trace: measurements collected via S3 [Yalagandula06]. Up to 100 source nodes; 100 MB file per source node. Confluence parameters: number of neighbors k = 10; recomputation interval p = 15; replication enabled with the purge-immediately policy.
Direct Transfer: Microbenchmark 24 [Plots: Pool 49, Pool 10.] Maximizing the number of parallel connections is the best approach; poorly-connected connections start earlier.
Direct Transfer: Macrobenchmark 25 100 total nodes; 50 runs: in run #i, node #i acts as the sink node. First result: fetch from the first 50 nodes (49 source nodes). Second result: fetch from all nodes (99 source nodes). y-axis = ratio of second result to first result. Direct Transfer scales well with an increasing number of source nodes; well-connected nodes are better able to exploit parallelism.
Confluence vs. Direct Transfer 26 50 participating nodes, 50 different runs: in run #i, node #i acts as the sink node and the other nodes act as source nodes; each sink node fetches from all other participating nodes. Confluence performs better than Direct Transfer: 70% of nodes perform better with Confluence on a planetary scale, and 90% on a continental scale.
Overheads in Confluence 27 Measuring the network graph G: measurements may be inaccurate, stale, or both; k peers may not be able to saturate the sink. Delayed start: metadata must be collected by the coordinator, the solution calculated, and transfer plan directives sent. State inconsistency: the final set of blocks, sent directly to the coordinator, may take some time to finish. Tracking each block.
Confluence vs. Direct Transfer II 28 Confluence performs well on different planetary-scale topologies: all nodes perform better in 3 topologies, 90% in one, and 70% in another. Confluence excels in scenarios with a small number of source nodes: 100% under 50 nodes, 80% with 75 nodes, 70% with 100 nodes.
Related Work 29 CoBlitz uses PlanetLab nodes to improve 1-to-1 transfers. GridFTP: a tool to move around large sets of data on the grid. BitTorrent [Cohen03], CDNs [Akamai]: multiple replicas of data available. Miscellaneous: estimating bandwidth: pathChirp [Ribeiro03], PTR/IGI [Hu03], Pathload [Jain02], pathrate [Dovrolis04], Pathneck [Hu04], etc.; TCP for LFNs: High-Speed TCP [Chase01], TCP BIC [Xu04], TCP CUBIC [Rhee05], FAST TCP [Wei06], TCP-Illinois [Liu06], etc.
Outline 30 Introduction: Computing Today, System Diversity, Publish-Subscribe Systems, Thesis Statement. Contributions: Confluence, RANS Framework, Rappel. Conclusion: Concluding Remarks.
RANS Framework 31 Realistic and Deterministic Simulation at Large Scales
Motivation 32 We need to study system diversity at large scales: network diversity (end-to-end latency fluctuations) and interest diversity (subscription correlation). System deployment is labor-intensive and limited: PlanetLab usually has only about 400 accessible nodes, and experiments are not replayable. Simulations provide an alternative, but are not realistic.
Introduction 33 RANS objectives. Realism: simulation results should match deployment observations; simulation should be run using the same code as an actual implementation. Deterministic replay: an unmodified application should yield the same result when provided with identical input as a previous execution. Large scale: ability to simulate several thousand end nodes; selective-granularity simulation.
Application Programming Interface 34 An application interfaces with the EventManager and the TransportManager. An application can run multiple protocols.
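The slides do not show the API itself, so the following is purely an illustrative Python sketch of what a protocol written against an EventManager/TransportManager pair might look like; every method name here is an assumption:

```python
class PingProtocol:
    """One of possibly several protocols run by an application."""

    def __init__(self, event_mgr, transport_mgr):
        self.events = event_mgr         # timers / scheduled callbacks
        self.transport = transport_mgr  # message send / receive

    def start(self):
        # Deterministic replay: with identical inputs, the event
        # schedule (and hence the whole run) is reproducible.
        self.events.schedule(delay=1.0, callback=self.on_timer)

    def on_timer(self):
        self.transport.send(dest="peer-1", payload=b"ping")
        self.events.schedule(delay=1.0, callback=self.on_timer)

    def on_message(self, src, payload):
        # The same unmodified handler runs under simulation or over
        # real sockets (the actual implementation uses boost::asio).
        if payload == b"ping":
            self.transport.send(dest=src, payload=b"pong")
```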
Events 35
Messages 36
Implementation 37 Sockets implementation: single-threaded, uses boost::asio. Trace-based simulator: Overnet churn trace.
Topology Manager 38 Goal: realistic end-to-end latency fluctuations. Problem: artificial topology; limited scale of trace data. Solution: topology fitting. PlanetLab RTT trace with fluctuations: 226 end hosts, over 4 hours; continuous fluctuations between node pairs (median latency). Internet AS topology: 20062 stub networks, 175 transit networks, and 8279 transit-and-stub networks. Topology fitting via trial and error: match simulator-generated latencies with PlanetLab median latencies: the first 10% of inter-AS links get a latency between 0 ms and 4 ms; the next 30% between 4 ms and 30 ms; the final 60% between 30 ms and 115 ms. Map each simulator node pair to a random PlanetLab node pair (for fluctuations).
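The latency-band assignment lends itself to a short sketch. This is an assumed reading of the fitting rule: the slide does not say how a value is drawn within each band, so uniform sampling is a guess:

```python
import random

def fit_inter_as_latencies(links):
    """Assign a latency (ms) to each inter-AS link: the first 10% of
    links get 0-4 ms, the next 30% get 4-30 ms, and the final 60% get
    30-115 ms, per the topology-fitting rule above."""
    n = len(links)
    latency = {}
    for rank, link in enumerate(links):
        frac = rank / n
        if frac < 0.10:
            lo, hi = 0.0, 4.0
        elif frac < 0.40:
            lo, hi = 4.0, 30.0
        else:
            lo, hi = 30.0, 115.0
        latency[link] = random.uniform(lo, hi)  # uniform draw is an assumption
    return latency
```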
Topology Fitting Results 39 The latencies modeled by RANS closely match the latencies experienced within PlanetLab.
Simulation vs. PlanetLab: Rappel Per-Feed Dissemination Tree 40 Experiment: 1 publisher, 250 subscribers, 4 hours, 1 update every minute. The close match of results validates the RANS framework: 95% of nodes consistently receive updates in under 0.5 seconds.
Simulation vs. PlanetLab II 41 The close match of results validates the RANS framework. The stretch ratio w.r.t. the underlying coordinate space is lower.
Related Work 42 Network simulators: ns2, QualNet, OPNET. Emulators: ModelNet, EmuLab. Testbeds: PlanetLab. Application-level network simulators: p2psim, GnutellaSim. Programming libraries/languages for rapid deployment: Macedon, P2.
Outline 43 Introduction: Computing Today, System Diversity, Publish-Subscribe Systems, Thesis Statement. Contributions: Confluence, RANS Framework, Rappel. Conclusion: Concluding Remarks.
Rappel 44 Using Locality to Improve Fairness in Publish-Subscribe Systems
System Goals 45 Peer-to-peer delivery for RSS: low publisher and subscriber overhead; optimizations due to the m-to-n delivery paradigm. Client fairness: zero noise; load should scale with the number of subscriptions; exploit interest diversity. Real-time update dissemination: via a low stretch ratio; exploit network diversity.
Design Overview 46 A global control plane: used to locate other nodes ("friends") close in interest and network proximity; the friends overlay. A multicast dissemination tree per feed: can be joined by contacting any active node; only subscribers join a feed's dissemination tree, which eliminates noise. [Figure: friends overlay among nodes and per-feed dissemination trees rooted at publishers P1 through Pi.]
Techniques Utilized in Rappel 47 Friends overlay: utility-based selection of friends; exploit interest and network locality using Bloom filters and network coordinates (a sketch of the Bloom-filter idea follows below); discovering new nodes via gossip; periodic audits. Per-feed dissemination trees: primitives based on network coordinates; periodic rejoin; push-pull of updates.
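As a sketch of the interest-locality side only: Rappel's actual utility function also weighs network coordinates, and its Bloom filter parameters are not given here, so everything below is illustrative:

```python
from hashlib import sha256

class BloomFilter:
    """Compact summary of a node's feed subscriptions."""

    def __init__(self, m=1024, k=4):
        self.m, self.k, self.bits = m, k, 0

    def _positions(self, item):
        for i in range(self.k):
            digest = sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, feed_url):
        for pos in self._positions(feed_url):
            self.bits |= 1 << pos

def interest_overlap(a, b):
    # Shared set bits as a cheap proxy for subscription overlap;
    # candidates with higher overlap make better "friends".
    return bin(a.bits & b.bits).count("1")
```

Exchanging such filters via gossip lets a node estimate subscription overlap with a candidate friend without shipping its full subscription list.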
Exploiting Network Diversity 48 Experiment: 1 publisher (at UIUC), 25 subscribers (distributed across the USA). Rappel's per-feed dissemination trees exploit network diversity.
Exploiting Interest Diversity 49 Experiment: 250 feeds, 5582 subscribers (simulation only). 91% of clients are perfectly covered, with no wasted friends. Rappel's friendship overlay effectively exploits interest diversity.
Outline 50 Introduction: Computing Today, System Diversity, Publish-Subscribe Systems, Thesis Statement. Contributions: Confluence, RANS Framework, Rappel. Conclusion: Concluding Remarks.