Supporting Strong Cache Coherency for Active Caches in Multi-Tier Data-Centers

This presentation examines how to support strong cache coherency for active caches in multi-tier data centers over InfiniBand. It covers the design, implementation, and experimental results, emphasizing the importance of cache coherency for scalability and performance.

  • Cache Coherency
  • Multi-Tier Data Centers
  • InfiniBand
  • Performance Optimization
  • Scalability




Presentation Transcript


  1. Supporting Strong Cache Coherency for Active Caches in Multi-Tier Data-Centers over InfiniBand. S. Narravula, P. Balaji, K. Vaidyanathan, S. Krishnamoorthy, J. Wu, and D. K. Panda, The Ohio State University.

  2. Presentation Outline: Introduction/Motivation; Design and Implementation; Experimental Results; Conclusions.

  3. Introduction: Fast Internet growth in the number of users, the amount of data, and the types of services. Several uses: e-commerce, online banking, online auctions, etc. Types of content: images, documents, audio clips, video clips, etc. (static content); stock quotes, online stores (Amazon), online banking, etc. (dynamic, or active, content).

  4. Presentation Outline: Introduction/Motivation (Multi-Tier Data-Centers, Active Caches, InfiniBand); Design and Implementation; Experimental Results; Conclusions.

  5. Multi-Tier Data-Centers: Clusters have displaced single powerful computers thanks to their low cost-to-performance ratio and are increasingly popular. For the multi-tier data-centers built on them, scalability is an important issue.

  6. A Typical Multi-Tier Data-Center: [diagram: clients connect over the WAN to Tier 0 (proxy nodes), Tier 1 (Apache web servers and PHP application servers), and Tier 2 (database servers)]

  7. Tiers of a Typical Multi-Tier Data-Center: Proxy nodes handle caching, load balancing, security, etc.; web servers handle the HTML content; application servers handle dynamic content and provide services; database servers handle persistent storage.

  8. Data-Center Characteristics: The amount of computation required for processing each request increases as we go from the front-end tiers to the inner tiers of the data-center, so caching at the front tiers is an important factor for scalability.

  9. Presentation Outline: Introduction/Motivation (Multi-Tier Data-Centers, Active Caches, InfiniBand); Design and Implementation; Experimental Results; Conclusions.

  10. Caching avoids re-fetching of content and is beneficial when requests repeat. Static-content caching is well studied and widely used; it decreases the number of requests flowing from the front-end tiers to the back-end tiers.

  11. Active Caching caches dynamic data: stock quotes, scores, personalized content, etc. Simple caching methods are not suited here; the issues are consistency and coherency. [diagram: a user request served from the proxy-node cache while the back-end data is updated]

  12. Cache Consistency: non-decreasing views of system state; updates are seen by all or by none. [diagram: user requests at the proxy nodes racing with an update at the back-end nodes]

  13. Cache Coherency refers to the average staleness of the documents served from cache. Two models of coherence: bounded staleness (weak coherency) and strong or immediate (strong coherency).
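
The two coherence models above can be contrasted in a minimal sketch (the class and function names are ours, not from the talk):

```python
import time

class CacheEntry:
    """A cached document: its value, a version number, and when it was fetched."""
    def __init__(self, value, version, fetched_at):
        self.value = value
        self.version = version
        self.fetched_at = fetched_at

def weak_coherent(entry, max_staleness_s, now=None):
    """Bounded staleness: serve from cache while the entry is young enough."""
    now = time.time() if now is None else now
    return (now - entry.fetched_at) <= max_staleness_s

def strong_coherent(entry, backend_version):
    """Strong coherency: serve only if the cached version matches the back-end's."""
    return entry.version == backend_version
```

Note the difference in cost: the weak check is purely local, while the strong check needs the back-end's current version on every hit, which is what the rest of the talk optimizes.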

  14. Strong Cache Coherency is an absolute necessity for certain kinds of data: online shopping, travel-ticket availability, stock quotes, online auctions. Example: online banking cannot afford to show different values to different concurrent requests.

  15. Caching policies (compared on consistency and coherency): no caching, client polling, invalidation*, TTL/adaptive TTL. * D. Li, P. Cao, and M. Dahlin, WCIP: Web Cache Invalidation Protocol, IETF Internet Draft, November 2000.

  16. Presentation Outline: Introduction/Motivation (Multi-Tier Data-Centers, Active Caches, InfiniBand); Design and Implementation; Experimental Results; Conclusions.

  17. InfiniBand offers high performance (low latency, high bandwidth), is an open industry standard, provides rich features (RDMA, remote atomic operations, etc.), and is targeted for data-centers. Transport layers: VAPI, IPoIB, SDP.

  18. Performance: [plots: latency (us) and throughput (MB/s) vs. message size for IPoIB, SDP, and VAPI] Latencies of less than 5 us and bandwidth over 840 MB/s are achieved. (SDP and IPoIB are from Voltaire's software stack.)

  19. Performance (RDMA Read): [plot: throughput (Mbps) and send/receive CPU utilization vs. message size, in polling and event modes] Receiver-side CPU utilization is very low, leveraging the benefits of one-sided communication.

  20. Caching policies (revisited; compared on consistency and coherency): no caching, client polling, invalidation, TTL/adaptive TTL.

  21. Objective: to design an architecture that very efficiently supports strong cache coherency over InfiniBand.

  22. Presentation Outline: Introduction/Motivation; Design and Implementation; Experimental Results; Conclusions.

  23. Basic Architecture: External modules are used, and module communication can use any transport. Versioning: application servers version the dynamic data; the version value of the data is passed to the front end with every request to the back-end, and the front end maintains the version along with the cached value of the response.

  24. Mechanism: On a cache hit, the back-end version is checked; if the version is current, the cache is used, and the data is invalidated on a failed version check. On a cache miss, the data is fetched into the cache and the local versions are initialized.
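
The hit/miss mechanism above can be sketched as follows. This is a minimal simulation, not the actual module: the back-end is stood in for by a plain dict mapping each key to a (value, version) pair, and all names are ours.

```python
class FrontEndCache:
    """Front-end cache with back-end version checks (simulation)."""
    def __init__(self, backend):
        self.backend = backend   # stand-in back-end: key -> (value, version)
        self.store = {}          # local cache: key -> (value, version)
        self.hits_from_cache = 0

    def get(self, key):
        backend_value, backend_version = self.backend[key]
        cached = self.store.get(key)
        if cached is not None:
            value, version = cached
            if version == backend_version:   # version current: use the cache
                self.hits_from_cache += 1
                return value
            del self.store[key]              # failed version check: invalidate
        # Cache miss (or just invalidated): fetch and initialize local version.
        self.store[key] = (backend_value, backend_version)
        return backend_value
```

In the real design only the version travels from the back-end on a hit; the simulation reads the whole pair for brevity.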

  25. Architecture: [diagram of the request path between front-end and back-end for the cache-hit and cache-miss cases]

  26. Design: Every server has an associated module that uses IPoIB, SDP, or VAPI to communicate. VAPI: when a request arrives at the proxy, the VAPI module is contacted; the module reads the latest version of the data from the back-end using a one-sided RDMA Read operation, and if the versions do not match, the cached value is invalidated.
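
A one-sided RDMA Read cannot be shown without InfiniBand hardware, so the sketch below only models its essential property: the front-end module loads the back-end's version table directly, without invoking any back-end code. The table layout (one 64-bit counter per document) and all names are our assumptions, not the talk's.

```python
from array import array

# "Registered memory" at the back-end: one 64-bit version counter per document.
backend_versions = array("Q", [0] * 16)

def rdma_read_version(doc_id):
    # In the real design this is a VAPI RDMA Read into the remote buffer;
    # here it is a plain load from the shared array, with no back-end CPU involved.
    return backend_versions[doc_id]

def is_cache_valid(doc_id, cached_version):
    # The proxy-side module's check: compare the cached version against
    # the version just read from the back-end.
    return rdma_read_version(doc_id) == cached_version
```

The one-sided read is why receiver-side CPU utilization stays low in the measurements: the back-end never processes the version-check request.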

  27. VAPI Architecture: [diagram: on a cache hit, the front-end module performs an RDMA Read of the version from the back-end before responding; a cache miss is forwarded to the back-end]

  28. Implementation: In the socket-based implementation, IPoIB and SDP are used, and the back-end version check is done using two-sided communication from the module. Requests to read and update are mutually excluded at the back-end module, to avoid simultaneous readers and writers accessing the same data. Only minimal changes to existing software are needed.
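
The mutual exclusion at the back-end module can be sketched with a lock serializing version checks against updates, so a reader never observes a half-applied update. The two-field document is our illustrative stand-in for "value and version must change together"; the socket transport itself is omitted.

```python
import threading

_lock = threading.Lock()
_doc = {"value": "v0", "version": 0}

def version_check():
    """Reader path: answer a two-sided version-check request."""
    with _lock:                  # excluded while an update is in progress
        return _doc["version"]

def update(new_value):
    """Writer path: install a new value and bump the version atomically."""
    with _lock:                  # excluded while a version check is in progress
        _doc["value"] = new_value
        _doc["version"] += 1
```

Without the lock, a reader could see the new version paired with the old value (or vice versa), breaking the strong-coherency guarantee the version check is meant to provide.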

  29. Presentation Outline: Introduction/Motivation; Design and Implementation; Experimental Results; Conclusions.

  30. Data-Center Performance (Throughput): [plot: transactions per second vs. number of compute threads (0-200) for No Cache, IPoIB, VAPI, and SDP] The VAPI module can sustain performance even with heavy load on the back-end servers.

  31. Data-Center Performance (Response Time): [plot: response time (ms) vs. number of compute threads (0-200) for No Cache, IPoIB, VAPI, and SDP] The VAPI module responds faster even with heavy load on the back-end servers.

  32. Response Time Breakup: [plots splitting response time into client communication, proxy processing, module processing, and back-end version check, at 0 and at 200 compute threads, for IPoIB, SDP, and VAPI] The worst-case module overhead is less than 10% of the response time; the VAPI-based version check has minimal overhead even with 200 compute threads.

  33. Data-Center Throughput (Zipf distribution and World Cup trace): [plots: transactions per second vs. number of compute threads (0-200) for No Cache, IPoIB, VAPI, and SDP] The VAPI implementation does better on the real trace as well; the drop in VAPI throughput on the World Cup trace is due to the higher penalty for cache misses under increased load.

  34. Conclusions: We presented an architecture for supporting strong cache coherence. The external-module-based design gives freedom in the choice of transport and requires minimal changes to existing software. The sockets API has an inherent limitation, two-sided communication, so even high-performance sockets (SDP) are not the solution; the main benefit comes from the one-sided nature of RDMA calls.

  35. Web Pointers: NBC home page: http://nowlab.cis.ohio-state.edu/ E-mail: {narravul, balaji, vaidyana, savitha, wuj, panda}@cis.ohio-state.edu
