Automatic Improvement of Locality in Storage Systems


Explore the concept of automatic improvement of locality in storage systems through approaches like replicating and reorganizing disk blocks. Various heuristics and clustering strategies are discussed to enhance disk performance and reduce seek distance. The article also delves into the architecture of an autonomic storage system that adapts to workloads by optimizing the spatial locality of reference. Performance evaluation methodology focusing on metrics like response time, throughput, and read miss ratio is presented.

  • Storage Systems
  • Disk Performance
  • Clustering Strategies
  • Autonomic Storage
  • Performance Evaluation


Presentation Transcript


  1. THE AUTOMATIC IMPROVEMENT OF LOCALITY IN STORAGE SYSTEMS Sina Sereshki (sereshki@ce.sharif.edu)

  2. TOPICS Introduction & Approaches; Background and Related Works; Architecture; Clustering Strategies; Performance Analysis

  3. INTRODUCTION & APPROACHES Disk performance in practice is increasing by only about 8% per year. The approach presented here improves average read performance by up to 50% for server workloads and by about 15% for personal computer workloads. The performance gap between the processor and the disk continues to widen. The time needed to read the whole disk increases as disk density increases. Only a portion of the stored data is in active use.

  4. INTRODUCTION & APPROACHES An approach to increasing effective disk performance: replicate and reorganize selected disk blocks so that the physical layout mirrors the logically sequential access. This article presents ALIS, an autonomic storage system that adapts itself to a given workload by automatically reorganizing selected disk blocks to improve the spatial locality of reference.

  5. BACKGROUND AND RELATED WORKS Various heuristics have been used to lay out data on disk so that items (e.g., files) that are expected to be used together are located close to one another. These heuristics are based on static information, and files become fragmented over time. The a posteriori approach uses information about the dynamic reference behavior to arrange items, but in doing so, contiguous data that used to be accessed together could be split up.

  6. BACKGROUND AND RELATED WORKS Superunit: reduce internal fragmentation of the superunit. Seek distance: clustering merely moves related items close together to reduce the seek distance. Multiple copies: distribute multiple copies of the data in the storage system and dynamically decide which copy to fetch by minimizing ...

  7. ARCHITECTURE OF ALIS

  8. PERFORMANCE EVALUATION METHODOLOGY Performance metrics: response time, throughput, service time, and read miss ratio.
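
The four metrics can be illustrated with a short Python sketch. It uses common textbook definitions, which are an assumption here and not necessarily the exact definitions used in the presentation: response time runs from arrival to completion, service time from the start of service to completion, throughput is requests completed per unit of elapsed time, and the read miss ratio is the fraction of reads not satisfied by a cache. The `Request` record and its field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Request:
    # Hypothetical trace record; all times are in the same unit (e.g. ms).
    arrival: float    # when the request was issued
    start: float      # when the disk began servicing it
    finish: float     # when it completed
    is_read: bool
    cache_hit: bool   # True if satisfied without a physical disk access


def summarize(requests):
    """Compute the four metrics under the assumed definitions above."""
    n = len(requests)
    reads = [r for r in requests if r.is_read]
    elapsed = max(r.finish for r in requests) - min(r.arrival for r in requests)
    misses = sum(1 for r in reads if not r.cache_hit)
    return {
        "mean_response_time": sum(r.finish - r.arrival for r in requests) / n,
        "mean_service_time": sum(r.finish - r.start for r in requests) / n,
        "throughput": n / elapsed,
        "read_miss_ratio": misses / len(reads) if reads else 0.0,
    }


sample = [
    Request(arrival=0, start=0, finish=2, is_read=True, cache_hit=False),
    Request(arrival=1, start=2, finish=3, is_read=True, cache_hit=True),
    Request(arrival=2, start=3, finish=4, is_read=False, cache_hit=False),
]
metrics = summarize(sample)
```

Response time includes time spent waiting in the queue, which is why it can be larger than service time for the same request.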

  9. CLUSTERING STRATEGIES Clustering strategies are techniques for deciding which blocks to reorganize and how to lay these blocks out relative to one another.

  10. CLUSTERING STRATEGIES Heat clustering: only a small fraction of the data stored on disk is in active use. Determine the active blocks and cluster them together so that they can be accessed more efficiently with less physical movement. Approach: count accesses, identify the frequently accessed data, and rearrange them.
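
The count, identify, and rearrange steps can be sketched in Python as follows. This is a minimal illustration, not the ALIS implementation: the function name `hottest_blocks`, the `capacity` parameter (how many blocks fit in the reorganized area), and the choice to place blocks in plain heat order are all assumptions made for the example.

```python
from collections import Counter


def hottest_blocks(trace, capacity):
    """Return up to `capacity` block numbers, most frequently accessed first."""
    counts = Counter(trace)  # step 1: count accesses per block
    # Step 2: rank by descending count; break ties by block number
    # so the result is deterministic.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [block for block, _ in ranked[:capacity]]


# A toy access trace of block numbers (hypothetical data).
trace = [7, 3, 7, 9, 3, 7, 1, 9, 7]
hot = hottest_blocks(trace, capacity=2)

# Step 3: assign each hot block a slot in the reorganized area.
layout = {block: slot for slot, block in enumerate(hot)}
```

In this toy trace, block 7 is accessed four times and blocks 3 and 9 twice each, so blocks 7 and 3 fill the two available slots.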

  11. CLUSTERING STRATEGIES Heat clustering: it is assumed that reorganization is performed daily and that the reorganized area is 10% of the size of the total volume and is located at a byte offset 30% into the volume, with the volume laid out inward from the outer edge of the disk. Blocks are laid out in ascending order of their extent rank, which reduces seek distance without decreasing prefetch effectiveness.
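
As a worked example of the placement parameters, the byte range of the reorganized area can be computed from the volume size. The function name and the use of integer percentages are assumptions for this sketch; the 10% size and 30% offset are the values stated on the slide.

```python
def reorganized_area(volume_bytes, size_percent=10, offset_percent=30):
    """Return (start, end) byte offsets of the reorganized area.

    Integer arithmetic is used so the result is exact for any volume size.
    """
    start = volume_bytes * offset_percent // 100
    length = volume_bytes * size_percent // 100
    return start, start + length


# For a hypothetical 100 GiB volume:
start, end = reorganized_area(100 * 2**30)
```

For a 100 GiB volume this places the area between the 30 GiB and 40 GiB marks.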

  12. CLUSTERING STRATEGIES Run clustering: the existence of runs suggests a clustering strategy that seeks to identify these runs so as to lay them out sequentially (in the order they are referenced) in the reorganized area.

  13. CLUSTERING STRATEGIES Run clustering: representing access patterns and handling pseudo-repeated patterns. Access patterns are represented as a graph in which the weight of edge i → j is equal to the number of times reorganization unit j is referenced shortly after accessing unit i. All the results in this article assume a context size of 9.
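
The access graph can be sketched in Python. The sketch assumes that "shortly after" means "within the next `context_size` references", which is one plausible reading of the slide and not a confirmed detail. The function name `build_access_graph` is hypothetical.

```python
from collections import defaultdict, deque


def build_access_graph(trace, context_size=9):
    """Return {(i, j): weight} for a trace of reorganization-unit ids.

    The weight of (i, j) counts references to j that occur while i is
    among the last `context_size` units referenced (assumed meaning of
    "shortly after").
    """
    graph = defaultdict(int)
    window = deque(maxlen=context_size)  # most recent references
    for unit in trace:
        # set() ensures each earlier unit is credited once per reference,
        # even if it appears several times in the window.
        for earlier in set(window):
            if earlier != unit:
                graph[(earlier, unit)] += 1
        window.append(unit)
    return dict(graph)


# With a context size of 1, only immediate successors are linked.
graph = build_access_graph(["a", "b", "c", "a", "b", "c"], context_size=1)
```

A larger context size lets the graph link units that are separated by a few unrelated references, at the cost of more edges.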

  14. CLUSTERING STRATEGIES

  15. HEAT AND RUN CLUSTERING COMBINED The reorganized area should be located near the remaining hot spots, but these are distributed across the disk so that no single good location exists. When there is more than one up-to-date copy of a block, the system can select the copy to fetch that would reduce the access time. The reorganized area is shared between heat and run clustering, with the runs being allocated first. Sharing the reorganized area dynamically between heat and run clustering works well.
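
Choosing among several up-to-date copies can be sketched as follows. The sketch uses the distance between the current head position and each copy's address as a stand-in for access time; that proxy, the function name `pick_copy`, and the `(address, is_up_to_date)` representation are assumptions for illustration only.

```python
def pick_copy(head_position, copies):
    """Pick the up-to-date copy closest to the current head position.

    `copies` is an iterable of (address, is_up_to_date) pairs.
    """
    valid = [address for address, fresh in copies if fresh]
    if not valid:
        raise ValueError("no up-to-date copy available")
    # Seek distance is used here as an assumed proxy for access time.
    return min(valid, key=lambda address: abs(address - head_position))


# The copy at 315 is stale, so the copy at 310 is chosen over the one at 900.
chosen = pick_copy(320, [(900, True), (310, True), (315, False)])
```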

  16. PERFORMANCE ANALYSIS Clustering algorithms: in general, the PC workloads are improved less by ALIS than the server workloads. The PC workloads are more recent than the server workloads. Their access patterns tend to be more varied and to repeat much less frequently than in the server workloads. PCs mostly run a fixed set of applications which are installed sequentially and which use large sequential data sets.

  17. PERFORMANCE ANALYSIS

  18. PERFORMANCE ANALYSIS Reorganized area: for the servers, a storage overhead of 15%. For the PC workloads, the result is that just over 1% of the storage ...

  19. PERFORMANCE ANALYSIS Write policy: including write requests in the access graph improves the write performance for some of the workloads but decreases read performance across the board. Therefore, the default policy in this article is to consider only reads for run clustering. As for which copy to update, the simulation results suggest updating the runs for the server workloads and invalidating the other copies. The default policy for the PC workloads is to update the home copy and invalidate the others.
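
The update-one, invalidate-the-rest policy can be sketched in Python. The dictionary of validity flags and the location names `"home"`, `"run"`, and `"heat"` are hypothetical; the slide only states which copy is updated for each workload class.

```python
def write_block(valid, update):
    """Apply a write to one copy of a block and invalidate all others.

    `valid` maps a copy's location name to whether it is up to date.
    Returns a new mapping; the input is left unchanged.
    """
    if update not in valid:
        raise KeyError(f"block has no copy at {update!r}")
    return {location: location == update for location in valid}


copies = {"home": True, "run": True, "heat": True}

# Server default from the slide: update the run copy.
server_result = write_block(copies, update="run")

# PC default from the slide: update the home copy.
pc_result = write_block(copies, update="home")
```

After a write, exactly one copy stays valid, so later reads can no longer choose among copies until the next reorganization replicates the block again.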

  20. PERFORMANCE ANALYSIS Sensitivity to Parameters

  21. PERFORMANCE ANALYSIS Sensitivity to Parameters

  22. CONCLUSIONS ALIS analyzes I/O reference patterns to replicate and reorganize selected disk blocks, improving the spatial locality of reference and hence leveraging the dramatic improvement in disk transfer rate. It builds on the idea of clustering together hot or frequently accessed data. Combining heat and run clustering reaps the greater benefit of the two schemes and achieves performance that is superior to either technique alone.

  23. CONCLUSIONS For the server workloads, read performance is improved by between 31% and 50% while write performance is improved by as much as 22%. The read performance for the PC workloads is improved by about 15% while the writes are faster by up to 8%.
