Google File System Design and Architecture Overview


Explore the design and architecture of the Google File System (GFS) as presented by Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung in the Proceedings of the Nineteenth ACM Symposium on Operating Systems Principles. The system's interactions, fault tolerance, measurements, and conclusions are discussed, providing insights into its scalability, reliability, and availability in managing huge files efficiently.

  • Google
  • File System
  • GFS Design
  • Architecture
  • Scalability


Presentation Transcript


  1. The Google File System. Sanjay Ghemawat, Howard Gobioff, Shun-Tak Leung. Proceedings of the Nineteenth ACM Symposium on Operating Systems Principles, 2003. Presenter: Yen-Yu Chen. Date: June 4, 2024.

  2. Contents: 1. Introduction; 2. Design; 3. System Interactions; 4. Master Operation; 5. Fault Tolerance and Diagnosis; 6. Measurements; 7. Conclusions

  3. 1. Introduction

  4. Background: Goals shared with previous distributed file systems
       • Performance
       • Scalability
       • Reliability
       • Availability

  5. Different points in the design space
       • Component failures are the norm rather than the exception.
       • Data is stored as huge files: files are huge by traditional standards.
       • Most files are mutated by appending new data.
       • Sustained bandwidth is more critical than low latency.

  6. 2. Design

  7. Interface
       • Familiar interface: create, delete, open, close, read, write
       • Moreover: snapshot, atomic record append

  8. Architecture

  9. Architecture: Chunk
       • Files are divided into fixed-size chunks of 64 MB, larger than typical file system block sizes.
       • Advantages of the large chunk size:
           • Reduces interaction between client and master.
           • A client can perform many operations on a given chunk.
           • Reduces the size of the metadata stored on the master.
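
Because chunks have a fixed size, a client can work out which chunk a byte offset falls in without asking the master. The sketch below shows that arithmetic; the function names `locate` and `chunks_touched` are made up for illustration and are not part of any GFS API.

```python
CHUNK_SIZE = 64 * 1024 * 1024  # 64 MB, the fixed chunk size given on the slide


def locate(offset: int) -> tuple[int, int]:
    """Translate a file byte offset into (chunk index, offset inside that chunk)."""
    if offset < 0:
        raise ValueError("offset must be non-negative")
    return divmod(offset, CHUNK_SIZE)


def chunks_touched(offset: int, length: int) -> range:
    """Chunk indexes covered by an access of `length` bytes starting at `offset`."""
    if length <= 0:
        return range(0)
    first, _ = locate(offset)
    last, _ = locate(offset + length - 1)
    return range(first, last + 1)
```

A 1 MB read that stays inside one chunk touches a single index, so the client needs only one lookup at the master for it, which is the reduced client-master interaction the slide refers to.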

  10. GFS chunkservers
       • Store chunks on local disks as Linux files.
       • Read or write chunk data specified by a chunk handle and byte range.

  11. GFS master: Maintains all file system metadata in memory
       • Namespace
       • Access-control information
       • Chunk locations
       • Lease management
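
As a rough illustration of the in-memory metadata listed above, the sketch below keeps a namespace (path to chunk handles) and chunk locations in plain dictionaries. The class and method names are hypothetical, and access control and leases are left out to keep it short; it is not the master's actual data layout.

```python
from dataclasses import dataclass, field


@dataclass
class ChunkInfo:
    handle: int
    locations: set[str] = field(default_factory=set)  # chunkserver addresses


@dataclass
class MasterMetadata:
    """Hypothetical in-memory tables: namespace plus chunk locations."""
    files: dict[str, list[int]] = field(default_factory=dict)   # path -> chunk handles
    chunks: dict[int, ChunkInfo] = field(default_factory=dict)  # handle -> chunk info
    next_handle: int = 1

    def create(self, path: str) -> None:
        if path in self.files:
            raise FileExistsError(path)
        self.files[path] = []

    def add_chunk(self, path: str, servers: set[str]) -> int:
        """Allocate a new chunk handle for `path`, stored on `servers`."""
        handle = self.next_handle
        self.next_handle += 1
        self.chunks[handle] = ChunkInfo(handle, set(servers))
        self.files[path].append(handle)
        return handle

    def lookup(self, path: str, chunk_index: int) -> tuple[int, set[str]]:
        """What a client asks for: the handle and replica locations of one chunk."""
        handle = self.files[path][chunk_index]
        return handle, set(self.chunks[handle].locations)
```

After a `lookup`, the client talks to one of the returned chunkservers directly, so file data never passes through the master.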

  12. GFS client
       • Linked into each application.
       • Implements the file system API.
       • Communicates with the master and chunkservers.

  13. Process (Read)

  14. Process: Metadata only; Data only

  15. 3. System Interactions

  16. Lease
       • Objective: minimize load on the master.
       • The master grants a lease to one replica, called the primary chunkserver.
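
A minimal sketch of lease granting, assuming a simple table keyed by chunk handle. The `LeaseTable` class, its method name, and the rule for choosing a primary (first replica in sorted order) are assumptions for illustration; the 60-second default follows the initial lease timeout stated in the GFS paper.

```python
class LeaseTable:
    """Hypothetical lease bookkeeping on the master."""

    def __init__(self, duration: float = 60.0):
        self.duration = duration
        self._leases: dict[int, tuple[str, float]] = {}  # handle -> (primary, expiry)

    def primary_for(self, handle: int, replicas: set[str], now: float) -> str:
        """Return the current primary, granting a new lease if none is valid."""
        if not replicas:
            raise ValueError("no live replicas for chunk")
        lease = self._leases.get(handle)
        if lease is not None and lease[1] > now and lease[0] in replicas:
            return lease[0]  # existing lease still valid
        primary = sorted(replicas)[0]  # assumption: any deterministic choice works here
        self._leases[handle] = (primary, now + self.duration)
        return primary
```

While a lease is valid, the primary orders mutations to the chunk on its own, so clients do not have to involve the master in every write.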

  17. Dataflow: Write steps

  18. Atomic Appends: GFS appends the data to the file at least once, atomically. [Diagram: append and data records]

  19. Atomic Appends: GFS appends the data to the file at least once, atomically. [Diagram: append and data records]

  20. Atomic Appends: GFS appends the data to the file at least once, atomically. [Diagram: append and data records]
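
The "at least once" guarantee can be illustrated with a small simulation: when an append fails on some replica, the client retries, so the record may appear more than once on replicas where the earlier attempt succeeded. The sketch below models each replica as a Python list of record slots; `record_append` and `fail_plan` are hypothetical names, and the failure pattern is supplied by the caller rather than modeled.

```python
def record_append(replicas: list[list], record, fail_plan) -> int:
    """Append `record` at the same offset on every replica, retrying on failure.

    replicas:  one list of record slots per replica (None marks padding).
    fail_plan: iterable of sets; each set names the replica indexes that
               fail on that attempt (an empty set means the attempt succeeds).
    Returns the offset at which the record is present on all replicas.
    """
    for failures in fail_plan:
        offset = max(len(r) for r in replicas)  # offset chosen for this attempt
        for i, replica in enumerate(replicas):
            # Pad replicas that missed an earlier attempt so offsets line up.
            replica.extend([None] * (offset - len(replica)))
            if i not in failures:
                replica.append(record)
        if not failures:
            return offset
    raise RuntimeError("append did not succeed within the planned attempts")
```

With three empty replicas and a plan of `[{2}, set()]`, the first attempt fails on replica 2 and the retry succeeds at offset 1. Replicas 0 and 1 then hold the record twice and replica 2 holds padding followed by the record, so readers must tolerate duplicates and padding.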

  21. Snapshot
       • Goals:
           • To quickly create branch copies of huge data sets.
           • To easily checkpoint the current state.
       • Copy-on-write technique:
           • Metadata for the source file or directory tree is duplicated.
           • Reference counts for the chunks are incremented.
           • Chunks are copied later, at the first write.
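
The copy-on-write steps above can be sketched with reference counts. Everything here is a toy model: `SnapshotNamespace` and its methods are hypothetical names, and chunk contents are small in-memory byte arrays rather than 64 MB files on chunkservers.

```python
class SnapshotNamespace:
    """Toy copy-on-write model: snapshot copies metadata, writes copy chunks."""

    def __init__(self):
        self.files: dict[str, list[int]] = {}  # path -> chunk handles
        self.refcount: dict[int, int] = {}     # handle -> number of referencing files
        self.data: dict[int, bytearray] = {}   # handle -> stand-in chunk contents
        self._next = 1

    def _new_chunk(self, content=b"") -> int:
        handle = self._next
        self._next += 1
        self.data[handle] = bytearray(content)
        self.refcount[handle] = 1
        return handle

    def create(self, path: str, contents: list[bytes]) -> None:
        self.files[path] = [self._new_chunk(c) for c in contents]

    def snapshot(self, src: str, dst: str) -> None:
        """Duplicate metadata only and bump each chunk's reference count."""
        self.files[dst] = list(self.files[src])
        for handle in self.files[dst]:
            self.refcount[handle] += 1

    def write(self, path: str, index: int, content: bytes) -> None:
        """Copy a shared chunk on its first write, then modify the private copy."""
        handle = self.files[path][index]
        if self.refcount[handle] > 1:
            self.refcount[handle] -= 1
            handle = self._new_chunk(self.data[handle])
            self.files[path][index] = handle
        self.data[handle][:] = content
```

Taking the snapshot costs only a metadata copy; chunk data is duplicated lazily, and only for the chunks that are written afterwards.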

  22. 4. Master Operation

  23. Namespace Management and Locking

  24. Creation, Re-replication, Rebalancing
       • Replicate chunks that do not have a sufficient number of copies.
       • Prioritize replicating frequently accessed chunks.
       • Prioritize replicating chunks that have become bottlenecks.
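
One way to picture the prioritization above is as a sort over under-replicated chunks. The sketch below is an assumption-laden illustration: the `ChunkState` fields, the tie-breaking order (largest replica deficit first, then chunks blocking clients, then most-read chunks), and the function name are all hypothetical. The default goal of three replicas matches the GFS default.

```python
from dataclasses import dataclass


@dataclass
class ChunkState:
    handle: int
    live_replicas: int
    blocking_clients: bool = False  # chunk has become a bottleneck for clients
    recent_reads: int = 0           # rough measure of how often it is accessed


def replication_queue(chunks: list[ChunkState], goal: int = 3) -> list[ChunkState]:
    """Return under-replicated chunks, most urgent first (assumed ordering)."""
    needy = [c for c in chunks if c.live_replicas < goal]
    return sorted(
        needy,
        key=lambda c: (
            -(goal - c.live_replicas),  # further below the goal sorts first
            not c.blocking_clients,     # bottleneck chunks next
            -c.recent_reads,            # then frequently accessed chunks
        ),
    )
```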

  25. Garbage Collection
       • Deleted files: the file is renamed to a hidden name and may be removed later.
       • Orphaned chunks (chunks that are unreachable from any file).
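
The lazy deletion described above can be sketched as a rename followed by a later scan. The `LazyDeleter` class, the hidden-name format, and the method names are hypothetical; the three-day grace period mirrors the configurable default described in the GFS paper.

```python
GRACE_SECONDS = 3 * 24 * 3600  # three days; configurable in the real system


class LazyDeleter:
    """Toy model of lazy file deletion and orphaned-chunk detection."""

    def __init__(self):
        self.files: dict[str, list[int]] = {}                 # path -> chunk handles
        self.hidden: dict[str, tuple[float, list[int]]] = {}  # hidden name -> (deleted at, handles)

    def delete(self, path: str, now: float) -> None:
        """Deleting only renames the file to a hidden, timestamped name."""
        self.hidden[f".deleted:{path}:{now}"] = (now, self.files.pop(path))

    def scan_namespace(self, now: float) -> None:
        """Drop hidden files whose grace period has passed."""
        expired = [n for n, (t, _) in self.hidden.items() if now - t >= GRACE_SECONDS]
        for name in expired:
            del self.hidden[name]

    def orphaned(self, reported_handles: set[int]) -> set[int]:
        """Chunks a chunkserver reports that no file, visible or hidden, references."""
        reachable = {h for handles in self.files.values() for h in handles}
        reachable |= {h for _, handles in self.hidden.values() for h in handles}
        return set(reported_handles) - reachable
```

Until the scan removes the hidden name, the chunks stay reachable, which is what makes an accidental deletion recoverable during the grace period.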

  26. 5. Fault Tolerance and Diagnosis

  27. High Availability
       • Fast recovery
       • Operation log and checkpoints
       • Chunk replication
       • Master replication
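
Recovery from an operation log and a checkpoint amounts to loading the latest checkpoint and replaying only the log records written after it. The sketch below assumes a made-up record format of `(sequence number, operation, path, value)` and a checkpoint of `(last applied sequence number, state)`; it is an illustration, not the master's actual log format.

```python
def recover(checkpoint, log):
    """Rebuild master state from a checkpoint plus the operation log.

    checkpoint: (last_applied_seq, state) where state maps path -> chunk handles.
    log:        list of (seq, op, path, value) records in sequence order.
    Returns (last_applied_seq, state) after replay.
    """
    seq, state = checkpoint
    state = dict(state)  # do not mutate the caller's checkpoint
    for n, op, path, value in log:
        if n <= seq:
            continue  # already reflected in the checkpoint
        if op == "create":
            state[path] = value
        elif op == "delete":
            state.pop(path, None)
        else:
            raise ValueError(f"unknown operation: {op}")
        seq = n
    return seq, state
```

Checkpointing keeps the replayed portion of the log short, which is what keeps restart time low.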

  28. Data Integrity

  29. Diagnostic: Logs record all operations on metadata.

  30. 6. Measurements

  31. 7. Conclusions

  32. Advantages
       • Files are divided into chunks for storage, which can be accessed concurrently for high throughput.
       • Control flow and data flow are separated when modifying data, making full use of each machine's bandwidth.
       • Leases reduce the master's workload.
       • Good fault tolerance.

  33. Disadvantages
       • There is only one master.
           • If there is too much metadata, the master may not have enough memory.
           • If the number of clients is large, the load on the single master becomes too high.
       • It is inefficient for the master to perform garbage collection by scanning all chunks.
       • The consistency model is loose and cannot support tasks that require strong consistency.
