
Google File System Design and Architecture Overview
Explore the design and architecture of the Google File System (GFS) as presented by Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung in the Proceedings of the Nineteenth ACM Symposium on Operating Systems Principles. The system's interactions, fault tolerance, measurements, and conclusions are discussed, providing insights into its scalability, reliability, and availability in managing huge files efficiently.
Presentation Transcript
The Google File System. Sanjay Ghemawat, Howard Gobioff, Shun-Tak Leung. Proceedings of the Nineteenth ACM Symposium on Operating Systems Principles, 2003. Presenter: Yen-Yu Chen. Date: June 4, 2024.
Contents: 1. Introduction 2. Design 3. System Interactions 4. Master Operation 5. Fault Tolerance and Diagnosis 6. Measurements 7. Conclusions
1. Introduction
Background: Like previous distributed file systems, GFS targets performance, scalability, reliability, and availability.
Different points in the design space: Component failures are the norm rather than the exception. Files are huge by traditional standards. Most files are mutated by appending new data rather than overwriting. Sustained bandwidth is more critical than low latency.
2. Design
Interface: A familiar file-system interface - create, delete, open, close, read, write - plus snapshot and atomic record append.
Architecture
Architecture - Chunks: Files are divided into fixed-size 64 MB chunks, much larger than typical file-system block sizes. Advantages of a large chunk size: it reduces interaction between client and master, a client can perform many operations on a given chunk, and it reduces the size of the metadata stored on the master.
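A minimal sketch, assuming only the 64 MB chunk size stated above, of how a byte offset in a file maps to a chunk index and an offset within that chunk (the function name is illustrative, not from GFS code):

```python
# Minimal sketch: translate a byte offset into (chunk_index, offset_within_chunk),
# assuming the fixed 64 MB chunk size described in the paper.
CHUNK_SIZE = 64 * 1024 * 1024  # 64 MB

def locate(offset: int) -> tuple[int, int]:
    """Return (chunk_index, offset_within_chunk) for a byte offset."""
    return offset // CHUNK_SIZE, offset % CHUNK_SIZE

# Example: byte 150,000,000 falls in chunk 2 (the third chunk).
print(locate(150_000_000))  # (2, 15782272)
```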
GFS chunkservers: Store chunks on local disks as Linux files. Read/write chunk data specified by a chunk handle and byte range.
GFS master: Maintains all file-system metadata in memory - namespace, access-control information, chunk locations, and lease management.
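A rough, hypothetical sketch of the kinds of in-memory metadata such a master keeps; the class and field names are illustrative, not taken from the paper or its code:

```python
# Hypothetical sketch of GFS-style master metadata (names are invented).
from dataclasses import dataclass, field

@dataclass
class ChunkInfo:
    handle: int                                           # globally unique chunk handle
    version: int = 0                                      # chunk version number
    locations: list[str] = field(default_factory=list)    # chunkserver addresses
    primary: str | None = None                            # replica currently holding the lease
    lease_expiry: float = 0.0                             # lease expiration time (seconds)

@dataclass
class FileInfo:
    chunks: list[int] = field(default_factory=list)       # chunk handles, in order
    acl: set[str] = field(default_factory=set)            # access-control information

class MasterMetadata:
    def __init__(self) -> None:
        self.namespace: dict[str, FileInfo] = {}          # full pathname -> file metadata
        self.chunks: dict[int, ChunkInfo] = {}            # chunk handle -> chunk metadata
```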
GFS client: Linked into each application. Implements the file-system API and communicates with the master and chunkservers.
Process (Read)
Process: Metadata flows between client and master only; file data flows between client and chunkservers only.
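A small, self-contained sketch of that read path; all classes and method names here are toy stand-ins, not the real GFS API. The client sends one small metadata request to the master, then transfers data directly from a chunkserver replica:

```python
# Toy model of the GFS read path: metadata from the master, data from a chunkserver.
CHUNK_SIZE = 64 * 1024 * 1024

class Chunkserver:
    def __init__(self):
        self.chunks: dict[int, bytes] = {}                 # chunk handle -> chunk data
    def read(self, handle: int, offset: int, length: int) -> bytes:
        return self.chunks[handle][offset:offset + length]

class Master:
    def __init__(self):
        self.files: dict[str, list[int]] = {}              # filename -> chunk handles
        self.locations: dict[int, list[Chunkserver]] = {}  # handle -> replica servers
    def lookup(self, filename: str, chunk_index: int):
        handle = self.files[filename][chunk_index]
        return handle, self.locations[handle]

def gfs_read(master: Master, filename: str, offset: int, length: int) -> bytes:
    # Metadata only: one small request to the master.
    handle, replicas = master.lookup(filename, offset // CHUNK_SIZE)
    # Data only: bulk transfer directly from a chunkserver, not via the master.
    return replicas[0].read(handle, offset % CHUNK_SIZE, length)
```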
3. System Interactions
Lease: Objective is to minimize load on the master. The master grants a lease to one replica, called the primary chunkserver.
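A toy sketch of lease granting so that at most one replica acts as primary at a time; the 60-second lease term is from the paper, while the class and method names are invented:

```python
# Toy lease table: grant or renew the primary lease for a chunk.
import time

LEASE_SECONDS = 60                                         # initial lease timeout in the paper

class LeaseTable:
    def __init__(self):
        self.primary: dict[int, tuple[str, float]] = {}    # handle -> (replica, expiry)

    def grant(self, handle: int, replica: str) -> bool:
        holder = self.primary.get(handle)
        if holder and holder[1] > time.time():
            return False                                   # an unexpired lease already exists
        self.primary[handle] = (replica, time.time() + LEASE_SECONDS)
        return True

    def renew(self, handle: int, replica: str) -> bool:
        holder = self.primary.get(handle)
        if holder and holder[0] == replica:                # only the current primary may extend
            self.primary[handle] = (replica, time.time() + LEASE_SECONDS)
            return True
        return False
```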
Dataflow: Write steps.
Atomic Record Appends: GFS appends the data to the file at least once atomically, at an offset of GFS's choosing; a record may therefore appear more than once, and clients must tolerate duplicates.
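An illustrative, in-memory sketch (not the real protocol) of why "at least once" can leave duplicate records: the client retries a failed append, and a replica that already applied the first attempt stores the record twice. Readers are expected to tolerate such duplicates:

```python
# Toy model of at-least-once record append with client retries.
import random

class Chunk:
    def __init__(self):
        self.records: list[bytes] = []

    def record_append(self, data: bytes) -> int:
        """Append data at an offset GFS chooses; may fail after applying."""
        self.records.append(data)
        offset = len(self.records) - 1
        if random.random() < 0.3:            # simulate a lost reply / failed replica
            raise TimeoutError("append reply lost")
        return offset

def append_at_least_once(chunk: Chunk, data: bytes) -> int:
    while True:                              # client retries until it sees a success
        try:
            return chunk.record_append(data)
        except TimeoutError:
            continue                         # an earlier attempt may already be stored

chunk = Chunk()
append_at_least_once(chunk, b"record-A")
print(chunk.records)                         # b"record-A" may appear more than once
```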
Snapshot: Goals are to quickly create branch copies of huge data sets and to easily checkpoint the current state. Copy-on-write technique: metadata for the source file or directory tree is duplicated, reference counts for the chunks are incremented, and chunks are copied later, at the first write.
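A hand-wavy sketch, using invented structures, of snapshot via copy-on-write: snapshotting only duplicates metadata and bumps chunk reference counts; a chunk is physically copied the first time someone writes to it:

```python
# Toy copy-on-write snapshot: metadata duplicated now, chunks copied on first write.
from collections import Counter

files: dict[str, list[int]] = {"/data/log": [1, 2, 3]}   # filename -> chunk handles
refcount: Counter = Counter({1: 1, 2: 1, 3: 1})          # chunk handle -> references
next_handle = 4

def snapshot(src: str, dst: str) -> None:
    files[dst] = list(files[src])            # duplicate metadata only
    for h in files[dst]:
        refcount[h] += 1                     # chunks are now shared

def write(path: str, chunk_index: int, data: bytes) -> None:
    global next_handle
    h = files[path][chunk_index]
    if refcount[h] > 1:                      # shared chunk: copy it before writing
        refcount[h] -= 1
        new_h = next_handle; next_handle += 1
        refcount[new_h] = 1
        files[path][chunk_index] = new_h     # (copying the chunk data itself omitted)
    # ... apply the write to the (possibly new) chunk ...

snapshot("/data/log", "/data/log.snap")
write("/data/log", 0, b"new bytes")          # triggers copy-on-write of chunk 1
```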
4. Master Operation
Namespace Management and Locking
Creation, Re-replication, Rebalancing: Replicate chunks that do not have a sufficient number of copies. Prioritize replicating frequently accessed chunks. Prioritize replicating chunks that have become bottlenecks.
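An illustrative sketch of how re-replication work could be ordered; the scoring here is a toy, not the paper's exact policy, but it captures the idea that chunks farther below their replication goal and chunks currently blocking clients come first:

```python
# Toy priority ordering for re-replication candidates.
import heapq

def priority(goal: int, live_replicas: int, blocking_client: bool) -> int:
    missing = goal - live_replicas
    return missing * 10 + (5 if blocking_client else 0)    # higher = more urgent

def plan(chunks: list[dict]) -> list[int]:
    heap = [(-priority(c["goal"], c["live"], c["blocking"]), c["handle"]) for c in chunks]
    heapq.heapify(heap)
    return [heapq.heappop(heap)[1] for _ in range(len(heap))]

chunks = [
    {"handle": 1, "goal": 3, "live": 2, "blocking": False},
    {"handle": 2, "goal": 3, "live": 1, "blocking": True},   # most urgent
    {"handle": 3, "goal": 3, "live": 2, "blocking": True},
]
print(plan(chunks))   # [2, 3, 1]
```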
Garbage Collection: Deleted files are first renamed to a hidden name and may be removed later. Orphaned chunks (chunks unreachable from any file) are also collected lazily.
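A rough sketch of lazy deletion over a toy namespace (names invented): a delete is just a rename to a hidden name carrying a timestamp, and a later background scan removes hidden files older than a grace period (the paper's default is three days):

```python
# Toy lazy deletion: hide on delete, reclaim during a later namespace scan.
import time

GRACE_SECONDS = 3 * 24 * 3600
namespace: dict[str, list[int]] = {"/data/old": [7, 8]}

def delete(path: str) -> None:
    hidden = f"{path}.deleted.{int(time.time())}"          # hide instead of removing
    namespace[hidden] = namespace.pop(path)

def scan_namespace(now: float) -> None:
    for name in list(namespace):
        if ".deleted." in name:
            deleted_at = int(name.rsplit(".", 1)[1])
            if now - deleted_at > GRACE_SECONDS:
                del namespace[name]                        # metadata gone; chunks it referenced
                                                           # become orphaned and are reclaimed later

delete("/data/old")
scan_namespace(time.time() + GRACE_SECONDS + 1)
print(namespace)                                           # {}
```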
5. Fault Tolerance and Diagnosis
High Availability: Fast recovery via the operation log and checkpoints, chunk replication, and master replication.
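A simplified sketch, with invented file formats and names, of fast master recovery: load the latest checkpoint of the metadata, then replay only the operation-log records appended after that checkpoint:

```python
# Toy recovery: restore a metadata checkpoint, then replay newer log records.
import json

def recover(checkpoint_path: str, log_path: str) -> dict:
    with open(checkpoint_path) as f:
        state = json.load(f)                     # metadata snapshot
    applied = state.get("last_applied", 0)
    with open(log_path) as f:
        for line in f:
            record = json.loads(line)            # one logged metadata mutation
            if record["seq"] > applied:          # skip records already in the checkpoint
                apply_mutation(state, record)
                state["last_applied"] = record["seq"]
    return state

def apply_mutation(state: dict, record: dict) -> None:
    if record["op"] == "create":
        state.setdefault("files", {})[record["path"]] = []
    elif record["op"] == "delete":
        state.get("files", {}).pop(record["path"], None)
```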
Data Integrity
Diagnostic: Logs record all operations on metadata.
6. Measurements
7. Conclusions
Advantages: Files are divided into chunks for storage, which can be accessed concurrently for high throughput. Control flow and data flow are separated when modifying data, making full use of each machine's bandwidth. Leases reduce the master's workload. Good fault tolerance.
Disadvantages: There is only one master; if there is too much metadata, its memory may not suffice, and with many clients the load on the single master becomes too high. Garbage collection that requires the master to scan all chunks is inefficient. The consistency model is loose and cannot handle tasks that require strong consistency.