Flat Datacenter Storage Architecture


Explore the Flat Datacenter Storage architecture designed by Nightingale, Elson, Fan, Hofmann, Howell, and Suzue, as presented by Kuangyuan Chen and Qi Gao. The presentation covers the motivation, distributed metadata management, dynamic work allocation, replication, failure recovery, and performance evaluation of the system.




Presentation Transcript


  1. Flat Datacenter Storage
     Edmund B. Nightingale, Jeremy Elson, Jinliang Fan, Owen Hofmann, Jon Howell, and Yutaka Suzue
     Presented by Kuangyuan Chen and Qi Gao, EECS 582 W16

  2. Outline
     • Motivation
     • Overview
     • Distributed Metadata Management
     • Dynamic Work Allocation
     • Replication and Failure Recovery
     • Performance Evaluation

  3. Motivation
     • Why move computation to data? Because of bandwidth shortage in the datacenter (e.g., MapReduce).
     • Locality constraints hinder efficient resource utilization: stragglers, retasking.

  4. Motivation
     • A CLOS network supports full bisection bandwidth.
     • Top-of-rack (ToR) routers connect to spine routers.

  5. Flat Datacenter Storage
     • All compute nodes can access all data with equal throughput.
     • Simple and easy to program.
     • All data operations are remote.
     • Every machine has as much network bandwidth as disk bandwidth.

  6. Overview
     [Figure: a blob (e.g., GUID 0x5fab97ffda5c7c00) stored in a logically centralized storage array as 8 MB tracts: Tract -1 (per-blob metadata), Tract 0, Tract 1, ..., Tract N.]
     • Blob: a byte sequence named with a GUID.
     • Tract: the unit of read and write; constant sized (e.g., 8 MB).
     • FDS API: e.g., CreateBlob(), WriteTract(), ReadTract(); asynchronous/non-blocking, so calls can be issued in parallel (see the sketch below).
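     The sketch below illustrates how a client might drive a blob/tract API of this shape. Only the call names CreateBlob(), WriteTract(), and ReadTract() and the 8 MB tract size come from the slide; the fds handle, OpenBlob(), the wait()/result() completion methods, and all parameter names are assumptions made for illustration, not the actual FDS client library.

     TRACT_SIZE = 8 * 1024 * 1024  # tracts are constant-sized, e.g. 8 MB

     def write_blob(fds, guid, data):
         """Split `data` into 8 MB tracts and issue all writes in parallel."""
         blob = fds.CreateBlob(guid)
         pending = []
         for offset in range(0, len(data), TRACT_SIZE):
             tract_no = offset // TRACT_SIZE
             # Calls are asynchronous/non-blocking, so many tracts are in flight at once.
             pending.append(fds.WriteTract(blob, tract_no, data[offset:offset + TRACT_SIZE]))
         for op in pending:
             op.wait()  # wait for each completion (hypothetical completion handle)

     def read_blob(fds, guid, num_tracts):
         """Issue all tract reads at once, then reassemble the byte sequence."""
         blob = fds.OpenBlob(guid)  # hypothetical open call
         pending = [fds.ReadTract(blob, t) for t in range(num_tracts)]
         return b"".join(op.result() for op in pending)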

  7. Distributed Metadata Management
     • Tractserver: a process that manages a disk; it lays out tracts on the disk directly using the raw disk interface.
     • Tract Locator Table (TLT): a list of active tractservers.
     • Tract_Locator = (Hash(GUID) + i) mod TLT_Length
     • The mapping is deterministic and produces uniform disk utilization.

  8. Tract Locator Table
     Locator (row) | Version | Disk 1 | Disk 2 | Disk 3
     0             | 122     | A      | G      | H
     1             | 5       | B      | D      | F
     2             | 6       | C      | T      | B
     3             | 1       | E      | V      | C
     4             | 373     | F      | A      | R
     ...           | ...     | ...    | ...    | ...
     TLT_Length-1  | 160     | U      | E      | I
     • Each entry in a Disk column is a tractserver address.
     • Tract_Locator = (Hash(GUID) + i) mod TLT_Length
     • Tractserver versioning is used for failure recovery (a lookup sketch follows).
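     A minimal sketch of the deterministic lookup described above. Only the formula Tract_Locator = (Hash(GUID) + i) mod TLT_Length comes from the slides; the choice of hash function and the in-memory representation of the TLT are assumptions.

     import hashlib

     def blob_hash(guid: bytes) -> int:
         # Hash the blob GUID to an integer; SHA-1 here is an illustrative choice.
         return int.from_bytes(hashlib.sha1(guid).digest()[:8], "big")

     def tract_locator(guid: bytes, i: int, tlt_length: int) -> int:
         # Deterministically map (blob GUID, tract index) to a TLT row.
         return (blob_hash(guid) + i) % tlt_length

     def tractservers_for(tlt, guid, i):
         # `tlt` is assumed to be a list of rows, each (version, [tractserver addresses]).
         return tlt[tract_locator(guid, i, len(tlt))]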

  9. Distributed Metadata Management (cont.)
     • Metadata server: creates the TLT by balancing load among tractservers, distributes the TLT to clients, and assigns version numbers to tractservers.
     • It is in the critical path only at client startup.

  10. Dynamic Work Allocation
     • Mitigates stragglers by decoupling data and computation.
     • Work is assigned to workers dynamically and at fine granularity.
     • This reduces dispersion to the time of a single work unit (see the sketch below).
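     A sketch of the idea: instead of statically partitioning work, workers pull small units from a shared queue, so the slowest worker trails the rest by at most one unit. The queue-and-threads structure is an assumption for illustration, not the scheduling code of an FDS application.

     import queue
     import threading

     def run_job(work_units, process_unit, num_workers=8):
         q = queue.Queue()
         for unit in work_units:
             q.put(unit)

         def worker():
             while True:
                 try:
                     unit = q.get_nowait()  # pull the next fine-grained unit
                 except queue.Empty:
                     return
                 process_unit(unit)  # e.g. read one tract from FDS and process it

         threads = [threading.Thread(target=worker) for _ in range(num_workers)]
         for t in threads:
             t.start()
         for t in threads:
             t.join()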

  11. Replication
     • When a disk fails, redundant copies of the lost data are used to restore the data to full replication.

  12. Replication
     • As long as the lost tracts are restored somewhere in the system, full replication is recovered.

  13. Replication
     Locator | 1 | 2 | 3 | 4 | 5 | 6 | ... | 1234 | 1235 | 1236
     Disk 1  | A | A | A | A | A | A | ... | Z    | Z    | Z
     Disk 2  | B | C | D | E | F | G | ... | W    | X    | Y
     Disk 3  | C | Z | H | M | Y | D | ... | Q    | W    | U
     • All disk pairs appear in the table, so the table size is O(n^2).
     • When a disk fails, the lost data can be recovered using the rest of the disks in parallel (a construction sketch follows).
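     A sketch of why the table is O(n^2): with every pair of disks appearing together in some row, the replicas of a failed disk's tracts are spread over all other disks, so recovery proceeds in parallel across the cluster. The construction below is one simple way to build such a table under that assumption; it is not FDS's exact placement algorithm.

     import itertools
     import random

     def build_tlt(disks, replication=3):
         # Every pair of disks appears together in some row -> n*(n-1)/2 = O(n^2) rows.
         tlt = []
         for d1, d2 in itertools.combinations(disks, 2):
             row = [d1, d2]
             # Fill the remaining replica slots with other randomly chosen disks.
             others = [d for d in disks if d not in row]
             row += random.sample(others, replication - 2)
             tlt.append({"version": 0, "disks": row})
         return tlt

     tlt = build_tlt(["disk%d" % i for i in range(8)])
     print(len(tlt))  # 28 rows for 8 disks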

  14. Failure Recovery - Metadata Server
     1. Increment the version number of each row in which the failed tractserver appears.
     2. Pick random tractservers to fill the empty spaces in the TLT.
     3. Send the updated TLT assignments to every server affected by the changes.
     4. Wait for each tractserver to acknowledge the new TLT assignments, and only then begin giving out the new TLT to clients when queried for it.
     [Figure: the TLT before and after recovery; the versions of the rows containing the failed tractserver are incremented (e.g., 8→9, 17→18, 324→325, 3→4, 456→457, 7→8) and random live tractservers (M, R, T, Y, U, O, ...) fill the vacated slots.]
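     A sketch of the four recovery steps above as the metadata server might carry them out. The data structures and the notify/wait_for_acks callables are assumptions for illustration; only the four steps come from the slide.

     import random

     def recover_from_failure(tlt, failed_server, live_servers, notify, wait_for_acks):
         affected_rows = []
         for row in tlt:
             if failed_server in row["disks"]:
                 # 1. Increment the version of every row containing the failed tractserver.
                 row["version"] += 1
                 # 2. Pick a random live tractserver to fill the vacated slot.
                 replacement = random.choice(
                     [s for s in live_servers if s not in row["disks"]])
                 row["disks"][row["disks"].index(failed_server)] = replacement
                 affected_rows.append(row)
         # 3. Send the updated assignments to every affected tractserver.
         for row in affected_rows:
             notify(row)
         # 4. Only after all acks does the new TLT go out to clients on request.
         wait_for_acks(affected_rows)
         return tlt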

  15. Failure Recovery - Tractserver
     • When a tractserver receives an assignment of a new entry in the TLT, it contacts the other replicas and begins copying previously written tracts.
     [Figure: the updated TLT with newly assigned tractservers copying tracts from the remaining replicas in their rows.]
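     A sketch of the tractserver side under the same assumptions: a newly assigned server pulls previously written tracts from the surviving replicas in its row. The helper functions are hypothetical.

     def on_new_tlt_assignment(row, my_address, list_tracts, copy_tract):
         # Surviving replicas in the same TLT row hold the data to be re-replicated.
         replicas = [s for s in row["disks"] if s != my_address]
         for tract_id in list_tracts(replicas[0], row):   # hypothetical: enumerate tracts in this row
             copy_tract(replicas, tract_id)               # hypothetical: fetch one tract from a replica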

  16. Failure Recovery - Client
     • All client operations are tagged with the TLT entry's version number.
     [Figure: interaction among client, tractserver, and metadata server.]
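     A sketch of how the version tag might be checked on the tractserver under these assumptions: if a request carries a stale TLT row version, the client is told to re-fetch the TLT from the metadata server and retry. The request format and error handling are illustrative.

     class StaleTLTError(Exception):
         # Signals that the client's TLT row version is out of date.
         pass

     def handle_request(request, current_version, execute):
         # Every client operation carries the version of the TLT row it used.
         if request["tlt_version"] != current_version:
             raise StaleTLTError(request["tlt_version"])  # client must refresh its TLT and retry
         return execute(request)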

  17. Evaluation

  18. Evaluation
     • Failure recovery time decreases as the number of disks increases.

  19. Evaluation
     • Question: how does the speed gain break down between full bisection bandwidth and FDS itself?

  20. Conclusion
     • Flat storage provides simplicity for applications.
     • Deterministic data placement enables distributed metadata management.
     • Without locality constraints, dynamic work allocation increases utilization.
     • Failure recovery is highly scalable and fast.

  21. Q&A
