Crystal Geometry and Lattice Cells in Science Studies
Crystal geometry explores the concept of primitive lattice cells, which are minimum volume units used to build crystals. Understanding lattice parameters, crystal planes, and Miller indices are fundamental in crystallography research.
Download Presentation

Please find below an Image/Link to download the presentation.
The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author.If you encounter any issues during the download, it is possible that the publisher has removed the file from their server.
You are allowed to download the files provided on this website for personal or commercial use, subject to the condition that they are used lawfully. All files are the property of their respective owners.
The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author.
E N D
Presentation Transcript
Mahesh Balakrishnan Microsoft Research / VMware Research THE SHARED LOG APPROACH TO CLOUD-SCALE CONSISTENCY Collaborators: Dahlia Malkhi, Ted Wobber, Vijayan Prabhakaran, Phil Bernstein, Ming Wu, Michael Wei, Dan Glick, John D. Davis, Aviad Zuck, Tao Zou, Sriram Rao
ANATOMY OF A DISTRIBUTED SYSTEM HDFS client - schedulers - allocators - coordinators - namespaces - indices - controllers - resource managers filesystems, key-value stores, block stores, MapReduce runtimes, software defined networks d a t a metadata HDFS namenode HDFS datanodes data is distributed, metadata is logically centralized
THE ACHILLES HEEL OF THE CLOUD metadata is physically centralized distribute later for durability / availability / scalability Coordinatorfailures will be handled safely using the ZooKeeper service [14]. Fast Crash Recovery in RAMCloud, Ongaro et al., SOSP 2011. Efforts are also underwayto address high availability of a YARN cluster by having passive/active failover of RM to a standby node. Apache Hadoop YARN: Yet Another Resource Negotiator, Vavilapalli et al., SOCC 2013. However, adequate resilience can be achieved by applying standard replication techniques to the decision element. NOX: Towards an Operating System for Networks, Gude et al., Sigcomm CCR 2008. The NameNode is a Single Point of Failure for the HDFS cluster. HDFS is not currently a High Availability system. needs active contributions to make it Highly Available. wiki.apache.org, Nov 2011. but distributing a centralized service is difficult!
PROBLEM #1: THE ABSTRACTION GAP centralized metadata services are built using in-memory data structures (e.g. Java / C# Collections) -state resides in maps, trees, queues, counters, graphs -transactional access to data structures - example: a scheduler atomically moves a node from a free list to an allocation map distributing a service requires different abstractions -move state to external service like DB/ZooKeeper - or implement distributed protocols
PROBLEM #2: PROTOCOL SPAGHETTI sharding transactions replication caching, geo-mirroring, versioning, snapshots, rollback, elasticity logging inefficient when layered, unsafe when combined
PROBLEM STATEMENT metadata services are difficult to build, harden, and scale, due to: restrictive abstractions complex protocols can we simplify the construction of distributed metadata services?
THE SHARED LOG ABSTRACTION . . . shared log API: O = append(V) V = read(O) trim(O) //GC O = check() //tail clients remote shared log read from anywhere append to tail clients can concurrently append to the log, read from anywhere in its body, check the current tail, and trim entries that are no longer needed.
OUTLINE a shared log is a powerful and versatile abstraction.Tango (SOSP 2013) provides transactional in-memory data structures backed by a shared log. the shared log abstraction can be implemented efficiently. CORFU (NSDI 2012) is a scalable, distributed shared log that supports millions of appends/sec. a fast, scalable shared log enables fast, scalable distributed services. Tango+CORFU supports millions of transactions/sec.
OUTLINE a shared log is a powerful and versatile abstraction.Tango (SOSP 2013) provides transactional in-memory data structures backed by a shared log. the shared log abstraction can be implemented efficiently. CORFU (NSDI 2012) is a scalable, distributed shared log that supports millions of appends/sec. a fast, scalable shared log enables fast, scalable distributed services. Tango+CORFU supports millions of transactions/sec.
THE SHARED LOG APPROACH application application a Tango object = view in-memory data structure + history updates in shared log 1. Tango objects are easy to use 2. Tango objects are easy to build Tango runtime Tango runtime the shared log is the source of - persistence - consistency - elasticity - atomicity and isolation across multiple objects shared log uncommitted data commit record no messages only appends/reads on the shared log!
TANGO OBJECTS ARE EASY TO USE implement standard interfaces (Java/C# Collections) linearizability for single operations under the hood: example: curowner = ownermap.get( ledger ); if(curowner.equals(myname)) ledger.add(item);
TANGO OBJECTS ARE EASY TO USE implement standard interfaces (Java/C# Collections) linearizability for single operations serializable transactions under the hood: TX commits if read- set (ownermap) has not changed in conflict window example: TR.BeginTX(); curowner = ownermap.get( ledger ); if(curowner.equals(myname)) ledger.add(item); status = TR.EndTX(); TX commit record: read-set: (ownermap, ver:2) write-set: (ledger, ver:6) speculative commit records: each client decides if the TX commits or aborts independently but deterministically [similar to Hyder (Bernstein et al., CIDR 2011)]
TANGO OBJECTS ARE EASY TO BUILD 15 LOC == persistent, highly available, transactional register class TangoRegister { } int oid; TangoRuntime T; int state; void apply(void X) { } void writeRegister (int newstate) { T >update_helper(&newstate , sizeof (int) , oid); } int readRegister () { T >query_helper(oid); return state; } object-specific state invoked by Tango runtime on EndTX to change state state = (int )X; mutator: updates TX write-set, appends to shared log accessor: updates TX read-set, returns local state Other examples: Java ConcurrentMap: 350 LOC Apache ZooKeeper: 1000 LOC Apache BookKeeper: 300 LOC simple API exposed by runtime to object: 1 upcall + two helper methods arbitrary API exposed by object to application: mutators and accessors
OUTLINE a shared log is a powerful and versatile abstraction.Tango (SOSP 2013) provides transactional in-memory data structures backed by a shared log. the shared log abstraction can be implemented efficiently. CORFU (NSDI 2012) is a scalable, distributed shared log that supports millions of appends/sec. a fast, scalable shared log enables fast, scalable distributed services. Tango+CORFU supports millions of transactions/sec.
THE CORFU DESIGN application CORFU API: O = append(V) V = read(O) trim(O) //GC O = check() //tail smart client library Tango runtime passive flash units: write-once, sparse address spaces CORFU read from anywhere append to tail 4KB each entry maps to a replica set
THE CORFU PROTOCOL: READS client D1/ D2 L0 L4 ... D5/ D6 L2 L6 ... D7/ D8 L3 L7 ... D3/ D4 L1 L5 ... page 0 page 1 Tango read(pos) D1 D3 D5 D7 CORFU library D2 D4 D6 D8 read(D1/D2, page#) Projection: D1 D2 D3 D4 D5 D6 D7 D8 CORFU cluster L0L1L2L3L4L5L6L7 . . 16
THE CORFU PROTOCOL: APPENDS client holes in the log caused by a crashed client other clients can fill CORFU append throughput: # of 64-bit tokens issued per second sequencer is only an optimization! clients can probe for tail or reconstruct it from flash units reserve next position in log (e.g., 8) sequencer (T0) Tango read(pos) append(val) D1 D3 D5 D7 CORFU library D2 D4 D6 D8 write(D1/D2, val) Projection: D1 D2 D3 D4 D5 D6 D7 D8 CORFU cluster fast reconfiguration protocol: 10 ms for 32- L0L1L2L3L4L5L6L7 . drive cluster . 17
CHAIN REPLICATION IN CORFU client C1 2 1 client C2 client C3 safety under contention: if multiple clients try to write to same log position concurrently, only one wins writes to already written pages => error durability: data is only visible to reads if entire chain has seen it reads on unwritten pages => error requires write-once semantics from flash unit
OUTLINE a shared log is a powerful and versatile abstraction.Tango (SOSP 2013) provides transactional in-memory data structures backed by a shared log. the shared log abstraction can be implemented efficiently. CORFU (NSDI 2012) is a scalable, distributed shared log that supports millions of appends/sec. a fast, scalable shared log enables fast, scalable distributed services. Tango+CORFU supports millions of transactions/sec.
A FAST SHARED LOG ISNT ENOUGH node 1 node 2 allocation table B B the playback bottleneck: clients must read all entries inbound NIC is a bottleneck A A A aggregation tree A B B A B B A free list C C C C C C solution: stream abstraction - readnext(streamid) - append(value, streamid1, ) 10 Gbps 10 Gbps C A C A B C B A C A B each client only plays entries of interest to it B B B C C C A A A A
TXES OVER STREAMS node 1 node 2 allocation table B B beginTX read A write C endTX A A A decision record with commit/ abort bit aggregation tree A B B A B B A free list C C C C C C 0 0 A A C C A A C C A A C C B B C C B B C C B B C C skip skip skip skip skip skip skip skip skip skip skip skip commit/abort? has A changed? yes, abort commit/abort? has A changed? don t know! node 1 helps node 2
DISTRIBUTED TXES OVER STREAMS node 1 node 2 allocation table B B beginTX read A, B write C endTX A A A aggregation tree A B B A B B A free list C C C C C C C C O 1 0 1 A A C C A A C C A A B B C C B B C C B B C C skip skip skip skip skip skip skip skip skip skip skip skip commit/abort? has B changed? don t know! commit/abort? has A changed? don t know! node 1 and node 2 help each other! distributed transactions without a distributed (commit) protocol!
RESEARCH INSIGHTS a durable, iterable total order (i.e., a shared log) is a unifying abstraction for distributed systems, subsuming the roles of many distributed protocols it is possible to impose a total order at speeds exceeding the I/O capacity of any single machine a total order is useful even when individual nodes consume a subsequence of it
HOW FAR IS CORFU FROM PAXOS? CORFU scales the Paxos acceptor role: each consensus decision is made by a different set of acceptors learners L0L1L2L3L4L5L6L7 . L0L1L2L3L4L5L6L7 . . . streaming CORFU scales the Paxos learner role: each learner plays a subsequence of commands D1 D3 D5 D7 acceptors acceptors D2 D4 D6 D8 CORFU cluster
(RECENT) RELATED WORK state machine replication distributed txes RAFT 2014 Tango 2013 Transaction Chains Calvin Hyder Eve 2012 Spanner Walter CORFU Megastore 2011 time Percolator 2010 2009 ZooKeeper 2008 Sinfonia 2007 Chubby 2006
EVALUATION: LINEARIZABLE OPERATIONS beefier shared log scaling continues ultimate bottleneck: sequencer (latency = 1 ms) adding more clients more reads/sec until shared log is saturated a Tango object provides elasticity for strongly consistent reads constant write load (10K writes/sec), each client adds 10K reads/sec
EVALUATION: MULTI-OBJECT TXES Tango enables fast, distributed transactions across multiple objects over 100K txes/sec when 16% of txes are cross-partition similar scaling to 2PL without a complex distributed protocol More recent results: 1M txes/sec with 64% cross-partition txes 18 clients, each client hosts its own TangoMap cross-partition tx: client moves element from its own TangoMap to some other client s TangoMap 28
ONGOING WORK CorfuDB: open source rewrite (www.github.com/corfudb) Hopper: streams hop across different shared logs -geo-distribution (a log per site) -tiered storage (a log per storage class) collaborations around shared log designs for: -single-machine block storage (with Hakim Weatherspoon) -mobile data-sharing (with Rodrigo Rodrigues) -multi-core data structures (with Marcos Aguilera, Sid Sen) -big data analytics (with Tyson Condie, Wyatt Lloyd)
FUTURE WORK can we get rid of complex distributed protocols? (what this talk was about ) can we get rid of complex storage stacks? storage idioms: index/log/cache/RAID/buffer hypothesis: storage stacks can be constructed by composing persistent, transactional data structures: easy to synthesize code; predict performance; discover new designs
CONCLUSION Tango objects: data structures backed by a shared log key idea: the shared log does all the heavy lifting (durability, consistency, atomicity, isolation, elasticity ) Tango objects are easy to use, easy to build, and fast thanks to CORFU, a shared log without an I/O bottleneck