Asynchronous Replication Tree Rebalancing Study

asynchronous rebalancing of a replicated tree n.w
1 / 36
Embed
Share

A study on asynchronous rebalancing of replicated trees, focusing on addressing tree unbalance issues and proposing novel algorithms for rebalancing. The study presents Treedoc, a replicated sequence built as an ordering tree, and discusses its replicated representation and operations such as read, addAt, and removeAt.

  • Replicated Tree
  • Algorithm
  • Asynchronous
  • Tree Rebalancing
  • Distributed Systems

Uploaded on | 0 Views


Download Presentation

Please find below an Image/Link to download the presentation.

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author. If you encounter any issues during the download, it is possible that the publisher has removed the file from their server.

You are allowed to download the files provided on this website for personal or commercial use, subject to the condition that they are used lawfully. All files are the property of their respective owners.

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author.

E N D

Presentation Transcript


  1. Asynchronous rebalancing of a replicated tree Marek Zawirski INRIA & UPMC, France Marc Shapiro INRIA & LIP6, France Nuno Pregui a UNL, Portugal CFSE, May 2011, Saint-Malo

  2. Summary Overview of Treedoc: Abstractly, always-responsive replicated sequence Built as a replicated ordering tree Problem faced: Tree unbalance Solution for asynchronous tree rebalance Algorithm requirements statement Novel F-translate algorithm Zawirski, Shapiro, Pregui a - Asynchronous rebalancing of a replicated tree 2

  3. Treedoc a replicated sequence replica1 replica2 I I 0 1 0 1 Replicated representation: Grow-only binary tree Stable, unique position ids P = 1 L P L P Total order < : infix traversal Sequence of atoms: Ops: read, addAt, removeAt L I P replica3 I 0 1 L P [Shapiro, Pregui a et. al, 2007, 2009] Zawirski, Shapiro, Pregui a - Asynchronous rebalancing of a replicated tree 3

  4. Treedoc a replicated sequence replica1 replica2 I I 0 1 0 1 L P L P L I P replica3 I S 0 1 addAt( , S) < < S I S P L P [Shapiro, Pregui a et. al, 2007, 2009] Zawirski, Shapiro, Pregui a - Asynchronous rebalancing of a replicated tree 4

  5. Treedoc a replicated sequence replica1 replica2 I I 0 1 0 1 L P L P 0 S L I S P replica3 I 0 1 S = 10 addAt( , S) < < S I S P L P [Shapiro, Pregui a et. al, 2007, 2009] Zawirski, Shapiro, Pregui a - Asynchronous rebalancing of a replicated tree 5

  6. Treedoc a replicated sequence replica1 replica2 I I 0 1 0 1 L P L P 0 S X L I S P replica3 I 0 1 addAt( , S) S L P removeAt( ) P [Shapiro, Pregui a et. al, 2007, 2009] Zawirski, Shapiro, Pregui a - Asynchronous rebalancing of a replicated tree 6

  7. Treedoc a replicated sequence replica1 replica2 I I 0 1 0 1 L P L P 0 S L I S replica3 I 0 1 addAt( , S) S L P removeAt( ) P [Shapiro, Pregui a et. al, 2007, 2009] Zawirski, Shapiro, Pregui a - Asynchronous rebalancing of a replicated tree 7

  8. Treedoc a replicated sequence replica1 Operation-based replication: Immediate local execution Propagate (cbcast) & replay replica2 I I 0 1 0 1 L P L P 0 0 S S L I S replica3 I 0 1 addAt( addAt( addAt( , S) , S) , S) S S S L P 0 removeAt( ) P S [Shapiro, Pregui a et. al, 2007, 2009] Zawirski, Shapiro, Pregui a - Asynchronous rebalancing of a replicated tree 8

  9. Treedoc a replicated sequence replica1 Operation-based replication: Immediate local execution Propagate (cbcast) & replay replica2 I I 0 1 0 1 L P L P 0 0 S S L I S replica3 I 0 1 addAt( , S) S L P 0 removeAt( removeAt( removeAt( ) ) ) P P P S [Shapiro, Pregui a et. al, 2007, 2009] Zawirski, Shapiro, Pregui a - Asynchronous rebalancing of a replicated tree 9

  10. Treedoc a replicated sequence replica1 Operation-based replication: Immediate local execution Propagate (cbcast) & replay replica2 I I 0 1 0 1 L P L P 0 0 0 0 E S A S addAt( addAt( , A) , A) A A E L I S replica3 I addAt( , E) E 0 1 L P 0 [Shapiro, Pregui a et. al, 2007, 2009] ? S Zawirski, Shapiro, Pregui a - Asynchronous rebalancing of a replicated tree 10

  11. Treedoc a replicated sequence replica1 Operation-based replication: Immediate local execution Propagate (cbcast) & replay replica2 I I 0 1 0 1 L P L P 0 0 0 0 S A S E A addAt( addAt( , A) , A) A A A E L I S replica3 I addAt( addAt( addAt( , E) , E) , E) E E E 0 1 addAt( , A) A L P 0 Predefined order: red < green < blue S [Shapiro, Pregui a et. al, 2007, 2009] Zawirski, Shapiro, Pregui a - Asynchronous rebalancing of a replicated tree 11

  12. Treedoc a replicated sequence replica1 Operation-based replication: Immediate local execution Propagate (cbcast) & replay Concurrent commute Eventually consistent replica2 I I 0 1 0 1 L P L P 0 0 0 0 S S E A E A A E L I S replica3 I 0 1 L P 0 0 Predefined order: red < green < blue S E A [Shapiro, Pregui a et. al, 2007, 2009] Zawirski, Shapiro, Pregui a - Asynchronous rebalancing of a replicated tree 12

  13. The tree rebalance problem With time tree gets worse and worse Unbalanced, empty nodes, lot of colors Various negative impacts I 0 1 L P 0 0 S E A Tree rebalance: Create minimal tree from nonempty nodes Keep order < Use single color (white) New ids (rectangles), incompatible with old rebalance L 0 1 Challenge: How to avoid system-wide consensus? E S 0 1 A I Zawirski, Shapiro, Pregui a - Asynchronous rebalancing of a replicated tree 13

  14. The core-nebula architecture Idea: limit consensus to a smaller number of replicas [Le ia et. al, 2009] Divide replicas into two disjoint sets: CORE NEBULA a stable group execute tree operations & agree on rebalance easier agreement sites join & leave, dynamic generate tree operations learns about rebalance perform catch-up protocol to integrate conc. changes never blocked Zawirski, Shapiro, Pregui a - Asynchronous rebalancing of a replicated tree 14

  15. Rebalance in core, catch-up from nebula core1 nebula1 I 0 1 nebula2 I nebula3 I I 0 1 0 1 0 1 L P P L P P L P P L P P 1 1 1 1 S S S S addAt( addAt( addAt( addAt( , S) , S) , S) , S) S S S S Any pair of replicas can exchange operations in the same epoch Zawirski, Shapiro, Pregui a - Asynchronous rebalancing of a replicated tree 15

  16. Rebalance in core, catch-up from nebula core1 nebula1 I 0 1 nebula2 I nebula3 I I 0 1 0 1 0 1 L P P L P P L P P L P P 1 1 1 1 S S S S removeAt( removeAt( removeAt( removeAt( ) ) ) ) P P P P Any pair of replicas can exchange operations in the same epoch Zawirski, Shapiro, Pregui a - Asynchronous rebalancing of a replicated tree 16

  17. Rebalance in core, catch-up from nebula core1 nebula1 I 0 1 nebula2 I nebula3 I I 0 1 0 1 0 1 L P I L P I L P I L P I 1 1 1 1 S S S S removeAt( ) I Any pair of replicas can exchange operations in the same epoch Zawirski, Shapiro, Pregui a - Asynchronous rebalancing of a replicated tree 17

  18. Rebalance in core, catch-up from nebula core1 nebula1 I 0 1 nebula2 I nebula3 I I 0 1 0 1 0 1 L P I L P I L P I L P I 1 0 0 1 1 1 1 1 S P S U S P S U rebalance final state addAt( addAt( , U) , U) addAt( addAt( , P) , P) U U P P Any pair of replicas can exchange operations in the same epoch rebalance@core initiates new epoch rebalance@core and operations@core inherently concurrent to ops@nebula! S 0 L 0 T addAt( , T) T Zawirski, Shapiro, Pregui a - Asynchronous rebalancing of a replicated tree 18

  19. Rebalance in core, catch-up from nebula core1 nebula1 I 0 1 nebula2 I nebula3 I I 0 1 0 1 0 1 L P I L P I L P I L P I 1 0 0 1 1 1 1 1 S P S U S P S U rebalance final state addAt( , U) addAt( addAt( , P) , P) Pairwise catch-up moves nebula replica to the next epoch U P P S 0 ? L 0 T addAt( , T) T Zawirski, Shapiro, Pregui a - Asynchronous rebalancing of a replicated tree 19

  20. Rebalance in core, catch-up from nebula core1 nebula1 I 0 1 nebula2 I nebula3 I I 0 1 0 1 0 1 L P I L P I L P I L P I 1 0 0 1 1 1 1 1 S P S U S P S U rebalance final state removeAt( ) I Pairwise catch-up moves nebula replica to the next epoch replay core ops until final state S 0 L 0 T Zawirski, Shapiro, Pregui a - Asynchronous rebalancing of a replicated tree 20

  21. Rebalance in core, catch-up from nebula core1 nebula1 I 0 1 nebula2 I nebula3 I I 0 1 0 1 0 1 L P I L P I L P I L P I 1 0 0 1 1 1 1 1 S P S U S P catch-up S U rebalance final state Pairwise catch-up moves nebula replica to the next epoch replay core ops until final state replay rebalance on final nodes translate nodes of nebula operations || rebalance into the new tree S S 0 0 L L 0 1 ? T P Zawirski, Shapiro, Pregui a - Asynchronous rebalancing of a replicated tree 21

  22. Naive translation algorithm(s) core1 nebula1 I 0 1 nebula2 I nebula3 I I 0 1 0 1 0 1 L P I L P I L P I L P I 1 0 0 1 1 1 1 1 S P S U S P catch-up S U rebalance final state S S Naive translation algorithm: Create new position respecting old order observed at the nebula replica 0 0 L L 0 1 ~[Le ia et. al, 2009] < < L P S T P S L < < P Zawirski, Shapiro, Pregui a - Asynchronous rebalancing of a replicated tree 22

  23. Naive translation algorithm(s) core1 nebula1 I 0 1 nebula2 I nebula3 I I 0 1 0 1 0 1 L P I L P I L P I L P I 1 0 0 1 1 1 1 1 S P S U S P catch-up S U rebalance final state removeAt( ) I S S 0 0 L L 0 1 T P Zawirski, Shapiro, Pregui a - Asynchronous rebalancing of a replicated tree 23

  24. Naive translation algorithm(s) core1 nebula1 I 0 1 nebula2 I nebula3 I I 0 1 0 1 0 1 L P I L P I L P I L P I 1 0 0 1 1 1 1 1 S P S U S P catch-up S U rebalance catch-up final state S S S 0 0 0 L L L 0 1 1 T P U < < L U S Zawirski, Shapiro, Pregui a - Asynchronous rebalancing of a replicated tree 24

  25. Naive translation algorithm(s) core1 nebula1 I 0 1 nebula2 I nebula3 I I 0 1 0 1 0 1 L P I L P I L P I L P I 1 0 0 1 1 1 1 1 S P S U S P catch-up S U rebalance catch-up final state S S S 0 0 0 L L L 0 1 1 T P U addAt( , P) addAt( , P) P U Zawirski, Shapiro, Pregui a - Asynchronous rebalancing of a replicated tree 25

  26. Naive translation algorithm(s) core1 nebula1 I 0 1 nebula2 I nebula3 I I 0 1 0 1 0 1 L P I L P I L P I L P I 1 0 0 1 1 1 1 1 S P S U S P catch-up S U rebalance catch-up final state Order observed at nebula2broken! S S S 0 0 0 L L L < U P 0 1 1 P < U P U T P U Zawirski, Shapiro, Pregui a - Asynchronous rebalancing of a replicated tree 26

  27. Towards correct translate: requirements 1. Order-preserving For every , the order is preserved between epochs: < => < X Y X Y X Y 2. Deterministic For every X X , nebulai, nebulaj, is translated identically: @nebulai= @nebulaj X X 3. Non-disruptive For every created by addAt and created by translate: != Solution: designate all cases in advance using final state only! X Y X Y Zawirski, Shapiro, Pregui a - Asynchronous rebalancing of a replicated tree 27

  28. F-Translate algorithm & sentinels core1 nebula1 I 0 1 nebula2 I nebula3 I I 0 1 0 1 0 1 L P I L P I L P I L P I 1 0 0 1 1 1 1 1 S P S U S P S U final state Sentinel position : Designated position for every potential translate < < < < Materialized on translate S Xi S 0 0 L L S1 0 X X2 X1 Y Xim T L1 L2L3 L4 Zawirski, Shapiro, Pregui a - Asynchronous rebalancing of a replicated tree 28

  29. F-Translate algorithm & sentinels core1 nebula1 I 0 1 nebula2 I nebula3 I I 0 1 0 1 0 1 L P I L P I L P I L P I 1 0 0 1 1 1 1 1 S P S U S P S U final state F-Translate: Right child of final node S S 0 0 L L S1 0 T L1 L2L3 L1 L4 1 U Zawirski, Shapiro, Pregui a - Asynchronous rebalancing of a replicated tree 29

  30. F-Translate algorithm & sentinels core1 nebula1 I 0 1 nebula2 I nebula3 I I 0 1 0 1 0 1 L P I L P I L P I L P I 1 0 0 1 1 1 1 1 S P S U S P S U final state F-Translate: Child of empty f. node Etc. S S S 0 0 0 L L S1 L S1 0 T L1 L2L3 L3 L4 L1 L2L3 L1 L4 0 1 P U Zawirski, Shapiro, Pregui a - Asynchronous rebalancing of a replicated tree 30

  31. F-Translate algorithm & sentinels core1 nebula1 I 0 1 nebula2 I nebula3 I I 0 1 0 1 0 1 L P I L P I L P I L P I 1 0 0 1 1 1 1 1 S P S U S P S U final state removeAt( ) I S S S 0 0 0 L L S1 L S1 0 T L1 L2L3 L3 L4 L1 L2L3 L1 L4 0 1 P U Zawirski, Shapiro, Pregui a - Asynchronous rebalancing of a replicated tree 31

  32. F-Translate algorithm & sentinels core1 nebula1 I 0 1 nebula2 I nebula3 I I 0 1 0 1 0 1 L P I L P I L P I L P I 1 0 0 1 1 1 1 1 S P S U S P S U final state S S S S 0 0 0 0 L L S1 L S1 L S1 0 T L1 L2L3 L3 L4 L1 L2L3 L1 L3 L4 L1 L2L3 L1 L4 0 1 0 1 P U P U Zawirski, Shapiro, Pregui a - Asynchronous rebalancing of a replicated tree 32

  33. F-Translate algorithm & sentinels core1 nebula1 I 0 1 nebula2 I nebula3 I I 0 1 0 1 0 1 L P I L P I L P I L P I 1 0 0 1 1 1 1 1 S P S U S P S U final state S S S S 0 0 0 0 L L S1 L S1 L S1 0 0 0 0 T T L1 L2L3 L1 L3 L4 T L1 L2L3 L1 L3 L4 T L1 L2L3 L1 L3 L4 L1 L2L3 L4 L1 L3 1 0 1 0 0 1 1 0 P U U P P U U P Zawirski, Shapiro, Pregui a - Asynchronous rebalancing of a replicated tree 33

  34. F-Translate: sentinel implementation How to implement sentinelAfter? TODO: REPLCACE w/figure!!! show if time allows & audience is not lost Empty nodes are necessary! Is more empty nodes discarded than introduced? Yes, but How to minimize empty nodes in sentinelAfter Using balanced binary tree! X O(1) number of empty nodes / path + O(log imax) / path Zawirski, Shapiro, Pregui a - Asynchronous rebalancing of a replicated tree Define addAt such that it avoids empty nodes, long paths 34

  35. Summary Problem faced: Tree unbalance Solution for asynchronous tree rebalance Algorithm requirements statement Novel F-translate algorithm: Identify and utilize final state prior to rebalance Use sentinel positions Prototype catch-up implementation Future work? Evaluation of sentinelAfter implementation Formal order-preservation proof Zawirski, Shapiro, Pregui a - Asynchronous rebalancing of a replicated tree 35

  36. Appendix: the unbalance problem Use sparse tree and heuristic to assign PosID [Weiss et. al, 09] Tested on Wikipedia traces; works at the cost of possible anomaly Use list instead of a tree [Roh et. al, 10] Different costs and convergence characteristics? Rebalance the tree [Shapiro, Pregui a et. al, 09] System-wide consensus; inherent limitations The core-nebula idea [Le ia et. al 09]; incorrect translation This work brings: More formalization of the core-nebula for asynchronous systems Flaws revealed in naive algorithms Translation requirements statement Novel F-translate algorithm First prototype implementation in Java (subject to further studies) Zawirski, Shapiro, Pregui a - Asynchronous rebalancing of a replicated tree 36

More Related Content