Efficient Fault-Tolerant Services with Lightweight Virtual Machines

"Explore how Tardigrade uses lightweight virtual machines to construct fault-tolerant services easily. The design builds on Bascule/LibOS to improve efficiency and reliability and to address latency and I/O issues, and it uses checkpointer API strategies to create snapshots during system calls."

  • Lightweight VMs
  • Fault Tolerance
  • Bascule
  • LibOS
  • Checkpointer

Presentation Transcript


  1. Tardigrade: Leveraging Lightweight Virtual Machines to Easily and Efficiently Construct Fault-Tolerant Services. Overview by Gerson Rodriguez.

  2. Outline: goals and current state; problems; solution; results.

  3. Fault-Tolerant Services

  4. Virtual Machine (Xen): Dom0 and guest VMs 1 through N run on the hypervisor, which runs on the hardware.

  5. Virtual Machine Replication (VMR)

  6. Asynchronous VMR (Remus): take a snapshot of the VM's current state; perform a checkpoint after each epoch; buffer output until the checkpoint has been taken.

  7. Problems: latency, because the entire VM must be saved at each checkpoint and loaded again after a fault; unimportant services in the VM can delay packets.

  8. Solution: a lightweight virtual machine built on Bascule/LibOS. It serves a single process rather than an entire VM, requires no modification to binaries, and turns existing binaries into fault-tolerant services.

  9. LibOS and Bascule: Bascule allows OS-independent extensions to be attached at run time. It allows extensions from the LibOS to be used without modification (i.e., updates can be performed).

  10. Checkpoint Stack

  11. Checkpointer: the API does not allow suspending running threads to take a snapshot of current memory. The solution is to raise an exception in each thread at a given checkpoint and wait for all threads to reach these exceptions during system calls. Once every thread has reached one, a checkpoint can be created. Problem: what if a thread is waiting on I/O before reaching the exception? Solution: raise exceptions before or after known system calls that may take a long time to finish executing.

  12. Tardigrade diagram: clients; an orchestrator (view manager); and primary, backup, and spare views.

  13. (Image-only slide; no text recovered.)

  14. High-latency impacts: memory dirtying; nondeterministic events.

  15. Questions?
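
Slide 6 summarizes asynchronous VM replication as in Remus: snapshot the state, checkpoint at the end of each epoch, and buffer output until the checkpoint is safe. The following is a minimal, single-process Python sketch of that idea. It is not Remus or Tardigrade code. `Primary`, `Backup`, `handle`, `end_epoch`, and `fail_over` are hypothetical names, and a dictionary stands in for VM memory.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class Backup:
    """Holds the most recent committed checkpoint of the primary's state."""
    checkpoint: dict = field(default_factory=dict)
    epoch: int = -1

    def receive(self, epoch: int, state: dict) -> bool:
        # Store a private copy of the snapshot, then acknowledge it.
        self.checkpoint = copy.deepcopy(state)
        self.epoch = epoch
        return True


@dataclass
class Primary:
    backup: Backup
    state: dict = field(default_factory=dict)
    epoch: int = 0
    pending_output: list = field(default_factory=list)   # buffered replies
    released_output: list = field(default_factory=list)  # replies clients have seen

    def handle(self, key: str, value: int) -> None:
        # Execute the request now, but hold its reply until the epoch commits.
        self.state[key] = value
        self.pending_output.append(f"ok {key}={value}")

    def end_epoch(self) -> None:
        # Checkpoint at the epoch boundary; release output only after the ack.
        if self.backup.receive(self.epoch, self.state):
            self.released_output.extend(self.pending_output)
            self.pending_output.clear()
        self.epoch += 1


def fail_over(backup: Backup) -> Primary:
    # Resume from the last committed checkpoint with a fresh backup.
    # Replies from the uncommitted epoch were never released.
    return Primary(backup=Backup(), state=copy.deepcopy(backup.checkpoint),
                   epoch=backup.epoch + 1)


b = Backup()
p = Primary(backup=b)
p.handle("x", 1)
p.end_epoch()
p.handle("y", 2)              # the primary fails before this epoch ends
print(p.released_output)      # ['ok x=1']
q = fail_over(b)
print(q.state)                # {'x': 1}
```

The reply to `y=2` was still buffered when the primary stopped. As a result, no client observed state that the backup lacks. The cost is latency: every reply waits for the end of its epoch, which relates to the latency problem listed on slide 7.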
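
Slide 11 describes how the checkpointer obtains a consistent snapshot without a thread-suspend API. Each thread is stopped at a system-call boundary, and the snapshot is taken once all threads have arrived. The Python sketch below models that rendezvous with `threading.Barrier`. The per-thread "exception" is modeled as a flag polled by a hypothetical `at_syscall_boundary()` hook, and none of these names come from Bascule or Tardigrade.

```python
import threading


class Checkpointer:
    """Parks every worker at a system-call boundary, then snapshots shared memory."""

    def __init__(self, n_threads: int, memory: dict):
        self.memory = memory
        self.requested = threading.Event()
        # The workers and the checkpointer meet twice: once when every worker
        # has parked, and once to resume after the snapshot has been taken.
        self.parked = threading.Barrier(n_threads + 1)
        self.resume = threading.Barrier(n_threads + 1)

    def at_syscall_boundary(self) -> None:
        # Stands in for the exception raised in each thread at a checkpoint.
        if self.requested.is_set():
            self.parked.wait()
            self.resume.wait()

    def checkpoint(self) -> dict:
        self.requested.set()
        self.parked.wait()            # every worker is now stopped at a boundary
        snapshot = dict(self.memory)  # consistent: no worker is writing memory
        self.requested.clear()        # cleared before the workers are released
        self.resume.wait()
        return snapshot


def worker(name: str, cp: Checkpointer, stop: threading.Event) -> None:
    while not stop.is_set():
        cp.at_syscall_boundary()      # e.g. just before a read() or write()
        cp.memory[name] += 1          # "work" done between system calls


memory = {f"t{i}": 0 for i in range(3)}
cp = Checkpointer(3, memory)
stop = threading.Event()
threads = [threading.Thread(target=worker, args=(f"t{i}", cp, stop)) for i in range(3)]
for t in threads:
    t.start()
snapshot = cp.checkpoint()
stop.set()
for t in threads:
    t.join()
```

The sketch also shows why the I/O problem on the slide matters. A thread blocked inside a long-running call never reaches `at_syscall_boundary()`, so `checkpoint()` would wait indefinitely. Placing the hook just before and just after such calls, as the slide proposes, lets a thread park before it blocks.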
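
Slide 12 shows an orchestrator (view manager) that tracks primary, backup, and spare views. The Python block below is a minimal sketch of one plausible failover policy. It assumes the backup holds the latest checkpoint and that a spare is promoted to backup. The class, method, and node names are hypothetical, and the policy is an illustration rather than Tardigrade's documented behavior.

```python
from dataclasses import dataclass, field


@dataclass
class ViewManager:
    primary: str
    backup: str
    spares: list = field(default_factory=list)
    view_number: int = 0

    def on_failure(self, node: str) -> None:
        if node in self.spares:
            # Losing a spare does not change the active view.
            self.spares.remove(node)
            return
        if node == self.primary:
            # The backup holds the latest checkpoint, so it takes over.
            self.primary = self.backup
        elif node != self.backup:
            raise ValueError(f"unknown node {node!r}")
        # Either way a new backup is needed; pop() raises IndexError if no spare is left.
        self.backup = self.spares.pop(0)
        self.view_number += 1


vm = ViewManager(primary="A", backup="B", spares=["C", "D"])
vm.on_failure("A")
print(vm.primary, vm.backup, vm.spares, vm.view_number)  # B C ['D'] 1
```

In this sketch, clients would send requests to `vm.primary`. One reason to increment `view_number` on every change is that messages tagged with an older view can then be recognized as stale.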
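
Slide 14 lists memory dirtying among the latency impacts. One common way to see why: an incremental checkpoint copies only the pages written since the previous checkpoint, so the data transferred per epoch grows with the number of pages the workload dirties. The sketch below is a generic illustration in Python, not Tardigrade's mechanism. It uses a hypothetical `DirtyTrackingMemory` class and an assumed 4 KiB page size.

```python
PAGE_SIZE = 4096  # assumed page size for this illustration


class DirtyTrackingMemory:
    """Byte-addressable memory that records which pages were written."""

    def __init__(self, n_pages: int):
        self.pages = [bytearray(PAGE_SIZE) for _ in range(n_pages)]
        self.dirty: set[int] = set()

    def write(self, addr: int, data: bytes) -> None:
        if not data:
            return
        # Mark every page the write touches, including a page boundary crossing.
        first = addr // PAGE_SIZE
        last = (addr + len(data) - 1) // PAGE_SIZE
        self.dirty.update(range(first, last + 1))
        for i, byte in enumerate(data):
            a = addr + i
            self.pages[a // PAGE_SIZE][a % PAGE_SIZE] = byte

    def checkpoint(self) -> dict[int, bytes]:
        # Copy only the dirtied pages, then start a new tracking interval.
        delta = {p: bytes(self.pages[p]) for p in sorted(self.dirty)}
        self.dirty.clear()
        return delta


mem = DirtyTrackingMemory(n_pages=4)
mem.write(0, b"a")
mem.write(4095, b"xy")   # crosses from page 0 into page 1
first = mem.checkpoint()
second = mem.checkpoint()
```

Here two small writes dirty two pages, so the first checkpoint carries 8 KiB while the second is empty. A workload that touches many pages per epoch pays proportionally more at every checkpoint.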
