Building a Bytecode Virtual Machine: A Comprehensive Guide

busy developer s guide to building a bytecode n.w
1 / 58
Embed
Share

Learn how to build a virtual machine from scratch, understand its objectives, implementation, and major components. Explore the architecture and key services provided by a virtual machine in this detailed developer's guide.

  • Bytecode
  • Virtual Machine
  • Developer Guide
  • Architecture
  • Implementation

Uploaded on | 1 Views


Download Presentation

Please find below an Image/Link to download the presentation.

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author. If you encounter any issues during the download, it is possible that the publisher has removed the file from their server.

You are allowed to download the files provided on this website for personal or commercial use, subject to the condition that they are used lawfully. All files are the property of their respective owners.

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author.

E N D

Presentation Transcript


  1. Busy Developer's Guide to Building A Bytecode Virtual Machine

  2. Objectives Building a virtual machine understanding a VM can come from building one so let's build one! and understand that there's limits to what we can do in a single session

  3. How, exactly, do we build one of these? VIRTUAL MACHINE IMPLEMENTATION

  4. Virtual Machine Implementation What is a VM? an abstraction that defines a computing model (aka AbstractMachine) an implementation of an AbstractMachine on top of hardware

  5. Virtual Machine Implementation What does a VM provide? state (data) explicit (memory, registers, stack) static stack dynamic implicit (translated code, working areas, stateful I/O, ...) execution (code) definition (bytecode, native, etc) loading/resolution

  6. Virtual Machine Implementation What does a VM provide? libraries services automatic memory management metadata access and manipulation ... others depending on VM intent host/guest interaction (FFI)

  7. Major moving parts of most virtual machines ARCHITECTURE

  8. Architecture Simplified Loader and dynamic linker Loader pulls in application package Loader extracts data structures (code and data) constant pool: collection of constants Dynamic linker resolves all referenced symbols Execution engine processor: fetch-decode-execute mechanism call stack: function call frames

  9. Architecture Simplified Memory manager application data virtual machine data ("overhead") Thread scheduler Language extension Runtime services: platform access, etc Language extension/access: FFI

  10. Architecture Code typically unique Instruction Set Architecture (ISA) loaded in from disk (or elsewhere?) formats often optimized for easy reading typically inaccessible (directly) to runtime code usually we need to know where we are in here

  11. Architecture IP/PC register "instruction pointer" or "program counter" tracks the current location in code indicative of the next instruction for execution manipulated by flow-control instructions

  12. Architecture Call stack collection of "activation frames" (procedure/method/function calls) LIFO order--push and pop from one end, most recent at the top contain parameters passed to the activation provides space for local-allocated values and space for any additional in-process work

  13. Architecture FP/SP register "frame pointer" or "stack pointer" pointer to the current activation frame offset for any direct memory access of frame data

  14. Architecture Global memory "static" sequential memory locations often allocated by the language/virtual machine (global variables)

  15. Architecture Global memory "heap" sequential memory locations dynamically-allocated by running code typically somewhat managed by VM for programmer simplicity

  16. Architecture Processor a software CPU fetch-decode-execute cycle fetch: get the next instruction (from IP) decode: if the instruction is "packed" execute: utilizing any operands required

  17. Architecture Constants fixed collection of values in memory loaded along with code values known to the VM ahead-of-time string values function locations "magic constants" stored globally for easy reference

  18. Understanding/implementing one BYTECODE

  19. Bytecode What is bytecode? an abstract instruction set that is, not tied to hardware usually simplified than CPU assembly low-level allowing for more complex construction providing core/basic functionality tied to the abstract machine executing it stack vs register machine, for example (not always: see LLVM)

  20. Bytecode Bytecode categories constant values stack manipulation (push, pop, etc) arithmetic operations flow-control operations (branch-if-true, branch-if-false, ...) storage operations (store, load; global, local) type conversion operations (int/float/object/etc to int/float/object/etc) allocation/deallocation

  21. Bytecode Bytecode categories (imperative) procedure invocation direct or indirect (via address) taking reference (messaging) message-send, "super-send" (object) method invocation (functional) closure definition (logic) backtrack manipulation

  22. Bytecode Syntax/semantics instructions are formed of two parts operation code (opcode) operation parameters (operands) these will sometimes be supplemented by other things directives (commands to the tools) labels (symbolic names used)

  23. Bytecode Opcodes take operands (parameters) some take none (NOP, HALT, pop, etc) some take one (a constant value, etc) some take two (add, subtract, etc) some may take a varying number

  24. Bytecode Opcode/operand format multiple bytes/values "packed" bytes/values note that "raw" format and "programmer" format need not be identical binary assemblers (source-to-binary) s-expressions ("lisp") XML (s-expressions with angle brackets)

  25. Everybody on, everybody off the stack STACK-BASED VIRTUAL MACHINES

  26. Stack-based Virtual Machines Abstract operations stack Registers will be special-purpose (IP, FP, SP, etc) ... but most operations will focus on the operations stack all operands come from the stack all results go back onto the stack the stack must (eventually) balance out to zero stack width is (usually) one "word"

  27. Stack-based Virtual Machines "Abstract" is key most abstract VMs don't ignore registers register access is much faster than actual CPU stack CPUs often have their own conventions to honor ... so the operation stack may not be entirely on the CPU stack remember, "abstract" is usually a simplification, not a straitjacket

  28. Stack-based Virtual Machines Stack-manipulation Operations push (or const) op: push op onto the stack pop: pop op off the stack dup: duplicate (pop, push, push) ldc ix: push constant (from constant pool index ix)

  29. Stack-based Virtual Machines Mathematics Operations binary math operations (add, sub, mul, div, mod, pow) pop sp1, sp2; sp1 (operate) sp2; push result may specialize opcodes to precision (32-bit, 64-bit, int vs float, etc) order is important! ((1 - 2) != (2 - 1), remember); left-to-right or right-to-left? unary math operations (neg, abs, etc) pop sp1; (operate) sp1; push result may specialize opcodes to precision

  30. Stack-based Virtual Machines Bitwise Operations shift or rotate operations (shiftl, shiftr, rotl, rotr) pop sp1; (operate) sp1; push result may specialize opcodes to precision most of these are really just extensions of unary math operations

  31. Stack-based Virtual Machines Comparison Opcodes pop sp1, sp2 sp1 (compare) sp2? if true, push 1/true if false, push 0/false equality, non-equality greater-than, less-than ... and variations (greater-than-or-equal, less- than-or-equal, etc) may specialize opcodes to precision or

  32. Stack-based Virtual Machines Branching/"jumping" Operations must always have one operand: where to jump typically an address in the bytecode operand may be "direct" (embedded in the stored code) operand may be "indirect" (we get the address from somewhere else) operand may be "absolute" address (within all the code)

  33. Stack-based Virtual Machines Compare-then-branch is common jump-if-zero, jump-if-not-zero the most common depends on how the opcodes and operands are encoded or how "wide" the ISA wants to be these are often mirrored by underlying CPU opcodes

  34. Stack-based Virtual Machines Call frame Operations call typically implies a branch-and-return (ret) pair conventions are critical call conventions parameters go onto the operations stack order-sensitive "left-to-right" (from callers' perspective)? or "right-to-left" (from callers' perspective)? typically assumes pass-by-value semantics (no immediate modifications)

  35. Stack-based Virtual Machines Call frame Operations return conventions result goes onto operations stack usually single-value results, but... ... nothing really stops us from having multiple results if we wished ... which could be one way pass-by-reference is implemented! who cleans up the stack? where is return-address stored?

  36. How many registers do you need? REGISTER-BASED VIRTUAL MACHINES

  37. Register-based Virtual Machines Storage/work focuses on registers general-purpose registers integer-only registers floating-point registers string/data registers note: there's still a program stack involved making register VMs something of a superset to stack VMs

  38. Register-based Virtual Machines Registers can hold single value typically of "word" size (32 or 64 bits) frequently of different sizes (for optimization purposes) may or may not map to actual CPU registers

  39. Register-based Virtual Machines Value-manipulation LOADRx: load value into register constant value from local or global memory from other register STORERx: store register into local or global memory into other register

  40. Register-based Virtual Machines Stack-manipulation Operations push (or const) op or Rx: push op or the contents of Rx onto the stack pop: pop op off the stack; pop Rx pops op off the stack into Rx dup: duplicate (pop, push, push) ldc ix: push constant (from constant pool index ix)

  41. Register-based Virtual Machines Mathematical operations one/some registers are often specific to math (accumulator) operands can often be register local memory global memory ... and some machines restrict which from where binary operations (add, sub, mul, div, mod) three operands: src1, src2 and dest

  42. Register-based Virtual Machines Bitwise Operations shift or rotate operations (shiftl, shiftr, rotl, rotr) most of these are really just extensions of unary math operations generally these will operate against a small set of registers

  43. Register-based Virtual Machines Comparison Opcodes operand (compare) register? if true, push 1/true if false, push 0/false equality, non-equality greater-than, less-than ... and variations (greater-than-or-equal, less- than-or-equal, etc) may specialize opcodes to precision or convenience (comparison-to-zero, for

  44. Register-based Virtual Machines Branching/"jumping" Operations must always have one operand: where to jump typically an address in the bytecode operand may be "direct" (embedded in the stored code) operand may be "indirect" (we get the address from somewhere else) operand may be "absolute" address (within all the code)

  45. Register-based Virtual Machines Compare-then-branch is common jump-if-zero, jump-if-not-zero the most common depends on how the opcodes and operands are encoded or how "wide" the ISA wants to be these are often mirrored by underlying CPU opcodes

  46. Register-based Virtual Machines Call frame Operations call typically implies a branch-and-return (ret) pair conventions are critical parameters can go into registers typically size-sensitive typically ordered (some/remaining) parameters go onto the operations stack order-sensitive "left-to-right" (from callers' perspective)?

  47. Register-based Virtual Machines Call frame Operations result goes into register who cleans up the stack? where is return-address stored?

  48. Build it already! IMPLEMENTATION

  49. Implementation Steps (1/2) Basic architecture and scaffolding Simple (no-operand) ops: NOP, HALT, DUMP Simple stack ops: CONST, LDC, POP Globals ops: GSTORE, GLOAD Math ops: ADD, SUB, etc

  50. Implementation Steps (2/2) Comparison ops: EQ, NE, GTE, etc Branching ops: JMP, JT, JF, etc Call ops: CALL, CALLI, RET Locals ops: LSTORE, LLOAD

Related


More Related Content