Building a Bytecode Virtual Machine: A Comprehensive Guide

busy developer s guide to building a bytecode n.w

1 / 58

Embed Share

Learn how to build a virtual machine from scratch, understand its objectives, implementation, and major components. Explore the architecture and key services provided by a virtual machine in this detailed developer's guide.

adyli Follow

Uploaded on Mar 21, 2025 | 1 Views

Download Presentation

Please find below an Image/Link to download the presentation.

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author. If you encounter any issues during the download, it is possible that the publisher has removed the file from their server.

You are allowed to download the files provided on this website for personal or commercial use, subject to the condition that they are used lawfully. All files are the property of their respective owners.

Download Presentation

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author.

E N D

Presentation Transcript

Busy Developer's Guide to Building A Bytecode Virtual Machine

Objectives Building a virtual machine understanding a VM can come from building one so let's build one! and understand that there's limits to what we can do in a single session

How, exactly, do we build one of these? VIRTUAL MACHINE IMPLEMENTATION

Virtual Machine Implementation What is a VM? an abstraction that defines a computing model (aka AbstractMachine) an implementation of an AbstractMachine on top of hardware

Virtual Machine Implementation What does a VM provide? state (data) explicit (memory, registers, stack) static stack dynamic implicit (translated code, working areas, stateful I/O, ...) execution (code) definition (bytecode, native, etc) loading/resolution

Virtual Machine Implementation What does a VM provide? libraries services automatic memory management metadata access and manipulation ... others depending on VM intent host/guest interaction (FFI)

Major moving parts of most virtual machines ARCHITECTURE

Architecture Simplified Loader and dynamic linker Loader pulls in application package Loader extracts data structures (code and data) constant pool: collection of constants Dynamic linker resolves all referenced symbols Execution engine processor: fetch-decode-execute mechanism call stack: function call frames

Architecture Simplified Memory manager application data virtual machine data ("overhead") Thread scheduler Language extension Runtime services: platform access, etc Language extension/access: FFI

Architecture Code typically unique Instruction Set Architecture (ISA) loaded in from disk (or elsewhere?) formats often optimized for easy reading typically inaccessible (directly) to runtime code usually we need to know where we are in here

Architecture IP/PC register "instruction pointer" or "program counter" tracks the current location in code indicative of the next instruction for execution manipulated by flow-control instructions

Architecture Call stack collection of "activation frames" (procedure/method/function calls) LIFO order--push and pop from one end, most recent at the top contain parameters passed to the activation provides space for local-allocated values and space for any additional in-process work

Architecture FP/SP register "frame pointer" or "stack pointer" pointer to the current activation frame offset for any direct memory access of frame data

Architecture Global memory "static" sequential memory locations often allocated by the language/virtual machine (global variables)

Architecture Global memory "heap" sequential memory locations dynamically-allocated by running code typically somewhat managed by VM for programmer simplicity

Architecture Processor a software CPU fetch-decode-execute cycle fetch: get the next instruction (from IP) decode: if the instruction is "packed" execute: utilizing any operands required

Architecture Constants fixed collection of values in memory loaded along with code values known to the VM ahead-of-time string values function locations "magic constants" stored globally for easy reference

Understanding/implementing one BYTECODE

Bytecode What is bytecode? an abstract instruction set that is, not tied to hardware usually simplified than CPU assembly low-level allowing for more complex construction providing core/basic functionality tied to the abstract machine executing it stack vs register machine, for example (not always: see LLVM)

Bytecode Bytecode categories constant values stack manipulation (push, pop, etc) arithmetic operations flow-control operations (branch-if-true, branch-if-false, ...) storage operations (store, load; global, local) type conversion operations (int/float/object/etc to int/float/object/etc) allocation/deallocation

Bytecode Bytecode categories (imperative) procedure invocation direct or indirect (via address) taking reference (messaging) message-send, "super-send" (object) method invocation (functional) closure definition (logic) backtrack manipulation

Bytecode Syntax/semantics instructions are formed of two parts operation code (opcode) operation parameters (operands) these will sometimes be supplemented by other things directives (commands to the tools) labels (symbolic names used)

Bytecode Opcodes take operands (parameters) some take none (NOP, HALT, pop, etc) some take one (a constant value, etc) some take two (add, subtract, etc) some may take a varying number

Bytecode Opcode/operand format multiple bytes/values "packed" bytes/values note that "raw" format and "programmer" format need not be identical binary assemblers (source-to-binary) s-expressions ("lisp") XML (s-expressions with angle brackets)

Everybody on, everybody off the stack STACK-BASED VIRTUAL MACHINES

Stack-based Virtual Machines Abstract operations stack Registers will be special-purpose (IP, FP, SP, etc) ... but most operations will focus on the operations stack all operands come from the stack all results go back onto the stack the stack must (eventually) balance out to zero stack width is (usually) one "word"

Stack-based Virtual Machines "Abstract" is key most abstract VMs don't ignore registers register access is much faster than actual CPU stack CPUs often have their own conventions to honor ... so the operation stack may not be entirely on the CPU stack remember, "abstract" is usually a simplification, not a straitjacket

Stack-based Virtual Machines Stack-manipulation Operations push (or const) op: push op onto the stack pop: pop op off the stack dup: duplicate (pop, push, push) ldc ix: push constant (from constant pool index ix)

Stack-based Virtual Machines Mathematics Operations binary math operations (add, sub, mul, div, mod, pow) pop sp1, sp2; sp1 (operate) sp2; push result may specialize opcodes to precision (32-bit, 64-bit, int vs float, etc) order is important! ((1 - 2) != (2 - 1), remember); left-to-right or right-to-left? unary math operations (neg, abs, etc) pop sp1; (operate) sp1; push result may specialize opcodes to precision

Stack-based Virtual Machines Bitwise Operations shift or rotate operations (shiftl, shiftr, rotl, rotr) pop sp1; (operate) sp1; push result may specialize opcodes to precision most of these are really just extensions of unary math operations

Stack-based Virtual Machines Comparison Opcodes pop sp1, sp2 sp1 (compare) sp2? if true, push 1/true if false, push 0/false equality, non-equality greater-than, less-than ... and variations (greater-than-or-equal, less- than-or-equal, etc) may specialize opcodes to precision or

Stack-based Virtual Machines Branching/"jumping" Operations must always have one operand: where to jump typically an address in the bytecode operand may be "direct" (embedded in the stored code) operand may be "indirect" (we get the address from somewhere else) operand may be "absolute" address (within all the code)

Stack-based Virtual Machines Compare-then-branch is common jump-if-zero, jump-if-not-zero the most common depends on how the opcodes and operands are encoded or how "wide" the ISA wants to be these are often mirrored by underlying CPU opcodes

Stack-based Virtual Machines Call frame Operations call typically implies a branch-and-return (ret) pair conventions are critical call conventions parameters go onto the operations stack order-sensitive "left-to-right" (from callers' perspective)? or "right-to-left" (from callers' perspective)? typically assumes pass-by-value semantics (no immediate modifications)

Stack-based Virtual Machines Call frame Operations return conventions result goes onto operations stack usually single-value results, but... ... nothing really stops us from having multiple results if we wished ... which could be one way pass-by-reference is implemented! who cleans up the stack? where is return-address stored?

How many registers do you need? REGISTER-BASED VIRTUAL MACHINES

Register-based Virtual Machines Storage/work focuses on registers general-purpose registers integer-only registers floating-point registers string/data registers note: there's still a program stack involved making register VMs something of a superset to stack VMs

Register-based Virtual Machines Registers can hold single value typically of "word" size (32 or 64 bits) frequently of different sizes (for optimization purposes) may or may not map to actual CPU registers

Register-based Virtual Machines Value-manipulation LOADRx: load value into register constant value from local or global memory from other register STORERx: store register into local or global memory into other register

Register-based Virtual Machines Stack-manipulation Operations push (or const) op or Rx: push op or the contents of Rx onto the stack pop: pop op off the stack; pop Rx pops op off the stack into Rx dup: duplicate (pop, push, push) ldc ix: push constant (from constant pool index ix)

Register-based Virtual Machines Mathematical operations one/some registers are often specific to math (accumulator) operands can often be register local memory global memory ... and some machines restrict which from where binary operations (add, sub, mul, div, mod) three operands: src1, src2 and dest

Register-based Virtual Machines Bitwise Operations shift or rotate operations (shiftl, shiftr, rotl, rotr) most of these are really just extensions of unary math operations generally these will operate against a small set of registers

Register-based Virtual Machines Comparison Opcodes operand (compare) register? if true, push 1/true if false, push 0/false equality, non-equality greater-than, less-than ... and variations (greater-than-or-equal, less- than-or-equal, etc) may specialize opcodes to precision or convenience (comparison-to-zero, for

Register-based Virtual Machines Branching/"jumping" Operations must always have one operand: where to jump typically an address in the bytecode operand may be "direct" (embedded in the stored code) operand may be "indirect" (we get the address from somewhere else) operand may be "absolute" address (within all the code)

Register-based Virtual Machines Compare-then-branch is common jump-if-zero, jump-if-not-zero the most common depends on how the opcodes and operands are encoded or how "wide" the ISA wants to be these are often mirrored by underlying CPU opcodes

Register-based Virtual Machines Call frame Operations call typically implies a branch-and-return (ret) pair conventions are critical parameters can go into registers typically size-sensitive typically ordered (some/remaining) parameters go onto the operations stack order-sensitive "left-to-right" (from callers' perspective)?

Register-based Virtual Machines Call frame Operations result goes into register who cleans up the stack? where is return-address stored?

Build it already! IMPLEMENTATION

Implementation Steps (1/2) Basic architecture and scaffolding Simple (no-operand) ops: NOP, HALT, DUMP Simple stack ops: CONST, LDC, POP Globals ops: GSTORE, GLOAD Math ops: ADD, SUB, etc

Implementation Steps (2/2) Comparison ops: EQ, NE, GTE, etc Branching ops: JMP, JT, JF, etc Call ops: CALL, CALLI, RET Locals ops: LSTORE, LLOAD

Building a Bytecode Virtual Machine: A Comprehensive Guide

Download Presentation

Presentation Transcript

Related

More Related Content