diff --git a/README.md b/README.md index 90731fb..02f651c 100644 --- a/README.md +++ b/README.md @@ -1,146 +1,99 @@ # RoastVM -A Java Virtual Machine (JVM) implementation written in Rust, capable of parsing and executing Java class files and bytecode. +A Java Virtual Machine (JVM) implementation written in Rust. ## Overview -RoastVM is an educational/experimental JVM implementation that demonstrates the core components and execution model of the Java Virtual Machine. The project uses Rust's type safety and modern tooling to build a simplified but functional JVM interpreter. +RoastVM is an educational/experimental JVM implementation that executes Java bytecode. It supports class file parsing, +bytecode interpretation, JNI native methods, and includes a boot image system for loading the Java standard library. ## Features -### Currently Implemented - -- **Class File Parsing**: Full support for reading and deserializing binary Java class files (`.class`) with magic number `0xCAFEBABE` -- **Constant Pool Management**: Handles 20+ constant pool entry types (UTF8, Integer, Float, Long, Double, Class, String, MethodRef, FieldRef, InterfaceMethodRef, NameAndType, MethodHandle, MethodType, InvokeDynamic, etc.) 
-- **Dynamic Class Loading**: On-demand class loading with superclass and interface resolution, caching via DashMap -- **Class Initialization**: Automatic `<clinit>` method execution following JVM Spec 5.5, with recursive initialization tracking -- **Bytecode Execution**: Interpreter for 50+ JVM bytecode instructions including: - - Constants: `aconst_null`, `iconst_*`, `lconst_*`, `fconst_*`, `dconst_*`, `bipush`, `sipush`, `ldc`, `ldc_w`, `ldc2_w` - - Load/Store: `iload`, `lload`, `fload`, `dload`, `aload`, `istore`, `lstore`, `fstore`, `dstore`, `astore` (including `_0-3` variants) - - Array operations: `iaload`, `laload`, `faload`, `daload`, `aaload`, `baload`, `caload`, `saload`, `iastore`, `lastore`, `fastore`, `dastore`, `aastore`, `bastore`, `castore`, `sastore`, `arraylength` - - Stack manipulation: `pop`, `pop2`, `dup`, `dup_x1`, `dup_x2`, `dup2`, `dup2_x1`, `dup2_x2`, `swap` - - Arithmetic: `iadd`, `ladd`, `fadd`, `dadd`, `isub`, `lsub`, `fsub`, `dsub`, `imul`, `lmul`, `fmul`, `dmul`, `idiv`, `ldiv`, `fdiv`, `ddiv`, `irem`, `lrem`, `frem`, `drem`, `ineg`, `lneg`, `fneg`, `dneg` - - Bitwise: `ishl`, `lshl`, `ishr`, `lshr`, `iushr`, `lushr`, `iand`, `land`, `ior`, `lor`, `ixor`, `lxor` - - Type conversions: `i2l`, `i2f`, `i2d`, `l2i`, `l2f`, `l2d`, `f2i`, `f2l`, `f2d`, `d2i`, `d2l`, `d2f`, `i2b`, `i2c`, `i2s` - - Comparisons: `lcmp`, `fcmpl`, `fcmpg`, `dcmpl`, `dcmpg` - - Control flow: `ifeq`, `ifne`, `iflt`, `ifge`, `ifgt`, `ifle`, `if_icmp*`, `if_acmp*`, `goto`, `ifnull`, `ifnonnull` - - Object operations: `new`, `newarray`, `anewarray`, `multianewarray`, `checkcast`, `instanceof` - - Field access: `getstatic`, `putstatic`, `getfield`, `putfield` - - Method invocation: `invokevirtual`, `invokespecial`, `invokestatic`, `invokeinterface` - - Returns: `ireturn`, `lreturn`, `freturn`, `dreturn`, `areturn`, `return` -- **Object Model**: Full object creation, field storage, and array support (primitive and reference arrays) -- **JNI Support**: Implementation of 80+ JNI 
functions for native method integration -- **Native Library Loading**: Dynamic loading of native libraries (DLLs on Windows) -- **Stack Traces**: Detailed stack trace generation with line number mapping from class file attributes -- **Module System**: Support for loading classes from 7z binary image archives (JDK modules) -- **Frame-based Execution**: Proper execution context with program counter, operand stack, and local variables - -### In Development - -- Additional bytecode instructions (`tableswitch`, `lookupswitch`, `monitorenter`, `monitorexit`, etc.) -- Exception handling (`athrow`, try/catch blocks) -- Garbage collection (basic object manager exists) -- Reflection API -- Multi-threading support -- Method handles and `invokedynamic` - -## Architecture - -### Core Components - -- **`Vm`** (`vm.rs`): Main virtual machine controller managing threads, class loader, and native library loading -- **`VmThread`** (`thread.rs`): Thread of execution managing the frame stack and method invocation -- **`Frame`** (`lib.rs`): Execution context for a method with PC, operand stack, and local variables -- **`ClassLoader`** (`class_loader.rs`): Handles dynamic class loading, linking, and initialization -- **`RuntimeClass`** (`class.rs`): Runtime representation of a loaded class with initialization state tracking -- **`ClassFile`** (`class_file/`): Binary parser for Java class files using the `deku` library -- **`ConstantPool`** (`class_file/constant_pool.rs`): Constant pool resolution and management -- **`ObjectManager`** (`objects/object_manager.rs`): Object allocation and garbage collection management -- **`JNI`** (`jni.rs`): Java Native Interface implementation - -### Execution Flow - -1. **Loading**: `ClassFile::from_bytes()` parses binary class file data -2. **Resolution**: `ClassLoader` converts `ClassFile` to `RuntimeClass`, resolving dependencies -3. **Initialization**: Class initializers (`<clinit>`) execute per JVM Spec 5.5 -4. 
**Execution**: `VmThread` invokes the main method, creating a `Frame` -5. **Interpretation**: `Frame` iterates through bytecode operations, executing each instruction -6. **Stack Operations**: Instructions manipulate the operand stack and local variables +- **Class File Parsing** - Full `.class` file support using deku for binary parsing +- **Bytecode Interpreter** - 200+ JVM instructions implemented +- **Object Model** - Objects, arrays, monitors, and string interning +- **JNI Support** - 250+ JNI functions for native method integration +- **Boot Image** - Load JDK classes from 7z module archives +- **Native FFI** - Dynamic library loading via libffi ## Project Structure ``` roast-vm/ -├── Cargo.toml # Workspace configuration -└── crates/ - ├── core/ # Main JVM implementation (roast-vm-core) - │ ├── Cargo.toml - │ └── src/ - │ ├── main.rs # Entry point (binary: roast) - │ ├── lib.rs # Frame and bytecode execution - │ ├── vm.rs # Virtual Machine controller - │ ├── thread.rs # Thread execution management - │ ├── class.rs # RuntimeClass definition - │ ├── class_loader.rs # ClassLoader implementation - │ ├── class_file/ # Binary class file parser - │ │ ├── class_file.rs # ClassFile parser (magic 0xCAFEBABE) - │ │ └── constant_pool.rs - │ ├── objects/ # Object model - │ │ ├── object.rs # Object representation - │ │ ├── array.rs # Array support - │ │ └── object_manager.rs - │ ├── jni.rs # JNI implementation - │ ├── instructions.rs # Bytecode opcode definitions - │ ├── attributes.rs # Class file attributes - │ ├── value.rs # Value and stack types - │ ├── error.rs # Error handling and stack traces - │ ├── native_libraries.rs # Native library management - │ └── bimage.rs # Binary image (7z) reader - │ - └── roast-vm-sys/ # Native methods bridge (cdylib) - ├── Cargo.toml - └── src/ - ├── lib.rs # Native method implementations - ├── system.rs # System native calls - ├── class.rs # Class native operations - └── object.rs # Object native operations +├── crates/ +│ ├── core/ # 
Main VM implementation +│ │ └── src/ +│ │ ├── main.rs # Entry point +│ │ ├── vm.rs # VM controller +│ │ ├── thread.rs # Thread execution +│ │ ├── class_loader.rs # Class loading +│ │ ├── bimage.rs # Boot image reader +│ │ ├── frame/ # Stack frames & interpreter +│ │ ├── class_file/ # Class file parser +│ │ ├── objects/ # Object/array model +│ │ └── native/ # JNI infrastructure +│ │ +│ └── roast-vm-sys/ # Native methods (cdylib) +│ └── src/ # JNI implementations +│ +├── lib/ # Boot image location +├── data/ # Default classpath +└── docs/ # Detailed documentation ``` -## Dependencies +## Documentation -- **`deku`**: Binary parsing and serialization for class files -- **`dashmap`**: Concurrent HashMap for class and object storage -- **`jni`**: Java Native Interface bindings -- **`libloading`**: Dynamic library loading -- **`libffi`**: Foreign function interface for native calls -- **`sevenz-rust2`**: 7z archive reading for module system support -- **`log`** / **`env_logger`**: Logging infrastructure -- **`itertools`**: Iterator utilities -- **`colored`**: Colored console output +Detailed implementation docs are in the `docs/` folder: + +- [Class File Parsing](docs/class-file-parsing.md) - Binary format, constant pool, attributes +- [Frame & Interpreter](docs/frame-interpreter.md) - Stack frames, opcode dispatch +- [Object Management](docs/object-management.md) - Objects, arrays, monitors +- [JNI](docs/jni.md) - JNIEnv structure, native invocation +- [Native/FFI](docs/native-ffi.md) - Library loading, libffi integration +- [roast-vm-sys](docs/roast-vm-sys.md) - Native method implementations +- [Class Loading](docs/class-loading.md) - Boot image, classpath, RuntimeClass ## Building ```bash -# Build the project cargo build - -# Build with optimizations cargo build --release - -# Run tests cargo test +``` -# Run with logging +## Running + +```bash +# Run with default classpath (./data) +cargo run + +# Run with custom classpath +cargo run -- /path/to/classes + +# With 
debug logging RUST_LOG=debug cargo run ``` -## Current Status +## Dependencies -This project is in early development (v0.1.0). The core infrastructure for class loading, bytecode execution, object creation, JNI support, and stack traces is functional. Many JVM features remain in development. +| Crate | Purpose | +|-------------------------------------------------------|---------------------------| +| [deku](https://crates.io/crates/deku) | Binary class file parsing | +| [dashmap](https://crates.io/crates/dashmap) | Concurrent maps | +| [jni](https://crates.io/crates/jni) | JNI type definitions | +| [libloading](https://crates.io/crates/libloading) | Dynamic library loading | +| [libffi](https://crates.io/crates/libffi) | Native function calls | +| [sevenz-rust2](https://crates.io/crates/sevenz-rust2) | Boot image archives | +| [parking_lot](https://crates.io/crates/parking_lot) | Synchronization | +## Status +Early development (v0.2.0). Core class loading, bytecode execution, and JNI are functional. Exception handling and GC +are in progress. 
+ +**Vendor**: infernap12 ## References - [JVM Specification](https://docs.oracle.com/javase/specs/jvms/se25/html/index.html) -- [Java Class File Format](https://docs.oracle.com/javase/specs/jvms/se25/html/jvms-4.html) \ No newline at end of file +- [JNI Specification](https://docs.oracle.com/en/java/javase/25/docs/specs/jni/index.html) \ No newline at end of file diff --git a/crates/core/Cargo.toml b/crates/core/Cargo.toml index 969a4c5..ac91bc0 100644 --- a/crates/core/Cargo.toml +++ b/crates/core/Cargo.toml @@ -1,6 +1,6 @@ [package] name = "roast-vm-core" -version = "0.1.5" +version = "0.2.0" edition = "2024" publish = ["nexus"] diff --git a/crates/roast-vm-sys/Cargo.toml b/crates/roast-vm-sys/Cargo.toml index 20068e7..8abb320 100644 --- a/crates/roast-vm-sys/Cargo.toml +++ b/crates/roast-vm-sys/Cargo.toml @@ -1,6 +1,6 @@ [package] name = "roast-vm-sys" -version = "0.1.5" +version = "0.2.0" edition = "2024" publish = ["nexus"] diff --git a/docs/class-file-parsing.md b/docs/class-file-parsing.md new file mode 100644 index 0000000..5a882bc --- /dev/null +++ b/docs/class-file-parsing.md @@ -0,0 +1,156 @@ +# Class File Parsing + +**Location**: `crates/core/src/class_file/` + +The class file parser uses the **deku** library for declarative binary deserialization with automatic validation. 
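The fixed-size header that opens every class file gives a feel for what this validation covers. The sketch below parses it by hand with the standard library only; it does not use deku, and `ClassHeader`, `HeaderError`, and `parse_header` are hypothetical names chosen for illustration, not RoastVM's API.

```rust
/// Magic number that opens every Java class file.
const CLASS_MAGIC: u32 = 0xCAFE_BABE;

/// Hypothetical header type for this sketch; not RoastVM's `ClassFile`.
#[derive(Debug, PartialEq)]
pub struct ClassHeader {
    pub minor_version: u16,
    pub major_version: u16,
}

/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq)]
pub enum HeaderError {
    /// Fewer than the 8 bytes the header needs.
    Truncated,
    /// The first four bytes were not 0xCAFEBABE.
    BadMagic(u32),
}

/// Reads magic, minor version, and major version, all stored big-endian.
pub fn parse_header(bytes: &[u8]) -> Result<ClassHeader, HeaderError> {
    if bytes.len() < 8 {
        return Err(HeaderError::Truncated);
    }
    let magic = u32::from_be_bytes([bytes[0], bytes[1], bytes[2], bytes[3]]);
    if magic != CLASS_MAGIC {
        return Err(HeaderError::BadMagic(magic));
    }
    Ok(ClassHeader {
        minor_version: u16::from_be_bytes([bytes[4], bytes[5]]),
        major_version: u16::from_be_bytes([bytes[6], bytes[7]]),
    })
}
```

A declarative parser would presumably express the same checks as field annotations instead of explicit branches; the hand-written form is shown only because it has no external dependencies.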
+ +## Components + +| File | Purpose | +|------|---------| +| `class_file.rs` | Main ClassFile struct with version, constant pool, fields, methods, attributes | +| `constant_pool.rs` | ConstantPoolGet/ConstantPoolExt traits for pool access and resolution | +| `attributes.rs` | Attribute parsing (Code, LineNumberTable, LocalVariableTable, BootstrapMethods) | +| `mod.rs` | Access flag definitions (ClassFlags, MethodFlags, FieldFlags) | + +## Key Types + +```rust +pub struct ClassFile { + pub minor_version: u16, + pub major_version: u16, + pub constant_pool: Arc, + pub access_flags: u16, + pub this_class: u16, + pub super_class: u16, + pub interfaces: Vec, + pub fields: Vec, + pub methods: Vec, + pub attributes: Vec, +} + +pub struct FieldInfo { + pub access_flags: u16, + pub name_index: u16, + pub descriptor_index: u16, + pub attributes: Vec, +} + +pub struct MethodInfo { + pub access_flags: u16, + pub name_index: u16, + pub descriptor_index: u16, + pub attributes: Vec, +} +``` + +## Constant Pool + +Trait-based architecture with two layers: + +### ConstantPoolGet Trait +Low-level accessors: +- `get_constant()`: Resolve by index (accounts for 64-bit entries) +- Type-specific getters: `get_i32()`, `get_utf8_info()`, `get_class_info()`, `get_method_ref()`, etc. 
+- Implemented via `pool_get_impl!` macro + +### ConstantPoolExt Trait +High-level operations: +- `get_string()`: Fetch UTF-8 strings with CESU-8 decoding +- `resolve_class_name()`: Trace class references through constant pool +- `resolve_method_ref()` / `resolve_interface_method_ref()`: Resolve method references +- `resolve_field()`: Resolve field references with type descriptors +- `parse_attribute()`: Convert raw attribute bytes to typed Attribute enum + +### Constant Pool Entry Types (17 types) + +```rust +pub enum ConstantPoolEntry { + Utf8(ConstantUtf8Info), + Integer(i32), Float(f32), Long(i64), Double(f64), + Class(ConstantClassInfo), + String(ConstantStringInfo), + FieldRef(ConstantFieldrefInfo), + MethodRef(ConstantMethodrefInfo), + InterfaceMethodRef(ConstantInterfaceMethodrefInfo), + NameAndType(ConstantNameAndTypeInfo), + MethodHandle(ConstantMethodHandleInfo), + MethodType(ConstantMethodTypeInfo), + Dynamic(ConstantDynamicInfo), + InvokeDynamic(ConstantInvokeDynamicInfo), + Module(ConstantModuleInfo), + Package(ConstantPackageInfo), +} +``` + +## Attributes + +Recursive attribute parsing with support for: + +- **Code**: Method bytecode with max_stack, max_locals, exception tables, nested attributes +- **LineNumberTable**: Maps bytecode offsets to source line numbers +- **LocalVariableTable**: Local variable debugging info (name, descriptor, PC range) +- **BootstrapMethods**: Dynamic invocation bootstrap method references +- **StackMapTable**, **Exceptions**, **InnerClasses**: Parsed as raw byte vectors +- **SourceFile**, **Signature**: Index-based attribute data +- **Unknown**: Fallback for unrecognized attributes + +### Code Attribute Structure + +```rust +pub struct CodeAttribute { + pub max_stack: u16, + pub max_locals: u16, + pub code_length: u32, + pub code: Vec, + pub exception_table: Vec, + pub attributes: Vec, // Recursive +} +``` + +## Access Flags + +Bitfield structures for parsing JVM access flags: + +- **ClassFlags**: PUBLIC, FINAL, 
INTERFACE, ABSTRACT, SYNTHETIC, ANNOTATION, ENUM, MODULE +- **FieldFlags**: PUBLIC, PRIVATE, PROTECTED, STATIC, FINAL, VOLATILE, TRANSIENT, SYNTHETIC, ENUM +- **MethodFlags**: PUBLIC, PRIVATE, PROTECTED, STATIC, FINAL, SYNCHRONIZED, BRIDGE, VARARGS, NATIVE, ABSTRACT, STRICT, SYNTHETIC + +## Validation + +Validation occurs at multiple levels: + +1. **Binary Format** (Automatic via Deku): + - Magic number (0xCAFEBABE) + - Big-endian byte order + - Type-safe parsing with error propagation + +2. **Constant Pool**: + - Index bounds checking in `get_constant()` + - Type validation: Each accessor checks the entry type matches expected type + - CESU-8 decoding errors caught from Java-style UTF-8 strings + +3. **Class Structure** (ClassLoader): + - Debug assertions for Object class having super_class = 0 + - Non-Object classes must have valid super class reference + - Interfaces must inherit from Object + +## Descriptor Parsing + +```rust +// Method descriptor: (II)I -> two ints, return int +MethodDescriptor::parse("(II)I")? + +// Field descriptor: Ljava/lang/String; -> String class type +FieldType::parse("Ljava/lang/String;")? + +// Array descriptor: [[I -> 2D int array +FieldType::parse("[[I")? +``` + +## Error Handling + +- `DekuError`: Binary parsing failures +- `ConstantPoolError`: Pool access with Generic, DescriptorParseError, Cesu8DecodingError variants +- `VmError`: Higher-level VM-specific errors +- `DescParseError`: Invalid method/field descriptor syntax \ No newline at end of file diff --git a/docs/class-loading.md b/docs/class-loading.md new file mode 100644 index 0000000..6da1100 --- /dev/null +++ b/docs/class-loading.md @@ -0,0 +1,134 @@ +# Class Loading and Boot Image + +**Location**: `crates/core/src/class_loader.rs`, `crates/core/src/bimage.rs` + +## Boot Image (Bimage) + +A 7z archive containing precompiled Java standard library classes. 
+ +### Structure + +```rust +pub struct Bimage { + image: ArchiveReader, // 7z archive reader + modules: Vec, // Available modules + packages: HashMap, // Package -> Module mapping + pub total_access_time: Duration, // Performance tracking +} +``` + +- **Default Location**: `./lib/modules` +- **Format**: 7z compressed archive +- **Structure**: `/classes/.class` +- **Default Module**: `java.base` (used when no module is specified) + +## Class Loading Flow + +The `ClassLoader` manages class resolution with a two-tier fallback mechanism: + +### Process + +1. **Check Cache**: Look in `DashMap<(String, LoaderId), Arc>` for already-loaded classes +2. **Try Bimage**: Attempt to load from boot image via `bimage.get_class(module, class_fqn)` +3. **Fallback to Disk**: If not in bimage, load from filesystem at `{CLASSPATH}/{class_name}.class` +4. **Parse & Cache**: Parse ClassFile using deku, create RuntimeClass, store in cache + +### Key Method + +```rust +pub fn load_class(&mut self, what: &str, loader: LoaderId) -> Result, VmError> { + let bytes = self.bimage + .and_then(|b| b.get_class("", what).ok()) + .or_else(|_| Self::load_class_from_disk(what)) + .map_err(|_| VmError::LoaderError(...))?; + + let (_, cf) = ClassFile::from_bytes(bytes.as_ref())?; + let runtime = self.runtime_class(cf); + + // Store with loader ID for multi-loader support + self.classes.insert((class_fqn, loader), Arc::new(runtime)); +} +``` + +## Classpath Handling + +### Resolution Priority + +1. **Bimage (boot image)** - Primary source for standard library +2. **Command-line argument (arg[1])** - User-provided classpath +3. 
**Default `./data` directory** - Fallback location + +### Implementation + +```rust +fn load_class_from_disk(what: &str) -> Result, String> { + let class_path = std::env::args() + .nth(1) + .unwrap_or("./data".to_string()) + .replace("\\", "/"); + + let path = format!("{class_path}/{what}.class"); + // Load file from disk +} +``` + +## Bootstrap Process + +**Location**: `crates/core/src/vm.rs` - `boot_strap()` method + +### Steps + +1. **Create VM**: `ClassLoader::with_bimage()` - initializes with boot image +2. **Load Core Classes**: Preloads essential VM classes +3. **Create Primitive Classes**: Synthetic class objects for primitive types +4. **Initialize Classes**: Run static initializers (`<clinit>`) + +### Core Classes Loaded + +```rust +let classes = vec![ + "java/lang/String", + "java/lang/System", + "java/lang/Class", + "java/lang/Object", + "java/lang/Thread", + "java/lang/ThreadGroup", + "java/lang/Module", + "java/lang/reflect/Method", + // ... +]; +``` + +## RuntimeClass + +Runtime representation of a loaded class: + +### Cached Data + +- **Superclass Chain**: `super_classes: Vec>` +- **Interface Hierarchy**: `super_interfaces: Vec>` +- **Component Type**: For array classes, reference to element type +- **Initialization State**: Thread-safe `InitState` enum + +### Initialization States + +```rust +pub enum InitState { + NotInitialized, + Initializing(ThreadId), // Track which thread is initializing + Initialized, +} +``` + +### Method/Field Resolution + +- `find_method()` - Searches class then walks up superclass chain +- `find_field()` - Same recursive behavior for fields +- `is_assignable_into()` - Checks type compatibility with array covariance + +## Multi-Loader Support + +Classes are keyed by a `(class_name, LoaderId)` tuple to support: +- Different class loaders loading same-named classes +- Isolation between class loader namespaces +- Proper class identity checks \ No newline at end of file diff --git a/docs/frame-interpreter.md b/docs/frame-interpreter.md 
new file mode 100644 index 0000000..a27f5bd --- /dev/null +++ b/docs/frame-interpreter.md @@ -0,0 +1,141 @@ +# Frame and Bytecode Interpreter + +**Location**: `crates/core/src/frame/` + +## Frame Structure + +Each method invocation creates a `Frame` containing: + +| Component | Description | +|-----------|-------------| +| Program Counter (PC) | i64 tracking current bytecode instruction | +| Operand Stack | Generic Vec-backed stack for intermediate values | +| Local Variables | Indexed slots, handles wide values (long/double occupy 2 slots) | +| Constant Pool | Arc reference to the class constant pool | +| Bytecode | Instructions for the method | + +### OperandStack (`operand_stack.rs`) + +- Generic Vec-backed stack with push/pop/peek operations +- `pop_n(n)` returns values in push order (not pop order) for method arguments +- Supports underflow detection + +### LocalVariables (`local_vars.rs`) + +- Vec-backed, indexed by slot +- Handles wide values (long, double) that occupy 2 slots with padding +- `from_args()` automatically spaces wide values correctly +- Prevents access to padding slots with runtime panic + +## Execution Loop + +```rust +loop { + let (offset, op) = self.next().unwrap(); + self.pc = offset as i64; + let result = self.execute_instruction(op.clone()); + match result { + Ok(ExecutionResult::Advance(offset)) => self.pc += offset as i64, + Ok(_) => self.pc += 1, + Err(x) => return error with stack trace, + } +} +``` + +## Opcode Dispatch + +**Location**: `frame.rs` `execute_instruction()` (lines 199-1516) + +- Single match statement over `Ops` enum variants (defined in `instructions.rs`) +- 200+ opcodes: constants, loads/stores, math, stack ops, branches, references, method invocation +- Uses helper macros (`load!`, `store!`, `binary_op!`, `shift_op!`, etc.) 
for common patterns +- Each opcode returns one of: + - `ExecutionResult::Continue` (auto-increment PC by 1) + - `ExecutionResult::Advance(offset)` (jump) + - `ExecutionResult::Return(())` or `ExecutionResult::ReturnValue(Value)` (exit frame) + - `VmError` on failure + +## Opcode Encoding (`instructions.rs`) + +- Uses `deku` derive for binary deserialization +- Each opcode has a u8 ID (0x00-0xFF) +- Some opcodes carry operands (e.g., `iload(u8)`, `goto(i16)`) +- Wide instruction prefix (0xC4) for accessing local slots > 255 + +## Method Invocation + +**Location**: `thread.rs` (lines 308-338) + +### Invocation Types + +1. **`invoke()`**: Resolve method by class and descriptor, execute it +2. **`invoke_virtual()`**: Virtual dispatch - find method on actual runtime class +3. **`invoke_native()`**: Call native JNI method via FFI + +### Bytecode Invocation Instructions + +- **`invokevirtual`**: Virtual method dispatch - pop receiver + arguments, get actual class, call `thread.invoke_virtual()` +- **`invokespecial`**: Non-virtual (constructors, private, super) - pop receiver + arguments, call `thread.invoke()` with static resolution +- **`invokestatic`**: Static methods - pop arguments only (no receiver), call `thread.invoke()` +- **`invokeinterface`**: Interface method dispatch - similar to `invokevirtual` with interface resolution + +### Frame Creation & Execution + +```rust +fn execute_method(&self, class: &Arc, method: &MethodData, args: Vec) { + let mut frame = Frame::new( + class.clone(), + method_ref, + code_attr, // Contains max_stack, max_locals, bytecode + args, // Initialize local vars with parameters + ... 
+ ); + self.frame_stack.lock().push(frame.clone()); + frame.execute() // Bytecode interpretation loop + self.frame_stack.lock().pop(); +} +``` + +## Supported Instructions + +### Constants +`aconst_null`, `iconst_*`, `lconst_*`, `fconst_*`, `dconst_*`, `bipush`, `sipush`, `ldc`, `ldc_w`, `ldc2_w` + +### Load/Store +`iload`, `lload`, `fload`, `dload`, `aload`, `istore`, `lstore`, `fstore`, `dstore`, `astore` (including `_0-3` variants) + +### Array Operations +`iaload`, `laload`, `faload`, `daload`, `aaload`, `baload`, `caload`, `saload`, `iastore`, `lastore`, `fastore`, `dastore`, `aastore`, `bastore`, `castore`, `sastore`, `arraylength` + +### Stack Manipulation +`pop`, `pop2`, `dup`, `dup_x1`, `dup_x2`, `dup2`, `dup2_x1`, `dup2_x2`, `swap` + +### Arithmetic +All int/long/float/double add, sub, mul, div, rem, neg operations + +### Bitwise +`ishl`, `lshl`, `ishr`, `lshr`, `iushr`, `lushr`, `iand`, `land`, `ior`, `lor`, `ixor`, `lxor` + +### Type Conversions +All primitive type conversions (`i2l`, `l2i`, `f2d`, etc.) 
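The narrowing conversions are the ones with non-obvious semantics. As a rough sketch (the free functions below are hypothetical and are not taken from RoastVM's `frame.rs`), Rust's `as` casts already line up with the JVM rules: integer narrowing truncates and then sign- or zero-extends, while float-to-integer casts map NaN to 0 and saturate at the bounds of the target type.

```rust
/// `l2i`: keep only the low 32 bits of a long.
pub fn l2i(v: i64) -> i32 {
    v as i32
}

/// `i2b`: truncate to 8 bits, then sign-extend back to int.
pub fn i2b(v: i32) -> i32 {
    v as i8 as i32
}

/// `i2c`: truncate to 16 bits, then zero-extend (char is unsigned).
pub fn i2c(v: i32) -> i32 {
    v as u16 as i32
}

/// `i2s`: truncate to 16 bits, then sign-extend back to int.
pub fn i2s(v: i32) -> i32 {
    v as i16 as i32
}

/// `f2i`: round toward zero; NaN becomes 0; out-of-range values saturate.
pub fn f2i(v: f32) -> i32 {
    v as i32
}

/// `d2l`: same rules as `f2i`, with a 64-bit target.
pub fn d2l(v: f64) -> i64 {
    v as i64
}
```

Since each result is pushed back onto the operand stack as an `int` (or `long`), the sub-word conversions return `i32` rather than the narrower type.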
+ +### Comparisons +`lcmp`, `fcmpl`, `fcmpg`, `dcmpl`, `dcmpg` + +### Control Flow +`ifeq`, `ifne`, `iflt`, `ifge`, `ifgt`, `ifle`, `if_icmp*`, `if_acmp*`, `goto`, `ifnull`, `ifnonnull`, `tableswitch`, `lookupswitch` + +### Object Operations +`new`, `newarray`, `anewarray`, `multianewarray`, `checkcast`, `instanceof` + +### Field Access +`getstatic`, `putstatic`, `getfield`, `putfield` + +### Method Invocation +`invokevirtual`, `invokespecial`, `invokestatic`, `invokeinterface` + +### Returns +`ireturn`, `lreturn`, `freturn`, `dreturn`, `areturn`, `return` + +### Synchronization +`monitorenter`, `monitorexit` \ No newline at end of file diff --git a/docs/jni.md b/docs/jni.md new file mode 100644 index 0000000..eb7842c --- /dev/null +++ b/docs/jni.md @@ -0,0 +1,155 @@ +# JNI Implementation + +**Location**: `crates/core/src/native/jni.rs` + +## JNIEnv Structure + +The JNIEnv is created as a function table (`JNINativeInterface_` from the `jni` crate) with 250+ function pointers: + +```rust +pub fn create_jni_function_table(thread: *const VmThread) -> JNIEnv { + Box::into_raw(Box::new(JNINativeInterface_ { + reserved0: thread as *mut _, // Stores pointer to VmThread for context + reserved1: std::ptr::null_mut(), + reserved2: std::ptr::null_mut(), + reserved3: std::ptr::null_mut(), + GetVersion: Some(jni_get_version), + FindClass: Some(find_class), + RegisterNatives: Some(register_natives), + GetMethodID: Some(get_method_id), +// ... 240+ more function pointers + })) +} +``` + +**Key Feature:** The `reserved0` field stores a pointer to the `VmThread`, allowing each JNI function to access thread +context via `get_thread(env)`. + +## VmThread's JNIEnv Storage + +**Location**: `crates/core/src/thread.rs` + +Each thread owns its JNIEnv: + +```rust +pub struct VmThread { + pub id: ThreadId, + pub vm: Arc, + pub loader: Arc>, + pub jni_env: JNIEnv, // Stored per-thread + // ... 
other fields +} + +// Created during VmThread initialization: +let jni_env = create_jni_function_table(weak_self.as_ptr() as * mut VmThread); +``` + +## Native Method Invocation + +**Location**: `crates/core/src/thread.rs` (lines 340-428) + +The flow is: + +1. **Method Detection:** When a method has `ACC_NATIVE` flag, `invoke_native()` is called +2. **Symbol Resolution:** Generates JNI symbol name (e.g., `Java_java_lang_String_intern`) +3. **Lookup:** Searches registered native methods or loaded native libraries via `find_native_method()` +4. **FFI Call:** Uses `libffi` to call the native function with constructed arguments + +```rust +pub fn invoke_native(&self, method: &MethodRef, args: Vec) -> MethodCallResult { + let symbol_name = generate_jni_method_name(method, false); + + // Find the function pointer + let p = self.vm.find_native_method(&symbol_name) + .ok_or(VmError::NativeError(...))?; + + // Build Call Interface (CIF) for FFI + let cp = CodePtr::from_ptr(p); + let built_args = build_args(args, &mut storage, &self.jni_env as *mut JNIEnv); + let cif = method.build_cif(); + + // Invoke with type-specific call + match &method.desc.return_type { + None => { + cif.call::<()>(cp, built_args.as_ref()); + Ok(None) + } + Some(FieldType::Base(BaseType::Int)) => { + let v = cif.call::(cp, built_args.as_ref()); + Ok(Some(v.into())) + } + // ... handle other return types + } +} +``` + +## Argument Marshalling + +**Location**: `crates/core/src/thread.rs` (lines 509-548) + +Native functions receive: + +1. `JNIEnv*` - pointer to the function table +2. `jclass` or `jobject` (receiver) - always an ID (u32) +3. 
Parameters - converted from VM `Value` types to JNI types + +```rust +fn build_args(mut params: VecDeque, storage: &mut Vec>, + jnienv: *mut JNIEnv) -> Vec { + storage.push(Box::new(jnienv)); // Slot 0: JNIEnv* + let receiver_id = params.pop_front().map(...); + storage.push(Box::new(receiver_id as jobject)); // Slot 1: this/class + + for value in params { + match value { + Value::Primitive(Primitive::Int(x)) => storage.push(Box::new(x)), + Value::Reference(Some(ref_kind)) => { + storage.push(Box::new(ref_kind.id() as jobject)) // References as IDs + } + // ... other types + } + } + storage.iter().map(|boxed| arg(&**boxed)).collect() +} +``` + +## Native Method Registration + +**Location**: `crates/core/src/native/jni.rs` (lines 381-442) + +Java code calls `RegisterNatives()` JNI function, which stores pointers: + +```rust +unsafe extern "system" fn register_natives( + env: *mut JNIEnv, + clazz: jclass, + methods: *const JNINativeMethod, + n_methods: jint, +) -> jint { + let thread = &*get_thread(env); + + for i in 0..n_methods as usize { + let native_method = &*methods.add(i); + let full_name = generate_jni_short_name(&class_name, name); + + thread.vm.native_methods.insert(full_name, native_method.fnPtr); + } + JNI_OK +} +``` + +## Implemented JNI Functions + +| Category | Functions | +|-------------------|--------------------------------------------------------------| +| Version | `GetVersion` | +| Class Operations | `FindClass`, `GetSuperclass`, `IsAssignableFrom` | +| Exceptions | `Throw`, `ThrowNew`, `ExceptionOccurred`, `ExceptionClear` | +| References | `NewGlobalRef`, `DeleteGlobalRef`, `NewLocalRef` | +| Object Operations | `AllocObject`, `NewObject`, `GetObjectClass`, `IsInstanceOf` | +| Field Access | `GetFieldID`, `Get/SetField`, `GetStaticFieldID` | +| Method Invocation | `GetMethodID`, `CallMethod`, `CallStaticMethod` | +| String Operations | `NewString`, `GetStringLength`, `GetStringChars` | +| Array Operations | `NewArray`, `GetArrayLength`, 
`Get/SetArrayRegion` | +| Registration | `RegisterNatives`, `UnregisterNatives` | +| Monitors | `MonitorEnter`, `MonitorExit` | \ No newline at end of file diff --git a/docs/native-ffi.md b/docs/native-ffi.md new file mode 100644 index 0000000..75c2b3a --- /dev/null +++ b/docs/native-ffi.md @@ -0,0 +1,125 @@ +# Native Methods and FFI System + +**Location**: `crates/core/src/thread.rs`, `crates/core/src/native/` + +## Library Loading + +Native libraries are loaded dynamically using `libloading`: + +- **Location**: `crates/core/src/main.rs` +- **Supported platforms**: Windows (.dll), Linux (.so) +- **Libraries loaded**: + - `roast_vm` - VM-specific native methods + - `jvm` - Java virtual machine standard library + - `java` - Java standard library + +Libraries are registered with the VM via `Vm::load_native_library()` and stored in `native_libraries: Arc>>`. + +## Native Method Registration + +Native methods are registered through JNI's `RegisterNatives` function: + +- **Location**: `crates/core/src/native/jni.rs` +- **Process**: + 1. Java code calls `RegisterNatives()` with method names, signatures, and function pointers + 2. VM generates JNI-formatted symbol names using `generate_jni_method_name()` + 3. Function pointers are stored in `Vm::native_methods: DashMap` + 4. Filters prevent registration of certain methods (e.g., Thread operations) + +## Native Method Dispatch + +When a native method is invoked: + +- **Detection**: `MethodData::ACC_NATIVE` flag identifies native methods +- **Location**: `crates/core/src/thread.rs` +- **Flow**: + 1. `execute_method()` checks if method has `ACC_NATIVE` flag + 2. If static, adds class reference to args; otherwise adds instance reference + 3. 
Calls `invoke_native()` with method reference and arguments + +## FFI System - libffi Integration + +libffi is used to call native functions with correct calling conventions: + +### Call Interface Building + +```rust +fn build_cif(&self) -> Cif { + let mut args = vec![ + Type::pointer(), // JNIEnv* + Type::pointer(), // jclass/jobject + ]; + for v in self.desc.parameters { + args.push(v.into()) + } + let return_type = ...; + Builder::new().args(args).res(return_type).into_cif() +} +``` + +- Constructs a Call Interface (Cif) from method signature +- Maps Java types to FFI types +- Always adds JNIEnv* and jclass/jobject as first two parameters + +### Argument Marshalling + +```rust +fn build_args(params, storage, jnienv) -> Vec +``` + +- Marshals Java values to native format +- Converts references to object IDs (u32) +- Boxes primitives for FFI passing +- Stores all values in a temporary vector + +### Function Invocation + +```rust +let cp = CodePtr::from_ptr(p); +cif.call::(cp, built_args.as_ref()); +``` + +- Converts function pointer to CodePtr +- Calls through libffi with correct return type handling + +## Type Mapping + +| Java Type | FFI Type | +|-----------|----------| +| byte | i8 | +| char | u16 | +| short | i16 | +| int | i32 | +| long | i64 | +| float | f32 | +| double | f64 | +| boolean | i8 | +| Object/Array | pointer | + +## JNI Function Table + +A complete JNI function table is created and passed to native code: + +- **Location**: `crates/core/src/native/jni.rs` +- **Implementation**: + - 250+ JNI functions defined as unsafe extern "system" functions + - Covers class operations, method invocation, field access, array operations, string handling + - Many functions are stubs returning `todo!()` for unimplemented features + +## Unsafe Support (sun.misc.Unsafe) + +Low-level unsafe operations are tracked: + +- **Location**: `crates/core/src/native/unsafe.rs` +- **Features**: + - Field offset registry mapping to class/field pairs + - Off-heap memory 
allocation tracking + - Base offset constant: `0x1_0000_0000` + +## Key Implementation Characteristics + +- **Thread-safe**: All structures use `DashMap`, `RwLock`, or `Mutex` +- **JNI environment**: Created per-thread, stored in `VmThread::jni_env` +- **Symbol lookup**: Two-pass search (without params, then with params) +- **Error handling**: Returns `VmError::NativeError` if symbols not found +- **Tracking**: Maintains statistics on library resolution counts \ No newline at end of file diff --git a/docs/object-management.md b/docs/object-management.md new file mode 100644 index 0000000..6a84680 --- /dev/null +++ b/docs/object-management.md @@ -0,0 +1,95 @@ +# Object Management + +**Location**: `crates/core/src/objects/` + +## Object Representation + +Java objects are represented through a multi-layered abstraction: + +```rust +pub struct Object { + pub id: u32, // Unique identifier + pub class: Arc, // Runtime class reference + pub fields: DashMap, // Concurrent field storage +} + +pub type ObjectReference = Arc>; +``` + +- **Objects**: Contain a unique ID (u32), runtime class reference, and field storage (DashMap) +- **ObjectReference**: `Arc>` - reference-counted, thread-safe smart pointer +- **Value wrapper**: The `Value` enum encapsulates both primitives and references for operand stack/local variable storage + +## Array Management + +Arrays are type-safe with separate variants for primitives and objects: + +```rust +pub enum ArrayReference { + Int(Arc>>), + Byte(Arc>>), + Short(Arc>>), + Long(Arc>>), + Float(Arc>>), + Double(Arc>>), + Char(Arc>>), + Boolean(Arc>>), + Object(Arc>>>), +} +``` + +- **Primitive arrays**: Int, Byte, Short, Long, Float, Double, Char, Boolean +- **Object arrays**: Can hold references to other objects +- **Array structure**: Each array wraps a boxed slice `Box<[T]>` with id, class, and backing storage +- **Thread-safe**: All arrays use `Arc>>` for concurrent access + +## Allocation Strategy + +Allocation is centralized in 
**ObjectManager**: + +- **Object allocation**: `new_object()` generates unique IDs via an atomic counter and stores references in HashMap +- **Array allocation**: Separate methods for primitive arrays (`new_primitive_array()`) and object arrays (`new_object_array()`) +- **String interning**: `new_string()` creates UTF-16 encoded strings with automatic interning via a string pool +- **Memory tracking**: `bytes_in_use()` calculates total heap usage across all objects/arrays + +All allocated objects are registered in `objects: HashMap` for global access. + +## Reference Handling + +Two-level reference system: + +1. **ReferenceKind enum**: Distinguishes between `ObjectReference` and `ArrayReference` +2. **Reference type alias**: `Option<ReferenceKind>` (`None` = null) +3. **Conversion methods**: Safe conversions with `try_into_object_reference()` and `try_into_array_reference()` + +## Memory Management + +**No explicit garbage collection** - relies on Rust's reference counting: + +- Arc ensures objects live as long as references exist +- Mutex provides thread-safe field/element access +- Shallow cloning for `clone()` operations (copies references, not objects) +- Array copy operations handle both primitive and object types with bounds checking + +## Object Synchronization + +Monitor-based concurrency for synchronized operations: + +```rust +pub struct Monitor { + owner: Option, + entry_count: u32, + condition: Condvar, + mutex: Mutex<()>, +} +``` + +- **Operations**: `monitor_enter()`, `monitor_exit()`, `wait()`, `notify_one()`, `notify_all()` +- **Wait semantics**: Full support for Java-style wait/notify with timeout +- **Reentrant**: Same thread can enter multiple times (tracked by `entry_count`) + +## Special Features + +- **String handling**: UTF-16 LE encoding with automatic String object creation +- **Reflection support**: Methods to create Constructor, Method, MethodHandle objects +- **Class mirrors**: Every class has an associated mirror object (java/lang/Class) \ No newline at end of
file diff --git a/docs/roast-vm-sys.md b/docs/roast-vm-sys.md new file mode 100644 index 0000000..90c34f7 --- /dev/null +++ b/docs/roast-vm-sys.md @@ -0,0 +1,87 @@ +# roast-vm-sys Crate + +**Location**: `crates/roast-vm-sys/` + +A cdylib crate that exports native method implementations callable from Java via JNI. + +## Overview + +**roast-vm-sys** is a JNI wrapper crate that exposes the roast-vm-core runtime to Java. It's compiled as a C dynamic library (cdylib) named `roast_vm`. + +## Exported Native Methods + +The crate exports 40+ JNI native functions via `#[no_mangle] extern "system"` declarations. + +### By Module + +| Module | Functions | +|--------|-----------| +| `thread.rs` | `Thread.currentThread()`, `Thread.start0()`, `Thread.setPriority0()` | +| `object.rs` | `Object.hashCode()`, `Object.clone()`, `Object.notify()`, `Object.notifyAll()`, `Object.wait()` | +| `class.rs` | `Class.forName0()`, `Class.getPrimitiveClass()`, `Class.getDeclaredConstructors0()` | +| `reflection.rs` | `Reflection.getCallerClass()` | +| `reflect/array.rs` | `Array.newArray()` | +| `string.rs` | `String.intern()` | +| `system.rs` | `System.arraycopy()`, `System.nanoTime()` | +| `runtime.rs` | `Runtime.maxMemory()`, `Runtime.availableProcessors()` | +| `misc_unsafe.rs` | `Unsafe` field offsets, volatile read/write, memory allocation | +| `file_output_stream.rs` | `FileOutputStream.writeBytes()` | +| `system_props.rs` | `vmProperties()` - VM identity (version 0.1.0, vendor "infernap12") | +| `CDS.rs` | Class Data Sharing stubs | +| `signal.rs` | `Signal.handle0()` stub | +| `scoped_memory_access.rs` | `ScopedMemoryAccess` registration | + +## Bridge Pattern + +Each native function follows this pattern: + +1. **Extract VmThread** from `JNIEnv.reserved0` using `get_thread()` helper +2. 
**Resolve References** - Convert JNI handles (jobject) to internal references: + - `resolve_object()` - gets ObjectReference + - `resolve_array()` - gets ArrayReference + - `resolve_reference()` - gets generic ReferenceKind +3. **Perform Operation** via core VM APIs +4. **Return Result** in JNI-compatible format + +## Example Native Implementation + +```rust +#[unsafe(no_mangle)] +pub extern "system" fn Java_org_example_MockIO_print( + env: JNIEnv, + _jclass: JClass, + input: JString, +) { + unsafe { + let input: String = env.get_string_unchecked(&input) + .expect("Couldn't get java string!") + .into(); + std::io::stdout().write_all(input.as_bytes()).ok(); + } +} +``` + +## File Structure + +17 modules organized by Java class: + +- `lib.rs` - Core helpers, test functions (`MockIO.print()`, `Main.getTime()`) +- `runtime.rs`, `thread.rs`, `class.rs` - Core VM operations +- `object.rs`, `string.rs`, `reflection.rs`, `reflect/` - Object/class introspection +- `system.rs`, `file_output_stream.rs` - System I/O +- `misc_unsafe.rs` - Unsafe memory operations (largest implementation, ~626 lines) +- `CDS.rs`, `signal.rs`, `system_props.rs`, `scoped_memory_access.rs` - Stubs/properties + +## GC Interaction + +Native methods access the garbage collector via: +- `thread.gc.read()` / `thread.gc.write()` for object access +- Object creation, cloning, and array operations go through GC +- Field access via `thread.gc` or direct field references + +## Error Handling + +Mixed approach: +- Some methods panic on errors +- Some return null/default values +- TODO comments indicate incomplete exception throwing
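
The bridge pattern and the two error-handling styles listed above (panic versus returning a default) can be illustrated with a small, self-contained sketch. Everything in it is hypothetical: `ToyThread`, `ToyObject`, `hash_or_default`, and `hash_or_panic` are stand-ins invented for illustration, a plain `u32` plays the role of a `jobject` handle, and none of this is RoastVM's actual API.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a VM object; not RoastVM's real `Object` type.
struct ToyObject {
    id: u32,
}

/// Hypothetical stand-in for the per-thread state that step 1 would extract.
struct ToyThread {
    objects: HashMap<u32, ToyObject>,
}

impl ToyThread {
    /// Builds a thread whose object table contains the given IDs.
    fn with_objects(ids: &[u32]) -> Self {
        let objects = ids.iter().map(|&id| (id, ToyObject { id })).collect();
        ToyThread { objects }
    }

    /// Step 2: resolve a handle to an internal reference.
    /// Handle 0 plays the role of a null `jobject` in this sketch.
    fn resolve_object(&self, handle: u32) -> Option<&ToyObject> {
        if handle == 0 {
            None
        } else {
            self.objects.get(&handle)
        }
    }
}

/// "Return a default" style: an unresolved handle yields 0.
fn hash_or_default(thread: &ToyThread, handle: u32) -> i32 {
    thread
        .resolve_object(handle)
        .map(|obj| obj.id as i32) // steps 3 and 4: compute, then return a 32-bit int
        .unwrap_or(0)
}

/// "Panic" style: an unresolved handle aborts the call with a message.
fn hash_or_panic(thread: &ToyThread, handle: u32) -> i32 {
    thread
        .resolve_object(handle)
        .expect("handle did not resolve to an object")
        .id as i32
}
```

With a thread built by `ToyThread::with_objects(&[7, 42])`, `hash_or_default` returns the ID for a known handle and `0` for a null or unknown one, while `hash_or_panic` panics in the latter cases. The sketch only contrasts the two styles; it does not model throwing a Java exception, which the TODO comments mentioned above identify as incomplete.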