Update documentation for class file parsing, class loading, frame interpreter, JNI, object management, and FFI. Increment crate version to 0.2.0;

james 2025-12-26 18:10:52 +10:30
parent 7fcf00b77f
commit 24939df1b7
No known key found for this signature in database
GPG Key ID: E1FFBA228F4CAD87
10 changed files with 960 additions and 114 deletions

README.md

@ -1,146 +1,99 @@
# RoastVM
A Java Virtual Machine (JVM) implementation written in Rust, capable of parsing and executing Java class files and bytecode.
A Java Virtual Machine (JVM) implementation written in Rust.
## Overview
RoastVM is an educational/experimental JVM implementation that demonstrates the core components and execution model of the Java Virtual Machine. The project uses Rust's type safety and modern tooling to build a simplified but functional JVM interpreter.
RoastVM is an educational/experimental JVM implementation that executes Java bytecode. It supports class file parsing,
bytecode interpretation, JNI native methods, and includes a boot image system for loading the Java standard library.
## Features
### Currently Implemented
- **Class File Parsing**: Full support for reading and deserializing binary Java class files (`.class`) with magic number `0xCAFEBABE`
- **Constant Pool Management**: Handles 20+ constant pool entry types (UTF8, Integer, Float, Long, Double, Class, String, MethodRef, FieldRef, InterfaceMethodRef, NameAndType, MethodHandle, MethodType, InvokeDynamic, etc.)
- **Dynamic Class Loading**: On-demand class loading with superclass and interface resolution, caching via DashMap
- **Class Initialization**: Automatic `<clinit>` method execution following JVM Spec 5.5, with recursive initialization tracking
- **Bytecode Execution**: Interpreter for 50+ JVM bytecode instructions including:
- Constants: `aconst_null`, `iconst_*`, `lconst_*`, `fconst_*`, `dconst_*`, `bipush`, `sipush`, `ldc`, `ldc_w`, `ldc2_w`
- Load/Store: `iload`, `lload`, `fload`, `dload`, `aload`, `istore`, `lstore`, `fstore`, `dstore`, `astore` (including `_0-3` variants)
- Array operations: `iaload`, `laload`, `faload`, `daload`, `aaload`, `baload`, `caload`, `saload`, `iastore`, `lastore`, `fastore`, `dastore`, `aastore`, `bastore`, `castore`, `sastore`, `arraylength`
- Stack manipulation: `pop`, `pop2`, `dup`, `dup_x1`, `dup_x2`, `dup2`, `dup2_x1`, `dup2_x2`, `swap`
- Arithmetic: `iadd`, `ladd`, `fadd`, `dadd`, `isub`, `lsub`, `fsub`, `dsub`, `imul`, `lmul`, `fmul`, `dmul`, `idiv`, `ldiv`, `fdiv`, `ddiv`, `irem`, `lrem`, `frem`, `drem`, `ineg`, `lneg`, `fneg`, `dneg`
- Bitwise: `ishl`, `lshl`, `ishr`, `lshr`, `iushr`, `lushr`, `iand`, `land`, `ior`, `lor`, `ixor`, `lxor`
- Type conversions: `i2l`, `i2f`, `i2d`, `l2i`, `l2f`, `l2d`, `f2i`, `f2l`, `f2d`, `d2i`, `d2l`, `d2f`, `i2b`, `i2c`, `i2s`
- Comparisons: `lcmp`, `fcmpl`, `fcmpg`, `dcmpl`, `dcmpg`
- Control flow: `ifeq`, `ifne`, `iflt`, `ifge`, `ifgt`, `ifle`, `if_icmp*`, `if_acmp*`, `goto`, `ifnull`, `ifnonnull`
- Object operations: `new`, `newarray`, `anewarray`, `multianewarray`, `checkcast`, `instanceof`
- Field access: `getstatic`, `putstatic`, `getfield`, `putfield`
- Method invocation: `invokevirtual`, `invokespecial`, `invokestatic`, `invokeinterface`
- Returns: `ireturn`, `lreturn`, `freturn`, `dreturn`, `areturn`, `return`
- **Object Model**: Full object creation, field storage, and array support (primitive and reference arrays)
- **JNI Support**: Implementation of 80+ JNI functions for native method integration
- **Native Library Loading**: Dynamic loading of native libraries (DLLs on Windows)
- **Stack Traces**: Detailed stack trace generation with line number mapping from class file attributes
- **Module System**: Support for loading classes from 7z binary image archives (JDK modules)
- **Frame-based Execution**: Proper execution context with program counter, operand stack, and local variables
### In Development
- Additional bytecode instructions (`tableswitch`, `lookupswitch`, `monitorenter`, `monitorexit`, etc.)
- Exception handling (`athrow`, try/catch blocks)
- Garbage collection (basic object manager exists)
- Reflection API
- Multi-threading support
- Method handles and `invokedynamic`
## Architecture
### Core Components
- **`Vm`** (`vm.rs`): Main virtual machine controller managing threads, class loader, and native library loading
- **`VmThread`** (`thread.rs`): Thread of execution managing the frame stack and method invocation
- **`Frame`** (`lib.rs`): Execution context for a method with PC, operand stack, and local variables
- **`ClassLoader`** (`class_loader.rs`): Handles dynamic class loading, linking, and initialization
- **`RuntimeClass`** (`class.rs`): Runtime representation of a loaded class with initialization state tracking
- **`ClassFile`** (`class_file/`): Binary parser for Java class files using the `deku` library
- **`ConstantPool`** (`class_file/constant_pool.rs`): Constant pool resolution and management
- **`ObjectManager`** (`objects/object_manager.rs`): Object allocation and garbage collection management
- **`JNI`** (`jni.rs`): Java Native Interface implementation
### Execution Flow
1. **Loading**: `ClassFile::from_bytes()` parses binary class file data
2. **Resolution**: `ClassLoader` converts `ClassFile` to `RuntimeClass`, resolving dependencies
3. **Initialization**: Class initializers (`<clinit>`) execute per JVM Spec 5.5
4. **Execution**: `VmThread` invokes the main method, creating a `Frame`
5. **Interpretation**: `Frame` iterates through bytecode operations, executing each instruction
6. **Stack Operations**: Instructions manipulate the operand stack and local variables
- **Class File Parsing** - Full `.class` file support using deku for binary parsing
- **Bytecode Interpreter** - 200+ JVM instructions implemented
- **Object Model** - Objects, arrays, monitors, and string interning
- **JNI Support** - 250+ JNI functions for native method integration
- **Boot Image** - Load JDK classes from 7z module archives
- **Native FFI** - Dynamic library loading via libffi
## Project Structure
```
roast-vm/
├── Cargo.toml # Workspace configuration
└── crates/
├── core/ # Main JVM implementation (roast-vm-core)
│ ├── Cargo.toml
│ └── src/
│ ├── main.rs # Entry point (binary: roast)
│ ├── lib.rs # Frame and bytecode execution
│ ├── vm.rs # Virtual Machine controller
│ ├── thread.rs # Thread execution management
│ ├── class.rs # RuntimeClass definition
│ ├── class_loader.rs # ClassLoader implementation
│ ├── class_file/ # Binary class file parser
│ │ ├── class_file.rs # ClassFile parser (magic 0xCAFEBABE)
│ │ └── constant_pool.rs
│ ├── objects/ # Object model
│ │ ├── object.rs # Object representation
│ │ ├── array.rs # Array support
│ │ └── object_manager.rs
│ ├── jni.rs # JNI implementation
│ ├── instructions.rs # Bytecode opcode definitions
│ ├── attributes.rs # Class file attributes
│ ├── value.rs # Value and stack types
│ ├── error.rs # Error handling and stack traces
│ ├── native_libraries.rs # Native library management
│ └── bimage.rs # Binary image (7z) reader
└── roast-vm-sys/ # Native methods bridge (cdylib)
├── Cargo.toml
└── src/
├── lib.rs # Native method implementations
├── system.rs # System native calls
├── class.rs # Class native operations
└── object.rs # Object native operations
├── crates/
│ ├── core/ # Main VM implementation
│ │ └── src/
│ │ ├── main.rs # Entry point
│ │ ├── vm.rs # VM controller
│ │ ├── thread.rs # Thread execution
│ │ ├── class_loader.rs # Class loading
│ │ ├── bimage.rs # Boot image reader
│ │ ├── frame/ # Stack frames & interpreter
│ │ ├── class_file/ # Class file parser
│ │ ├── objects/ # Object/array model
│ │ └── native/ # JNI infrastructure
│ │
│ └── roast-vm-sys/ # Native methods (cdylib)
│ └── src/ # JNI implementations
├── lib/ # Boot image location
├── data/ # Default classpath
└── docs/ # Detailed documentation
```
## Dependencies
## Documentation
- **`deku`**: Binary parsing and serialization for class files
- **`dashmap`**: Concurrent HashMap for class and object storage
- **`jni`**: Java Native Interface bindings
- **`libloading`**: Dynamic library loading
- **`libffi`**: Foreign function interface for native calls
- **`sevenz-rust2`**: 7z archive reading for module system support
- **`log`** / **`env_logger`**: Logging infrastructure
- **`itertools`**: Iterator utilities
- **`colored`**: Colored console output
Detailed implementation docs are in the `docs/` folder:
- [Class File Parsing](docs/class-file-parsing.md) - Binary format, constant pool, attributes
- [Frame & Interpreter](docs/frame-interpreter.md) - Stack frames, opcode dispatch
- [Object Management](docs/object-management.md) - Objects, arrays, monitors
- [JNI](docs/jni.md) - JNIEnv structure, native invocation
- [Native/FFI](docs/native-ffi.md) - Library loading, libffi integration
- [roast-vm-sys](docs/roast-vm-sys.md) - Native method implementations
- [Class Loading](docs/class-loading.md) - Boot image, classpath, RuntimeClass
## Building
```bash
# Build the project
cargo build
# Build with optimizations
cargo build --release
# Run tests
cargo test
```
# Run with logging
## Running
```bash
# Run with default classpath (./data)
cargo run
# Run with custom classpath
cargo run -- /path/to/classes
# With debug logging
RUST_LOG=debug cargo run
```
## Current Status
## Dependencies
This project is in early development (v0.1.0). The core infrastructure for class loading, bytecode execution, object creation, JNI support, and stack traces is functional. Many JVM features remain in development.
| Crate | Purpose |
|-------------------------------------------------------|---------------------------|
| [deku](https://crates.io/crates/deku) | Binary class file parsing |
| [dashmap](https://crates.io/crates/dashmap) | Concurrent maps |
| [jni](https://crates.io/crates/jni) | JNI type definitions |
| [libloading](https://crates.io/crates/libloading) | Dynamic library loading |
| [libffi](https://crates.io/crates/libffi) | Native function calls |
| [sevenz-rust2](https://crates.io/crates/sevenz-rust2) | Boot image archives |
| [parking_lot](https://crates.io/crates/parking_lot) | Synchronization |
## Status
Early development (v0.2.0). Core class loading, bytecode execution, and JNI are functional. Exception handling and GC
are in progress.
**Vendor**: infernap12
## References
- [JVM Specification](https://docs.oracle.com/javase/specs/jvms/se25/html/index.html)
- [Java Class File Format](https://docs.oracle.com/javase/specs/jvms/se25/html/jvms-4.html)
- [JNI Specification](https://docs.oracle.com/en/java/javase/25/docs/specs/jni/index.html)


@ -1,6 +1,6 @@
[package]
name = "roast-vm-core"
version = "0.1.5"
version = "0.2.0"
edition = "2024"
publish = ["nexus"]


@ -1,6 +1,6 @@
[package]
name = "roast-vm-sys"
version = "0.1.5"
version = "0.2.0"
edition = "2024"
publish = ["nexus"]

docs/class-file-parsing.md

@ -0,0 +1,156 @@
# Class File Parsing
**Location**: `crates/core/src/class_file/`
The class file parser uses the **deku** library for declarative binary deserialization with automatic validation.
## Components
| File | Purpose |
|------|---------|
| `class_file.rs` | Main ClassFile struct with version, constant pool, fields, methods, attributes |
| `constant_pool.rs` | ConstantPoolGet/ConstantPoolExt traits for pool access and resolution |
| `attributes.rs` | Attribute parsing (Code, LineNumberTable, LocalVariableTable, BootstrapMethods) |
| `mod.rs` | Access flag definitions (ClassFlags, MethodFlags, FieldFlags) |
## Key Types
```rust
pub struct ClassFile {
pub minor_version: u16,
pub major_version: u16,
pub constant_pool: Arc<ConstantPoolOwned>,
pub access_flags: u16,
pub this_class: u16,
pub super_class: u16,
pub interfaces: Vec<u16>,
pub fields: Vec<FieldInfo>,
pub methods: Vec<MethodInfo>,
pub attributes: Vec<AttributeInfo>,
}
pub struct FieldInfo {
pub access_flags: u16,
pub name_index: u16,
pub descriptor_index: u16,
pub attributes: Vec<AttributeInfo>,
}
pub struct MethodInfo {
pub access_flags: u16,
pub name_index: u16,
pub descriptor_index: u16,
pub attributes: Vec<AttributeInfo>,
}
```
## Constant Pool
Trait-based architecture with two layers:
### ConstantPoolGet Trait
Low-level accessors:
- `get_constant()`: Resolve by index (accounts for `Long`/`Double` entries, which occupy two pool slots)
- Type-specific getters: `get_i32()`, `get_utf8_info()`, `get_class_info()`, `get_method_ref()`, etc.
- Implemented via `pool_get_impl!` macro
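Under the two-slot rule for `Long`/`Double` entries (JVMS §4.4.5), a constant pool index cannot be used directly as a vector index. The sketch below shows that bookkeeping, assuming a flat `Vec` of parsed entries. `Entry`, `index_map` and this `get_constant` are illustrative stand-ins, not RoastVM's `ConstantPoolGet` implementation.

```rust
/// Simplified constant pool entry; only the width matters here.
#[derive(Debug, Clone, PartialEq)]
pub enum Entry {
    Utf8(String),
    Integer(i32),
    Long(i64),
    Double(f64),
}

impl Entry {
    /// Long and Double occupy two constant pool indices.
    fn width(&self) -> u16 {
        match self {
            Entry::Long(_) | Entry::Double(_) => 2,
            _ => 1,
        }
    }
}

/// Map each 1-based constant pool index to a position in `entries`.
/// Index 0 and the unusable slot after a long/double map to `None`.
pub fn index_map(entries: &[Entry]) -> Vec<Option<usize>> {
    let mut map = vec![None]; // index 0 is never valid
    for (pos, entry) in entries.iter().enumerate() {
        map.push(Some(pos));
        if entry.width() == 2 {
            map.push(None); // second half of a wide entry
        }
    }
    map
}

/// Bounds-checked lookup by constant pool index.
pub fn get_constant<'a>(entries: &'a [Entry], map: &[Option<usize>], index: u16) -> Option<&'a Entry> {
    map.get(index as usize).copied().flatten().map(|pos| &entries[pos])
}
```

For `[Utf8, Long, Integer]`, the valid indices are 1, 2 and 4. Index 3 is the unusable half of the `Long`.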
### ConstantPoolExt Trait
High-level operations:
- `get_string()`: Fetch UTF-8 strings, decoding Java's modified UTF-8 (CESU-8) encoding
- `resolve_class_name()`: Trace class references through constant pool
- `resolve_method_ref()` / `resolve_interface_method_ref()`: Resolve method references
- `resolve_field()`: Resolve field references with type descriptors
- `parse_attribute()`: Convert raw attribute bytes to typed Attribute enum
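Class files store strings in "modified UTF-8" (JVMS §4.4.7). This is CESU-8 plus a two-byte encoding of NUL. The hand-written sketch below shows what decoding involves. It works through UTF-16 code units so that surrogate pairs reassemble naturally. The name `decode_modified_utf8` is hypothetical, and the crate's own decoder may differ.

```rust
/// Decode a JVM "modified UTF-8" byte string.
/// NUL is encoded as 0xC0 0x80, and supplementary characters are stored as two
/// 3-byte encoded surrogates, so collecting UTF-16 units first lets
/// `String::from_utf16` rebuild them.
pub fn decode_modified_utf8(bytes: &[u8]) -> Result<String, String> {
    let mut units: Vec<u16> = Vec::with_capacity(bytes.len());
    // Fetch the 6 payload bits of a continuation byte (10xxxxxx) or fail.
    let cont = |j: usize| -> Result<u16, String> {
        match bytes.get(j) {
            Some(&b) if (b & 0xC0) == 0x80 => Ok((b & 0x3F) as u16),
            _ => Err(format!("bad continuation byte at offset {j}")),
        }
    };
    let mut i = 0;
    while i < bytes.len() {
        let b = bytes[i];
        if (b & 0x80) == 0 && b != 0 {
            // 1 byte: 0xxxxxxx (a raw NUL byte is not allowed)
            units.push(b as u16);
            i += 1;
        } else if (b & 0xE0) == 0xC0 {
            // 2 bytes: 110xxxxx 10xxxxxx (also covers NUL as C0 80)
            units.push((((b & 0x1F) as u16) << 6) | cont(i + 1)?);
            i += 2;
        } else if (b & 0xF0) == 0xE0 {
            // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx, one UTF-16 code unit
            units.push((((b & 0x0F) as u16) << 12) | (cont(i + 1)? << 6) | cont(i + 2)?);
            i += 3;
        } else {
            return Err(format!("invalid lead byte 0x{b:02X} at offset {i}"));
        }
    }
    // Fails on unpaired surrogates.
    String::from_utf16(&units).map_err(|e| e.to_string())
}
```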
### Constant Pool Entry Types (17 types)
```rust
pub enum ConstantPoolEntry {
Utf8(ConstantUtf8Info),
Integer(i32), Float(f32), Long(i64), Double(f64),
Class(ConstantClassInfo),
String(ConstantStringInfo),
FieldRef(ConstantFieldrefInfo),
MethodRef(ConstantMethodrefInfo),
InterfaceMethodRef(ConstantInterfaceMethodrefInfo),
NameAndType(ConstantNameAndTypeInfo),
MethodHandle(ConstantMethodHandleInfo),
MethodType(ConstantMethodTypeInfo),
Dynamic(ConstantDynamicInfo),
InvokeDynamic(ConstantInvokeDynamicInfo),
Module(ConstantModuleInfo),
Package(ConstantPackageInfo),
}
```
## Attributes
Recursive attribute parsing with support for:
- **Code**: Method bytecode with max_stack, max_locals, exception tables, nested attributes
- **LineNumberTable**: Maps bytecode offsets to source line numbers
- **LocalVariableTable**: Local variable debugging info (name, descriptor, PC range)
- **BootstrapMethods**: Dynamic invocation bootstrap method references
- **StackMapTable**, **Exceptions**, **InnerClasses**: Parsed as raw byte vectors
- **SourceFile**, **Signature**: Index-based attribute data
- **Unknown**: Fallback for unrecognized attributes
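Stack traces turn a frame's PC into a source line through the `LineNumberTable`. The sketch below shows one lookup strategy, assuming the table is available as a slice of `(start_pc, line_number)` entries. `LineNumber` and `line_for_pc` are illustrative names.

```rust
/// One LineNumberTable entry (JVMS §4.7.12): bytecode offset -> source line.
#[derive(Debug, Clone, Copy)]
pub struct LineNumber {
    pub start_pc: u16,
    pub line_number: u16,
}

/// Return the line of the entry with the greatest `start_pc` that is <= `pc`.
/// The spec does not require entries to be sorted, so this scans them all.
pub fn line_for_pc(table: &[LineNumber], pc: u16) -> Option<u16> {
    table
        .iter()
        .filter(|e| e.start_pc <= pc)
        .max_by_key(|e| e.start_pc)
        .map(|e| e.line_number)
}
```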
### Code Attribute Structure
```rust
pub struct CodeAttribute {
pub max_stack: u16,
pub max_locals: u16,
pub code_length: u32,
pub code: Vec<u8>,
pub exception_table: Vec<ExceptionTableEntry>,
pub attributes: Vec<AttributeInfo>, // Recursive
}
```
## Access Flags
Bitfield structures for parsing JVM access flags:
- **ClassFlags**: PUBLIC, FINAL, INTERFACE, ABSTRACT, SYNTHETIC, ANNOTATION, ENUM, MODULE
- **FieldFlags**: PUBLIC, PRIVATE, PROTECTED, STATIC, FINAL, VOLATILE, TRANSIENT, SYNTHETIC, ENUM
- **MethodFlags**: PUBLIC, PRIVATE, PROTECTED, STATIC, FINAL, SYNCHRONIZED, BRIDGE, VARARGS, NATIVE, ABSTRACT, STRICT, SYNTHETIC
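RoastVM decodes these flags with deku bitfield structs. The sketch below shows the underlying method flag bit values (JVMS Table 4.6-A) tested with a plain mask. `METHOD_FLAGS` and `method_flag_names` are hypothetical helpers, not part of the crate.

```rust
/// Method access flags and their bit values (JVMS Table 4.6-A).
const METHOD_FLAGS: &[(u16, &str)] = &[
    (0x0001, "PUBLIC"),
    (0x0002, "PRIVATE"),
    (0x0004, "PROTECTED"),
    (0x0008, "STATIC"),
    (0x0010, "FINAL"),
    (0x0020, "SYNCHRONIZED"),
    (0x0040, "BRIDGE"),
    (0x0080, "VARARGS"),
    (0x0100, "NATIVE"),
    (0x0400, "ABSTRACT"),
    (0x0800, "STRICT"),
    (0x1000, "SYNTHETIC"),
];

/// List the names of the flags set in a method's `access_flags` word.
pub fn method_flag_names(access_flags: u16) -> Vec<&'static str> {
    METHOD_FLAGS
        .iter()
        .filter(|(bit, _)| (access_flags & bit) != 0)
        .map(|&(_, name)| name)
        .collect()
}
```

For example, a `public static native` method has `access_flags == 0x0109`.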
## Validation
Validation occurs at multiple levels:
1. **Binary Format** (Automatic via Deku):
- Magic number (0xCAFEBABE)
- Big-endian byte order
- Type-safe parsing with error propagation
2. **Constant Pool**:
- Index bounds checking in `get_constant()`
- Type validation: Each accessor checks the entry type matches expected type
- Decoding errors in modified UTF-8 (CESU-8) strings are caught and reported
3. **Class Structure** (ClassLoader):
- Debug assertions for Object class having super_class = 0
- Non-Object classes must have valid super class reference
- Interfaces must inherit from Object
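Deku performs the binary-format checks declaratively. A hand-written equivalent for the fixed header makes them explicit: the magic number comes first, and all multi-byte fields are big-endian (JVMS §4.1). `Header` and `read_header` are illustrative only.

```rust
/// The fixed-size start of every class file.
#[derive(Debug, PartialEq)]
pub struct Header {
    pub minor_version: u16,
    pub major_version: u16,
    pub constant_pool_count: u16,
}

/// Read the header, rejecting truncated input and a wrong magic number.
pub fn read_header(bytes: &[u8]) -> Result<Header, String> {
    if bytes.len() < 10 {
        return Err(format!("truncated header: {} bytes", bytes.len()));
    }
    // Every multi-byte value in a class file is big-endian.
    let u16_at = |i: usize| u16::from_be_bytes([bytes[i], bytes[i + 1]]);
    let magic = u32::from_be_bytes([bytes[0], bytes[1], bytes[2], bytes[3]]);
    if magic != 0xCAFE_BABE {
        return Err(format!("bad magic 0x{magic:08X}"));
    }
    Ok(Header {
        minor_version: u16_at(4),
        major_version: u16_at(6),
        constant_pool_count: u16_at(8),
    })
}
```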
## Descriptor Parsing
```rust
// Method descriptor: (II)I -> two int parameters, returns int
let add = MethodDescriptor::parse("(II)I")?;
// Field descriptor: Ljava/lang/String; -> reference to java/lang/String
let string = FieldType::parse("Ljava/lang/String;")?;
// Array descriptor: [[I -> two-dimensional int array
let grid = FieldType::parse("[[I")?;
```
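The descriptor grammar (JVMS §4.3) is small enough for a recursive-descent parser. The sketch below assumes that approach. `DescType`, `parse_field` and `parse_method` are simplified stand-ins for RoastVM's `FieldType` and `MethodDescriptor`, whose real definitions may differ.

```rust
/// Simplified field type.
#[derive(Debug, Clone, PartialEq)]
pub enum DescType {
    Base(char),           // B C D F I J S Z
    Object(String),       // Lclass/Name;
    Array(Box<DescType>), // [component
}

/// Parse one field type, returning it and the unconsumed input.
fn parse_field(s: &str) -> Result<(DescType, &str), String> {
    match s.chars().next() {
        Some(c @ ('B' | 'C' | 'D' | 'F' | 'I' | 'J' | 'S' | 'Z')) => Ok((DescType::Base(c), &s[1..])),
        Some('L') => {
            let end = s.find(';').ok_or("unterminated class name")?;
            Ok((DescType::Object(s[1..end].to_string()), &s[end + 1..]))
        }
        Some('[') => {
            let (inner, rest) = parse_field(&s[1..])?;
            Ok((DescType::Array(Box::new(inner)), rest))
        }
        other => Err(format!("unexpected {other:?}")),
    }
}

/// Parse a method descriptor such as "(II)I"; a `None` return type means void.
pub fn parse_method(desc: &str) -> Result<(Vec<DescType>, Option<DescType>), String> {
    let mut rest = desc.strip_prefix('(').ok_or("missing '('")?;
    let mut params = Vec::new();
    while !rest.starts_with(')') {
        let (ty, tail) = parse_field(rest)?; // errors on input without ')'
        params.push(ty);
        rest = tail;
    }
    let ret = &rest[1..];
    let return_type = if ret == "V" {
        None
    } else {
        let (ty, tail) = parse_field(ret)?;
        if !tail.is_empty() {
            return Err(format!("trailing input {tail:?}"));
        }
        Some(ty)
    };
    Ok((params, return_type))
}
```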
## Error Handling
- `DekuError`: Binary parsing failures
- `ConstantPoolError`: Pool access with Generic, DescriptorParseError, Cesu8DecodingError variants
- `VmError`: Higher-level VM-specific errors
- `DescParseError`: Invalid method/field descriptor syntax

docs/class-loading.md

@ -0,0 +1,134 @@
# Class Loading and Boot Image
**Location**: `crates/core/src/class_loader.rs`, `crates/core/src/bimage.rs`
## Boot Image (Bimage)
A 7z archive containing precompiled Java standard library classes.
### Structure
```rust
pub struct Bimage {
image: ArchiveReader<File>, // 7z archive reader
modules: Vec<String>, // Available modules
packages: HashMap<String, String>, // Package -> Module mapping
pub total_access_time: Duration, // Performance tracking
}
```
- **Default Location**: `./lib/modules`
- **Format**: 7z compressed archive
- **Structure**: `<module>/classes/<class>.class`
- **Default Module**: `java.base` (used when no module is specified)
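With the `packages` map, locating a class inside the archive reduces to string formatting. The sketch below assumes the package keys use the same slash-separated internal form as class names; the map's actual key format is not shown here. `bimage_entry` is a hypothetical helper.

```rust
use std::collections::HashMap;

/// Build the archive path `<module>/classes/<class>.class`, choosing the
/// module from a package -> module map and defaulting to java.base.
pub fn bimage_entry(class_fqn: &str, packages: &HashMap<String, String>) -> String {
    // "java/util/List" -> package "java/util"; a class with no '/' is in the unnamed package.
    let package = class_fqn.rsplit_once('/').map(|(pkg, _)| pkg).unwrap_or("");
    let module = packages.get(package).map(String::as_str).unwrap_or("java.base");
    format!("{module}/classes/{class_fqn}.class")
}
```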
## Class Loading Flow
The `ClassLoader` manages class resolution with a two-tier fallback mechanism:
### Process
1. **Check Cache**: Look in `DashMap<(String, LoaderId), Arc<RuntimeClass>>` for already-loaded classes
2. **Try Bimage**: Attempt to load from boot image via `bimage.get_class(module, class_fqn)`
3. **Fallback to Disk**: If not in bimage, load from filesystem at `{CLASSPATH}/{class_name}.class`
4. **Parse & Cache**: Parse ClassFile using deku, create RuntimeClass, store in cache
### Key Method
```rust
pub fn load_class(&mut self, what: &str, loader: LoaderId) -> Result<Arc<RuntimeClass>, VmError> {
    // Boot image first (no module given), then fall back to the classpath on disk
    let bytes = match self.bimage.as_mut().and_then(|b| b.get_class("", what).ok()) {
        Some(bytes) => Ok(bytes),
        None => Self::load_class_from_disk(what),
    }
    .map_err(|_| VmError::LoaderError(...))?;
    let (_, cf) = ClassFile::from_bytes(bytes.as_ref())?;
    let runtime = Arc::new(self.runtime_class(cf));
    // Store with loader ID for multi-loader support
    self.classes.insert((what.to_string(), loader), runtime.clone());
    Ok(runtime)
}
```
## Classpath Handling
### Resolution Priority
1. **Bimage (boot image)** - Primary source for standard library
2. **Command-line argument (arg[1])** - User-provided classpath
3. **Default `./data` directory** - Fallback location
### Implementation
```rust
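fn load_class_from_disk(what: &str) -> Result<Vec<u8>, String> {
    // Classpath: first CLI argument, else ./data (normalised to forward slashes)
    let class_path = std::env::args()
        .nth(1)
        .unwrap_or_else(|| "./data".to_string())
        .replace('\\', "/");
    let path = format!("{class_path}/{what}.class");
    // Load file from disk
    std::fs::read(&path).map_err(|e| format!("{path}: {e}"))
}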
```
## Bootstrap Process
**Location**: `crates/core/src/vm.rs` - `boot_strap()` method
### Steps
1. **Create VM**: `ClassLoader::with_bimage()` - initializes with boot image
2. **Load Core Classes**: Preloads essential VM classes
3. **Create Primitive Classes**: Synthetic class objects for primitive types
4. **Initialize Classes**: Run static initializers (`<clinit>`)
### Core Classes Loaded
```rust
let classes = vec![
"java/lang/String",
"java/lang/System",
"java/lang/Class",
"java/lang/Object",
"java/lang/Thread",
"java/lang/ThreadGroup",
"java/lang/Module",
"java/lang/reflect/Method",
// ...
];
```
## RuntimeClass
Runtime representation of a loaded class:
### Cached Data
- **Superclass Chain**: `super_classes: Vec<Arc<RuntimeClass>>`
- **Interface Hierarchy**: `super_interfaces: Vec<Arc<RuntimeClass>>`
- **Component Type**: For array classes, reference to element type
- **Initialization State**: Thread-safe `InitState` enum
### Initialization States
```rust
pub enum InitState {
NotInitialized,
Initializing(ThreadId), // Track which thread is initializing
Initialized,
}
```
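Tracking the initializing thread lets a recursive initialization request return immediately instead of deadlocking (JVMS §5.5). The sketch below is a simplified model of that decision. `InitAction` and `begin_init` are hypothetical. A real implementation would make this decision under a lock and would also need an erroneous state for a failed `<clinit>`.

```rust
/// Stand-in for RoastVM's thread identifier.
type ThreadId = u64;

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum InitState {
    NotInitialized,
    Initializing(ThreadId),
    Initialized,
}

/// What the caller should do next.
#[derive(Debug, PartialEq)]
pub enum InitAction {
    RunClinit,   // we claimed the class: run <clinit>, then mark Initialized
    AlreadyDone, // nothing to do
    Recursive,   // this thread is already initializing it: proceed without waiting
    Wait,        // another thread is initializing it: block until it finishes
}

pub fn begin_init(state: &mut InitState, me: ThreadId) -> InitAction {
    match *state {
        InitState::Initialized => InitAction::AlreadyDone,
        InitState::Initializing(owner) if owner == me => InitAction::Recursive,
        InitState::Initializing(_) => InitAction::Wait,
        InitState::NotInitialized => {
            *state = InitState::Initializing(me);
            InitAction::RunClinit
        }
    }
}
```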
### Method/Field Resolution
- `find_method()` - Searches class then walks up superclass chain
- `find_field()` - Same recursive behavior for fields
- `is_assignable_into()` - Checks type compatibility with array covariance
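Because the superclass chain is cached as a `Vec`, method lookup can be a single linear search over the class followed by its ancestors. The sketch below assumes `super_classes` is ordered nearest-first. `Class` is a stand-in for `RuntimeClass`, and methods are reduced to `(name, descriptor)` pairs. Interface default methods are not covered.

```rust
use std::sync::Arc;

/// Minimal stand-in for RuntimeClass.
pub struct Class {
    pub name: String,
    pub methods: Vec<(String, String)>, // (name, descriptor)
    pub super_classes: Vec<Arc<Class>>, // assumed nearest superclass first
}

/// Return the first class in [self, super, super-super, ...] that declares the method.
pub fn find_method<'a>(class: &'a Class, name: &str, desc: &str) -> Option<&'a Class> {
    std::iter::once(class)
        .chain(class.super_classes.iter().map(|c| &**c))
        .find(|c| c.methods.iter().any(|(n, d)| n == name && d == desc))
}
```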
## Multi-Loader Support
Classes are keyed by `(class_name, LoaderId)` tuple to support:
- Different class loaders loading same-named classes
- Isolation between class loader namespaces
- Proper class identity checks
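The sketch below models the `(class_name, LoaderId)` key with a plain `HashMap`, where RoastVM uses `DashMap`. `ClassCache` and the numeric `LoaderId` alias are assumptions for illustration.

```rust
use std::collections::HashMap;

/// Hypothetical loader identifier.
type LoaderId = u32;

/// Classes keyed by (internal name, loader).
pub struct ClassCache<T> {
    classes: HashMap<(String, LoaderId), T>,
}

impl<T> ClassCache<T> {
    pub fn new() -> Self {
        Self { classes: HashMap::new() }
    }

    pub fn insert(&mut self, name: &str, loader: LoaderId, class: T) {
        self.classes.insert((name.to_string(), loader), class);
    }

    /// The same name under a different loader is a different class.
    pub fn get(&self, name: &str, loader: LoaderId) -> Option<&T> {
        self.classes.get(&(name.to_string(), loader))
    }
}
```

Because the key is a tuple, this lookup allocates a `String`. An implementation on a hot path might intern names instead.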

docs/frame-interpreter.md

@ -0,0 +1,141 @@
# Frame and Bytecode Interpreter
**Location**: `crates/core/src/frame/`
## Frame Structure
Each method invocation creates a `Frame` containing:
| Component | Description |
|-----------|-------------|
| Program Counter (PC) | i64 tracking current bytecode instruction |
| Operand Stack | Generic Vec-backed stack for intermediate values |
| Local Variables | Indexed slots, handles wide values (long/double occupy 2 slots) |
| Constant Pool | Arc reference to the class constant pool |
| Bytecode | Instructions for the method |
### OperandStack (`operand_stack.rs`)
- Generic Vec-backed stack with push/pop/peek operations
- `pop_n(n)` returns values in push order (not pop order) for method arguments
- Supports underflow detection
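The push-order behaviour of `pop_n` falls out naturally from `Vec::split_off`. The sketch below is a generic version and is not the actual `operand_stack.rs`.

```rust
/// Minimal operand stack with underflow detection.
#[derive(Debug, Default)]
pub struct OperandStack<T> {
    values: Vec<T>,
}

impl<T> OperandStack<T> {
    pub fn push(&mut self, v: T) {
        self.values.push(v);
    }

    pub fn pop(&mut self) -> Result<T, String> {
        self.values.pop().ok_or_else(|| "operand stack underflow".to_string())
    }

    /// Pop the top `n` values, returned oldest-first so they line up with
    /// a method's parameter list.
    pub fn pop_n(&mut self, n: usize) -> Result<Vec<T>, String> {
        if n > self.values.len() {
            return Err("operand stack underflow".to_string());
        }
        let start = self.values.len() - n;
        Ok(self.values.split_off(start))
    }
}
```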
### LocalVariables (`local_vars.rs`)
- Vec-backed, indexed by slot
- Handles wide values (long, double) that occupy 2 slots with padding
- `from_args()` automatically spaces wide values correctly
- Prevents access to padding slots with runtime panic
## Execution Loop
```rust
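loop {
    // Decode the next instruction; its byte offset becomes the PC
    let (offset, op) = self.next().unwrap();
    self.pc = offset as i64;
    let result = self.execute_instruction(op.clone());
    match result {
        // Branch taken: move the PC by the signed offset
        Ok(ExecutionResult::Advance(offset)) => self.pc += offset as i64,
        // Return / ReturnValue: exit the frame and hand control back to the caller
        Ok(ExecutionResult::Return(())) => return Ok(None),
        Ok(ExecutionResult::ReturnValue(v)) => return Ok(Some(v)),
        // Continue: step to the next instruction
        Ok(ExecutionResult::Continue) => self.pc += 1,
        // Failure: propagate the error (with a stack trace attached)
        Err(x) => return Err(x),
    }
}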
```
## Opcode Dispatch
**Location**: `frame.rs` `execute_instruction()` (lines 199-1516)
- Single match statement over `Ops` enum variants (defined in `instructions.rs`)
- 200+ opcodes: constants, loads/stores, math, stack ops, branches, references, method invocation
- Uses helper macros (`load!`, `store!`, `binary_op!`, `shift_op!`, etc.) for common patterns
- Each opcode returns one of:
- `ExecutionResult::Continue` (auto-increment PC by 1)
- `ExecutionResult::Advance(offset)` (jump)
- `ExecutionResult::Return(())` or `ExecutionResult::ReturnValue(Value)` (exit frame)
- `VmError` on failure
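The toy interpreter below follows the same contract. Each instruction returns `Continue`, `Advance(offset)` or a return value, and only the loop moves the PC. To stay short, it indexes instructions rather than bytes and omits the void `Return(())` case. `Op`, `execute_instruction` and `run` are hypothetical.

```rust
/// A toy instruction set, just enough to show the dispatch shape.
#[derive(Debug, Clone, Copy)]
pub enum Op {
    Iconst(i32),
    Isub,
    Ifle(isize), // pop; branch by offset if the value is <= 0
    Goto(isize), // relative branch, in instructions (not bytes)
    Ireturn,
}

pub enum ExecutionResult {
    Continue,
    Advance(isize),
    ReturnValue(i32),
}

fn pop(stack: &mut Vec<i32>) -> Result<i32, String> {
    stack.pop().ok_or_else(|| "operand stack underflow".to_string())
}

fn execute_instruction(op: Op, stack: &mut Vec<i32>) -> Result<ExecutionResult, String> {
    match op {
        Op::Iconst(v) => stack.push(v),
        Op::Isub => {
            let (b, a) = (pop(stack)?, pop(stack)?); // b is the top of stack
            stack.push(a.wrapping_sub(b));
        }
        Op::Ifle(off) => {
            if pop(stack)? <= 0 {
                return Ok(ExecutionResult::Advance(off));
            }
        }
        Op::Goto(off) => return Ok(ExecutionResult::Advance(off)),
        Op::Ireturn => return Ok(ExecutionResult::ReturnValue(pop(stack)?)),
    }
    Ok(ExecutionResult::Continue)
}

/// The loop alone owns the PC: +1 on Continue, +offset on Advance.
pub fn run(code: &[Op]) -> Result<i32, String> {
    let mut stack = Vec::new();
    let mut pc: usize = 0;
    loop {
        let op = *code.get(pc).ok_or("fell off the end of the code")?;
        match execute_instruction(op, &mut stack)? {
            ExecutionResult::Continue => pc += 1,
            ExecutionResult::Advance(off) => {
                pc = pc.checked_add_signed(off).ok_or("branch before start of code")?;
            }
            ExecutionResult::ReturnValue(v) => return Ok(v),
        }
    }
}
```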
## Opcode Encoding (`instructions.rs`)
- Uses `deku` derive for binary deserialization
- Each opcode has a u8 ID (0x00-0xFF)
- Some opcodes carry operands (e.g., `iload(u8)`, `goto(i16)`)
- Wide instruction prefix (0xC4) for accessing local slots > 255
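Decoding operands by hand shows the two details that matter. Multi-byte operands are big-endian, and branch offsets are signed and relative to the address of the branching opcode (JVMS §6.5). The sketch below covers only `iload`, `wide iload` and `goto`. RoastVM does this through deku derives instead.

```rust
#[derive(Debug, PartialEq)]
pub enum Insn {
    Iload(u16), // u8 index normally, u16 after `wide`
    Goto { target: usize },
}

const ILOAD: u8 = 0x15;
const GOTO: u8 = 0xA7;
const WIDE: u8 = 0xC4;

/// Decode the instruction at `pc`, returning it and its length in bytes.
pub fn decode(code: &[u8], pc: usize) -> Result<(Insn, usize), String> {
    let byte = |i: usize| code.get(i).copied().ok_or_else(|| format!("truncated at offset {i}"));
    match byte(pc)? {
        ILOAD => Ok((Insn::Iload(byte(pc + 1)? as u16), 2)),
        GOTO => {
            // Signed 16-bit offset, relative to the goto opcode itself
            let off = i16::from_be_bytes([byte(pc + 1)?, byte(pc + 2)?]);
            let target = pc.checked_add_signed(off as isize).ok_or("branch target before offset 0")?;
            Ok((Insn::Goto { target }, 3))
        }
        WIDE => match byte(pc + 1)? {
            // wide + iload + two index bytes
            ILOAD => Ok((Insn::Iload(u16::from_be_bytes([byte(pc + 2)?, byte(pc + 3)?])), 4)),
            other => Err(format!("wide form of 0x{other:02X} not handled in this sketch")),
        },
        other => Err(format!("opcode 0x{other:02X} not handled in this sketch")),
    }
}
```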
## Method Invocation
**Location**: `thread.rs` (lines 308-338)
### Invocation Types
1. **`invoke()`**: Resolve method by class and descriptor, execute it
2. **`invoke_virtual()`**: Virtual dispatch - find method on actual runtime class
3. **`invoke_native()`**: Call native JNI method via FFI
### Bytecode Invocation Instructions
- **`invokevirtual`**: Virtual method dispatch - pop receiver + arguments, get actual class, call `thread.invoke_virtual()`
- **`invokespecial`**: Non-virtual (constructors, private, super) - pop receiver + arguments, call `thread.invoke()` with static resolution
- **`invokestatic`**: Static methods - pop arguments only (no receiver), call `thread.invoke()`
- **`invokeinterface`**: Interface method dispatch - similar to `invokevirtual` with interface resolution
### Frame Creation & Execution
```rust
fn execute_method(&self, class: &Arc<RuntimeClass>, method: &MethodData, args: Vec<Value>) -> MethodCallResult {
    let mut frame = Frame::new(
        class.clone(),
        method_ref,
        code_attr, // Contains max_stack, max_locals, bytecode
        args,      // Initialize local vars with parameters
        ...
    );
    // Record the frame on the thread's frame stack while it runs
    self.frame_stack.lock().push(frame.clone());
    let result = frame.execute(); // Bytecode interpretation loop
    self.frame_stack.lock().pop();
    result
}
```
## Supported Instructions
### Constants
`aconst_null`, `iconst_*`, `lconst_*`, `fconst_*`, `dconst_*`, `bipush`, `sipush`, `ldc`, `ldc_w`, `ldc2_w`
### Load/Store
`iload`, `lload`, `fload`, `dload`, `aload`, `istore`, `lstore`, `fstore`, `dstore`, `astore` (including `_0-3` variants)
### Array Operations
`iaload`, `laload`, `faload`, `daload`, `aaload`, `baload`, `caload`, `saload`, `iastore`, `lastore`, `fastore`, `dastore`, `aastore`, `bastore`, `castore`, `sastore`, `arraylength`
### Stack Manipulation
`pop`, `pop2`, `dup`, `dup_x1`, `dup_x2`, `dup2`, `dup2_x1`, `dup2_x2`, `swap`
### Arithmetic
All int/long/float/double add, sub, mul, div, rem, neg operations
### Bitwise
`ishl`, `lshl`, `ishr`, `lshr`, `iushr`, `lushr`, `iand`, `land`, `ior`, `lor`, `ixor`, `lxor`
### Type Conversions
All primitive type conversions (`i2l`, `l2i`, `f2d`, etc.)
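The float-to-int conversions have precise JVM semantics: NaN becomes 0, and out-of-range values saturate. Rust's `as` casts (since Rust 1.45) happen to match these rules. The sketch below shows how the conversions can be expressed. The function names mirror the opcodes, but this is not RoastVM's code.

```rust
/// f2i / d2l: truncate toward zero; NaN -> 0; out-of-range values saturate.
pub fn f2i(v: f32) -> i32 {
    v as i32
}
pub fn d2l(v: f64) -> i64 {
    v as i64
}

/// i2b / i2s truncate then sign-extend; i2c truncates then zero-extends.
pub fn i2b(v: i32) -> i32 {
    v as i8 as i32
}
pub fn i2c(v: i32) -> i32 {
    v as u16 as i32
}
pub fn i2s(v: i32) -> i32 {
    v as i16 as i32
}
```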
### Comparisons
`lcmp`, `fcmpl`, `fcmpg`, `dcmpl`, `dcmpg`
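The `l` and `g` variants differ only in what they push when an operand is NaN: `-1` for the `l` form and `1` for the `g` form. The sketch below shows these comparison semantics (JVMS §6.5). The helper names are hypothetical.

```rust
use std::cmp::Ordering;

/// Push -1, 0 or 1; `nan_result` is used when either operand is NaN.
pub fn fcmp(a: f64, b: f64, nan_result: i32) -> i32 {
    match a.partial_cmp(&b) {
        Some(Ordering::Less) => -1,
        Some(Ordering::Equal) => 0, // also covers 0.0 vs -0.0
        Some(Ordering::Greater) => 1,
        None => nan_result,
    }
}

pub fn fcmpl(a: f32, b: f32) -> i32 {
    fcmp(a as f64, b as f64, -1)
}
pub fn fcmpg(a: f32, b: f32) -> i32 {
    fcmp(a as f64, b as f64, 1)
}
/// lcmp has no NaN case.
pub fn lcmp(a: i64, b: i64) -> i32 {
    a.cmp(&b) as i32
}
```

Having both variants lets a compiler pick the one that makes a NaN operand fail the source-level comparison.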
### Control Flow
`ifeq`, `ifne`, `iflt`, `ifge`, `ifgt`, `ifle`, `if_icmp*`, `if_acmp*`, `goto`, `ifnull`, `ifnonnull`, `tableswitch`, `lookupswitch`
### Object Operations
`new`, `newarray`, `anewarray`, `multianewarray`, `checkcast`, `instanceof`
### Field Access
`getstatic`, `putstatic`, `getfield`, `putfield`
### Method Invocation
`invokevirtual`, `invokespecial`, `invokestatic`, `invokeinterface`
### Returns
`ireturn`, `lreturn`, `freturn`, `dreturn`, `areturn`, `return`
### Synchronization
`monitorenter`, `monitorexit`

docs/jni.md

@ -0,0 +1,155 @@
# JNI Implementation
**Location**: `crates/core/src/native/jni.rs`
## JNIEnv Structure
The JNIEnv is created as a function table (`JNINativeInterface_` from the `jni` crate) holding the full set of JNI function pointers (about 230 entries):
```rust
pub fn create_jni_function_table(thread: *const VmThread) -> JNIEnv {
Box::into_raw(Box::new(JNINativeInterface_ {
reserved0: thread as *mut _, // Stores pointer to VmThread for context
reserved1: std::ptr::null_mut(),
reserved2: std::ptr::null_mut(),
reserved3: std::ptr::null_mut(),
GetVersion: Some(jni_get_version),
FindClass: Some(find_class),
RegisterNatives: Some(register_natives),
GetMethodID: Some(get_method_id),
// ... the remaining function pointers
}))
}
```
**Key Feature:** The `reserved0` field stores a pointer to the `VmThread`, allowing each JNI function to access thread
context via `get_thread(env)`.
## VmThread's JNIEnv Storage
**Location**: `crates/core/src/thread.rs`
Each thread owns its JNIEnv:
```rust
pub struct VmThread {
pub id: ThreadId,
pub vm: Arc<Vm>,
pub loader: Arc<Mutex<ClassLoader>>,
pub jni_env: JNIEnv, // Stored per-thread
// ... other fields
}
// Created during VmThread initialization:
let jni_env = create_jni_function_table(weak_self.as_ptr() as *mut VmThread);
```
## Native Method Invocation
**Location**: `crates/core/src/thread.rs` (lines 340-428)
The flow is:
1. **Method Detection:** When a method has `ACC_NATIVE` flag, `invoke_native()` is called
2. **Symbol Resolution:** Generates JNI symbol name (e.g., `Java_java_lang_String_intern`)
3. **Lookup:** Searches registered native methods or loaded native libraries via `find_native_method()`
4. **FFI Call:** Uses `libffi` to call the native function with constructed arguments
```rust
pub fn invoke_native(&self, method: &MethodRef, args: Vec<Value>) -> MethodCallResult {
    let symbol_name = generate_jni_method_name(method, false);
    // Find the function pointer
    let p = self.vm.find_native_method(&symbol_name)
        .ok_or(VmError::NativeError(...))?;
    let cp = CodePtr::from_ptr(p);
    // `storage` keeps the boxed argument values alive until the call returns
    let mut storage = Vec::new();
    let built_args = build_args(
        args.into(), // Vec<Value> -> VecDeque<Value>
        &mut storage,
        &self.jni_env as *const JNIEnv as *mut JNIEnv,
    );
    // Build Call Interface (CIF) for FFI
    let cif = method.build_cif();
    // Invoke with type-specific call; `Cif::call` is unsafe
    match &method.desc.return_type {
        None => {
            unsafe { cif.call::<()>(cp, built_args.as_ref()) };
            Ok(None)
        }
        Some(FieldType::Base(BaseType::Int)) => {
            let v = unsafe { cif.call::<jint>(cp, built_args.as_ref()) };
            Ok(Some(v.into()))
        }
        // ... handle other return types
    }
}
```
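The symbol names generated in step 2 follow the JNI specification's mangling rules. The self-contained sketch below produces both the short form and the long, signature-qualified form. `jni_short_name` and `jni_long_name` are illustrative and do not reproduce the signature of RoastVM's `generate_jni_method_name()`.

```rust
/// Mangle a class name, method name or parameter descriptor:
/// '/' -> '_', '_' -> "_1", ';' -> "_2", '[' -> "_3",
/// any other non-alphanumeric UTF-16 unit -> "_0xxxx" (lowercase hex).
fn mangle(s: &str) -> String {
    let mut out = String::new();
    for unit in s.encode_utf16() {
        match char::from_u32(unit as u32) {
            Some('/') => out.push('_'),
            Some('_') => out.push_str("_1"),
            Some(';') => out.push_str("_2"),
            Some('[') => out.push_str("_3"),
            Some(c) if c.is_ascii_alphanumeric() => out.push(c),
            _ => out.push_str(&format!("_0{unit:04x}")),
        }
    }
    out
}

/// Short name: Java_<class>_<method>.
pub fn jni_short_name(class: &str, method: &str) -> String {
    format!("Java_{}_{}", mangle(class), mangle(method))
}

/// Long name (for overloaded natives): short name + "__" + mangled parameter
/// descriptor, without the parentheses or the return type.
pub fn jni_long_name(class: &str, method: &str, descriptor: &str) -> String {
    let params = descriptor
        .strip_prefix('(')
        .and_then(|d| d.split_once(')'))
        .map(|(p, _)| p)
        .unwrap_or("");
    format!("{}__{}", jni_short_name(class, method), mangle(params))
}
```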
## Argument Marshalling
**Location**: `crates/core/src/thread.rs` (lines 509-548)
Native functions receive:
1. `JNIEnv*` - pointer to the function table
2. `jclass` or `jobject` (receiver) - always an ID (u32)
3. Parameters - converted from VM `Value` types to JNI types
```rust
fn build_args(mut params: VecDeque<Value>, storage: &mut Vec<Box<dyn Any>>,
jnienv: *mut JNIEnv) -> Vec<Arg> {
storage.push(Box::new(jnienv)); // Slot 0: JNIEnv*
let receiver_id = params.pop_front().map(...);
storage.push(Box::new(receiver_id as jobject)); // Slot 1: this/class
for value in params {
match value {
Value::Primitive(Primitive::Int(x)) => storage.push(Box::new(x)),
Value::Reference(Some(ref_kind)) => {
storage.push(Box::new(ref_kind.id() as jobject)) // References as IDs
}
// ... other types
}
}
storage.iter().map(|boxed| arg(&**boxed)).collect()
}
```
## Native Method Registration
**Location**: `crates/core/src/native/jni.rs` (lines 381-442)
Native code calls the `RegisterNatives()` JNI function (in the JDK, typically from a class's `registerNatives()` native method), which stores the function pointers:
```rust
unsafe extern "system" fn register_natives(
env: *mut JNIEnv,
clazz: jclass,
methods: *const JNINativeMethod,
n_methods: jint,
) -> jint {
let thread = &*get_thread(env);
for i in 0..n_methods as usize {
let native_method = &*methods.add(i);
let full_name = generate_jni_short_name(&class_name, name);
thread.vm.native_methods.insert(full_name, native_method.fnPtr);
}
JNI_OK
}
```
## Implemented JNI Functions
| Category | Functions |
|-------------------|--------------------------------------------------------------|
| Version | `GetVersion` |
| Class Operations | `FindClass`, `GetSuperclass`, `IsAssignableFrom` |
| Exceptions | `Throw`, `ThrowNew`, `ExceptionOccurred`, `ExceptionClear` |
| References | `NewGlobalRef`, `DeleteGlobalRef`, `NewLocalRef` |
| Object Operations | `AllocObject`, `NewObject`, `GetObjectClass`, `IsInstanceOf` |
| Field Access | `GetFieldID`, `Get/Set<Type>Field`, `GetStaticFieldID` |
| Method Invocation | `GetMethodID`, `Call<Type>Method`, `CallStatic<Type>Method` |
| String Operations | `NewString`, `GetStringLength`, `GetStringChars` |
| Array Operations  | `New<Type>Array`, `GetArrayLength`, `Get/Set<Type>ArrayRegion` |
| Registration | `RegisterNatives`, `UnregisterNatives` |
| Monitors | `MonitorEnter`, `MonitorExit` |

docs/native-ffi.md

@ -0,0 +1,125 @@
# Native Methods and FFI System
**Location**: `crates/core/src/thread.rs`, `crates/core/src/native/`
## Library Loading
Native libraries are loaded dynamically using `libloading`:
- **Location**: `crates/core/src/main.rs`
- **Supported platforms**: Windows (.dll), Linux (.so)
- **Libraries loaded**:
- `roast_vm` - VM-specific native methods
- `jvm` - Java virtual machine standard library
- `java` - Java standard library
Libraries are registered with the VM via `Vm::load_native_library()` and stored in `native_libraries: Arc<RwLock<Vec<(String, Library)>>>`.
## Native Method Registration
Native methods are registered through JNI's `RegisterNatives` function:
- **Location**: `crates/core/src/native/jni.rs`
- **Process**:
1. Native code (e.g. a JDK class's `registerNatives()` implementation) calls `RegisterNatives()` with method names, signatures, and function pointers
2. VM generates JNI-formatted symbol names using `generate_jni_method_name()`
3. Function pointers are stored in `Vm::native_methods: DashMap<String, *const c_void>`
4. Filters prevent registration of certain methods (e.g., Thread operations)
## Native Method Dispatch
When a native method is invoked:
- **Detection**: `MethodData::ACC_NATIVE` flag identifies native methods
- **Location**: `crates/core/src/thread.rs`
- **Flow**:
1. `execute_method()` checks if method has `ACC_NATIVE` flag
2. If static, adds class reference to args; otherwise adds instance reference
3. Calls `invoke_native()` with method reference and arguments
## FFI System - libffi Integration
libffi is used to call native functions with correct calling conventions:
### Call Interface Building
```rust
fn build_cif(&self) -> Cif {
let mut args = vec![
Type::pointer(), // JNIEnv*
Type::pointer(), // jclass/jobject
];
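// Map each Java parameter type to its libffi type (iterate by reference: `self` is borrowed)
for v in &self.desc.parameters {
args.push(v.clone().into());
}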
let return_type = ...;
Builder::new().args(args).res(return_type).into_cif()
}
```
- Constructs a Call Interface (Cif) from method signature
- Maps Java types to FFI types
- Always adds JNIEnv* and jclass/jobject as first two parameters
### Argument Marshalling
```rust
fn build_args(params: VecDeque<Value>, storage: &mut Vec<Box<dyn Any>>, jnienv: *mut JNIEnv) -> Vec<Arg>
```
- Marshals Java values to native format
- Converts references to object IDs (u32)
- Boxes primitives for FFI passing
- Keeps the boxed values alive in `storage` for the duration of the call, since each `Arg` points into it
### Function Invocation
```rust
let cp = CodePtr::from_ptr(p);
// Unsafe: the Cif must match the native function's real signature
let result = unsafe { cif.call::<ReturnType>(cp, built_args.as_ref()) };
```
- Converts function pointer to CodePtr
- Calls through libffi with correct return type handling
## Type Mapping
| Java Type | FFI Type |
|-----------|----------|
| byte | i8 |
| char | u16 |
| short | i16 |
| int | i32 |
| long | i64 |
| float | f32 |
| double | f64 |
| boolean | i8 |
| Object/Array | pointer |
## JNI Function Table
A JNI function table covering the full interface is created and passed to native code:
- **Location**: `crates/core/src/native/jni.rs`
- **Implementation**:
  - 250+ JNI functions defined as `unsafe extern "system"` functions
  - Covers class operations, method invocation, field access, array operations, and string handling
  - Many functions are stubs that call `todo!()` for unimplemented features
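
In `jni.h`, `JNIEnv` is a pointer to a pointer to this table. Native code calls through it, as in `(*env)->GetVersion(env)` in C. The truncated Rust sketch below shows that layout. It models only the four reserved slots and `GetVersion`. Returning `JNI_VERSION_1_8` is an arbitrary choice for illustration, not necessarily what RoastVM reports.

```rust
use std::ffi::c_void;

/// JNI_VERSION_1_8 as defined in jni.h.
const JNI_VERSION_1_8: i32 = 0x0001_0008;

/// As in jni.h: JNIEnv is a pointer to the function table.
type JNIEnv = *const JNINativeInterface;

/// Truncated table: the real one continues with DefineClass, FindClass, ...
/// in a fixed order, so slot positions must match jni.h exactly.
#[repr(C)]
struct JNINativeInterface {
    reserved0: *mut c_void,
    reserved1: *mut c_void,
    reserved2: *mut c_void,
    reserved3: *mut c_void,
    get_version: unsafe extern "system" fn(env: *mut JNIEnv) -> i32,
}

/// Implementation placed in the GetVersion slot.
unsafe extern "system" fn get_version(_env: *mut JNIEnv) -> i32 {
    JNI_VERSION_1_8
}
```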
## Unsafe Support (sun.misc.Unsafe)
Low-level unsafe operations are tracked:
- **Location**: `crates/core/src/native/unsafe.rs`
- **Features**:
  - Field offset registry mapping offsets to class/field pairs
  - Off-heap memory allocation tracking
  - Base offset constant: `0x1_0000_0000`
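
A field offset registry of this kind can be sketched as a two-way map. `Unsafe` offsets are Java `long` values, hence `i64`. The sketch assumes offsets are handed out sequentially from the base constant. `FieldOffsets`, `offset_of`, and `field_at` are hypothetical names.

```rust
use std::collections::HashMap;

/// Base offset quoted above; synthetic offsets start here.
const FIELD_OFFSET_BASE: i64 = 0x1_0000_0000;

#[derive(Default)]
struct FieldOffsets {
    by_field: HashMap<(String, String), i64>,
    by_offset: HashMap<i64, (String, String)>,
}

impl FieldOffsets {
    /// Returns a stable offset for (class, field), allocating one on first use.
    fn offset_of(&mut self, class: &str, field: &str) -> i64 {
        let key = (class.to_string(), field.to_string());
        if let Some(&off) = self.by_field.get(&key) {
            return off;
        }
        let off = FIELD_OFFSET_BASE + self.by_field.len() as i64;
        self.by_field.insert(key.clone(), off);
        self.by_offset.insert(off, key);
        off
    }

    /// Resolves an offset passed back from Java (e.g. to getInt/putInt) to its field.
    fn field_at(&self, offset: i64) -> Option<(&str, &str)> {
        self.by_offset.get(&offset).map(|(c, f)| (c.as_str(), f.as_str()))
    }
}
```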
## Key Implementation Characteristics
- **Thread-safe**: All shared structures use `DashMap`, `RwLock`, or `Mutex`
- **JNI environment**: Created per-thread, stored in `VmThread::jni_env`
- **Symbol lookup**: Two-pass search: first the short JNI name (without the argument signature), then the long name (with the mangled argument signature)
- **Error handling**: Returns `VmError::NativeError` if symbols not found
- **Tracking**: Maintains statistics on library resolution counts
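
The two-pass lookup can be sketched as follows, with a `HashMap` standing in for the loaded libraries' exported symbols. `resolve_native` is hypothetical, and the `String` payload on `VmError::NativeError` is an assumption.

```rust
use std::collections::HashMap;

#[derive(Debug, PartialEq)]
enum VmError {
    NativeError(String),
}

/// `symbols` maps exported symbol names to addresses.
fn resolve_native(symbols: &HashMap<String, usize>, short_name: &str, long_name: &str) -> Result<usize, VmError> {
    // Pass 1: short name (no argument signature)
    if let Some(&addr) = symbols.get(short_name) {
        return Ok(addr);
    }
    // Pass 2: long name (short name + "__" + mangled argument signature)
    if let Some(&addr) = symbols.get(long_name) {
        return Ok(addr);
    }
    Err(VmError::NativeError(format!("unresolved native method: {short_name}")))
}
```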

docs/object-management.md

@ -0,0 +1,95 @@
# Object Management
**Location**: `crates/core/src/objects/`
## Object Representation
Java objects are represented through a multi-layered abstraction:
```rust
pub struct Object {
pub id: u32, // Unique identifier
pub class: Arc<RuntimeClass>, // Runtime class reference
pub fields: DashMap<String, Value>, // Concurrent field storage
}
pub type ObjectReference = Arc<Mutex<Object>>;
```
- **Objects**: Contain a unique ID (u32), runtime class reference, and field storage (DashMap)
- **ObjectReference**: `Arc<Mutex<Object>>` - reference-counted, thread-safe smart pointer
- **Value wrapper**: The `Value` enum encapsulates both primitives and references for operand stack/local variable storage
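
Because `ObjectReference` is an `Arc`, copying a Java reference means cloning the `Arc`, and every copy sees the same fields. The sketch below is simplified in three ways. It uses a class name instead of `Arc<RuntimeClass>`, a `HashMap` instead of `DashMap`, and a hypothetical `new_object` helper.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

#[derive(Debug, Clone, PartialEq)]
enum Value {
    Int(i32),
    Reference(Option<u32>),
}

struct Object {
    id: u32,
    class_name: String,
    fields: HashMap<String, Value>,
}

type ObjectReference = Arc<Mutex<Object>>;

fn new_object(id: u32, class_name: &str) -> ObjectReference {
    Arc::new(Mutex::new(Object {
        id,
        class_name: class_name.to_string(),
        fields: HashMap::new(),
    }))
}
```

Writing a field through one copy is visible through the others, because both point at the same `Mutex<Object>`.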
## Array Management
Arrays are type-safe with separate variants for primitives and objects:
```rust
pub enum ArrayReference {
    Int(Arc<Mutex<Array<jint>>>),
    Byte(Arc<Mutex<Array<jbyte>>>),
    Short(Arc<Mutex<Array<jshort>>>),
    Long(Arc<Mutex<Array<jlong>>>),
    Float(Arc<Mutex<Array<jfloat>>>),
    Double(Arc<Mutex<Array<jdouble>>>),
    Char(Arc<Mutex<Array<jchar>>>),
    Boolean(Arc<Mutex<Array<jboolean>>>),
    Object(Arc<Mutex<Array<Option<ReferenceKind>>>>),
}
```
- **Primitive arrays**: Int, Byte, Short, Long, Float, Double, Char, Boolean
- **Object arrays**: Hold `Option<ReferenceKind>` elements (`None` is null), so an element can refer to an object or another array
- **Array structure**: Each `Array<T>` holds an id, a class reference, and its elements in a boxed slice `Box<[T]>`
- **Thread-safe**: All arrays use `Arc<Mutex<Array<T>>>` for concurrent access
## Allocation Strategy
Allocation is centralized in **ObjectManager**:
- **Object allocation**: `new_object()` generates unique IDs via atomic counter and stores references in HashMap
- **Array allocation**: Separate methods for primitive arrays (`new_primitive_array()`) and object arrays (`new_object_array()`)
- **String interning**: `new_string()` creates UTF-16 encoded strings with automatic interning via string pool
- **Memory tracking**: `bytes_in_use()` calculates total heap usage across all objects/arrays
All allocated objects are registered in `objects: HashMap<u32, ReferenceKind>` for global access.
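
Here is a minimal sketch of this allocation scheme. A hypothetical `HeapEntry` takes the place of `ReferenceKind`. Starting IDs at 1 is an assumption, and interned strings are keyed by their Rust `String` contents for simplicity.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicU32, Ordering};

#[derive(Debug, Clone, PartialEq)]
enum HeapEntry {
    Object { class: String },
    Str(Vec<u16>), // UTF-16 code units
}

struct ObjectManager {
    next_id: AtomicU32,
    objects: HashMap<u32, HeapEntry>,
    string_pool: HashMap<String, u32>,
}

impl ObjectManager {
    fn new() -> Self {
        ObjectManager { next_id: AtomicU32::new(1), objects: HashMap::new(), string_pool: HashMap::new() }
    }

    /// Unique IDs from an atomic counter.
    fn alloc_id(&self) -> u32 {
        self.next_id.fetch_add(1, Ordering::Relaxed)
    }

    fn new_object(&mut self, class: &str) -> u32 {
        let id = self.alloc_id();
        self.objects.insert(id, HeapEntry::Object { class: class.to_string() });
        id
    }

    /// Returns the pooled string with the same contents, or allocates and pools a new one.
    fn new_string(&mut self, s: &str) -> u32 {
        if let Some(&id) = self.string_pool.get(s) {
            return id;
        }
        let id = self.alloc_id();
        self.objects.insert(id, HeapEntry::Str(s.encode_utf16().collect()));
        self.string_pool.insert(s.to_string(), id);
        id
    }
}
```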
## Reference Handling
Two-level reference system:
1. **ReferenceKind enum**: Distinguishes between `ObjectReference` and `ArrayReference`
2. **Reference type alias**: `Option<ReferenceKind>` (None = null)
3. **Conversion methods**: Safe conversions with `try_into_object_reference()` and `try_into_array_reference()`
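
The conversions can be sketched as fallible matches on the enum. In this sketch the `Object` and `Array` structs are reduced to an ID. The `String` error type is an assumption, and the real methods may return a VM error type instead.

```rust
use std::sync::{Arc, Mutex};

// Simplified stand-ins: the real types carry a class and data.
struct Object { id: u32 }
struct Array { id: u32 }

type ObjectReference = Arc<Mutex<Object>>;
type ArrayReference = Arc<Mutex<Array>>;

#[derive(Clone)]
enum ReferenceKind {
    Object(ObjectReference),
    Array(ArrayReference),
}

/// A Java reference: `None` is null.
type Reference = Option<ReferenceKind>;

impl ReferenceKind {
    fn try_into_object_reference(self) -> Result<ObjectReference, String> {
        match self {
            ReferenceKind::Object(o) => Ok(o),
            ReferenceKind::Array(_) => Err("expected an object, found an array".to_string()),
        }
    }

    fn try_into_array_reference(self) -> Result<ArrayReference, String> {
        match self {
            ReferenceKind::Array(a) => Ok(a),
            ReferenceKind::Object(_) => Err("expected an array, found an object".to_string()),
        }
    }
}
```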
## Memory Management
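**No explicit garbage collection** - relies on Rust's reference counting:
- Arc ensures objects live as long as references exist
- Mutex provides thread-safe field/element access
- `clone()` is shallow: it creates a new object whose reference fields point to the same objects as the original
- Array copy operations handle both primitive and object types with bounds checking
- Reference counting does not reclaim cycles, and an object stays alive while its entry in the `objects` registry holds an `Arc` to it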
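
Here is a sketch of bounds-checked copying that follows the `System.arraycopy` contract. A negative position or length, or a range that extends past either array, fails without copying anything. `array_copy` and `array_copy_within` are hypothetical helpers. The second covers the case where source and destination are the same array. Java requires overlapping ranges in that case to behave as if they were copied through a temporary buffer.

```rust
/// Validates positions and length per `System.arraycopy` and returns them as usizes.
fn check_bounds(src_len: usize, src_pos: i32, dst_len: usize, dst_pos: i32, len: i32)
    -> Result<(usize, usize, usize), String>
{
    // i64 arithmetic so `pos + len` cannot overflow
    let fits = |pos: i32, arr_len: usize| pos as i64 + len as i64 <= arr_len as i64;
    if src_pos < 0 || dst_pos < 0 || len < 0 || !fits(src_pos, src_len) || !fits(dst_pos, dst_len) {
        return Err("IndexOutOfBoundsException".to_string());
    }
    Ok((src_pos as usize, dst_pos as usize, len as usize))
}

/// Copies between two distinct arrays; nothing is written if the check fails.
fn array_copy<T: Clone>(src: &[T], src_pos: i32, dst: &mut [T], dst_pos: i32, len: i32) -> Result<(), String> {
    let (s, d, n) = check_bounds(src.len(), src_pos, dst.len(), dst_pos, len)?;
    dst[d..d + n].clone_from_slice(&src[s..s + n]);
    Ok(())
}

/// Copies within one array; `copy_within` handles overlap like memmove.
fn array_copy_within<T: Copy>(arr: &mut [T], src_pos: i32, dst_pos: i32, len: i32) -> Result<(), String> {
    let (s, d, n) = check_bounds(arr.len(), src_pos, arr.len(), dst_pos, len)?;
    arr.copy_within(s..s + n, d);
    Ok(())
}
```

`copy_within` needs `T: Copy`. For object arrays (`Option<ReferenceKind>`), the same-array case would need to clone the range into a temporary buffer first.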
## Object Synchronization
Monitor-based concurrency for synchronized operations:
```rust
pub struct Monitor {
    owner: Option<ThreadId>,
    entry_count: u32,
    condition: Condvar,
    mutex: Mutex<()>,
}
```
- **Operations**: `monitor_enter()`, `monitor_exit()`, `wait()`, `notify_one()`, `notify_all()`
- **Wait semantics**: Full support for Java-style wait/notify with timeout
- **Reentrant**: Same thread can enter multiple times (tracked by entry_count)
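
Reentrant enter/exit can be sketched by keeping the owner and count inside the mutex, so they are only read or changed under the lock. This is a simplified sketch, not RoastVM's `Monitor`. It omits `wait`/`notify`, and `exit` returns an error where Java would throw `IllegalMonitorStateException`.

```rust
use std::sync::{Condvar, Mutex};
use std::thread::{self, ThreadId};

struct MonitorState {
    owner: Option<ThreadId>,
    entry_count: u32,
}

pub struct Monitor {
    state: Mutex<MonitorState>,
    condition: Condvar,
}

impl Monitor {
    pub fn new() -> Self {
        Monitor { state: Mutex::new(MonitorState { owner: None, entry_count: 0 }), condition: Condvar::new() }
    }

    /// monitorenter: blocks until the monitor is free or already owned by this thread.
    pub fn enter(&self) {
        let me = thread::current().id();
        let mut st = self.state.lock().unwrap();
        while matches!(st.owner, Some(t) if t != me) {
            st = self.condition.wait(st).unwrap();
        }
        st.owner = Some(me);
        st.entry_count += 1;
    }

    /// monitorexit: releases one level; the last level frees the monitor.
    pub fn exit(&self) -> Result<(), &'static str> {
        let me = thread::current().id();
        let mut st = self.state.lock().unwrap();
        if st.owner != Some(me) {
            return Err("IllegalMonitorStateException");
        }
        st.entry_count -= 1;
        if st.entry_count == 0 {
            st.owner = None;
            self.condition.notify_one(); // wake one thread blocked in enter()
        }
        Ok(())
    }

    pub fn entry_count(&self) -> u32 {
        self.state.lock().unwrap().entry_count
    }
}
```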
## Special Features
- **String handling**: UTF-16 LE encoding with automatic String object creation
- **Reflection support**: Methods to create Constructor, Method, MethodHandle objects
- **Class mirrors**: Every class has an associated mirror object (an instance of `java/lang/Class`)
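
Rust's standard library can convert to and from the UTF-16 LE representation directly. The helper names below are hypothetical. `from_utf16_le` ignores a trailing odd byte.

```rust
/// Encodes a string as UTF-16 little-endian bytes (surrogate pairs for non-BMP chars).
fn to_utf16_le(s: &str) -> Vec<u8> {
    s.encode_utf16().flat_map(|unit| unit.to_le_bytes()).collect()
}

/// Decodes UTF-16 LE bytes, rejecting unpaired surrogates.
fn from_utf16_le(bytes: &[u8]) -> Result<String, std::string::FromUtf16Error> {
    let units: Vec<u16> = bytes
        .chunks_exact(2)
        .map(|b| u16::from_le_bytes([b[0], b[1]]))
        .collect();
    String::from_utf16(&units)
}
```

For example, `to_utf16_le("\u{1F600}")` yields the surrogate pair `D83D DE00` as bytes `[0x3D, 0xD8, 0x00, 0xDE]`.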

docs/roast-vm-sys.md

@ -0,0 +1,87 @@
# roast-vm-sys Crate
**Location**: `crates/roast-vm-sys/`
A cdylib crate that exports native method implementations callable from Java via JNI.
## Overview
**roast-vm-sys** is a JNI wrapper crate that exposes the roast-vm-core runtime to Java. It's compiled as a C dynamic library (cdylib) named `roast_vm`.
## Exported Native Methods
The crate exports 40+ JNI native functions via `#[no_mangle] extern "system"` declarations.
### By Module
| Module | Functions |
|--------|-----------|
| `thread.rs` | `Thread.currentThread()`, `Thread.start0()`, `Thread.setPriority0()` |
| `object.rs` | `Object.hashCode()`, `Object.clone()`, `Object.notify()`, `Object.notifyAll()`, `Object.wait()` |
| `class.rs` | `Class.forName0()`, `Class.getPrimitiveClass()`, `Class.getDeclaredConstructors0()` |
| `reflection.rs` | `Reflection.getCallerClass()` |
| `reflect/array.rs` | `Array.newArray()` |
| `string.rs` | `String.intern()` |
| `system.rs` | `System.arraycopy()`, `System.nanoTime()` |
| `runtime.rs` | `Runtime.maxMemory()`, `Runtime.availableProcessors()` |
| `misc_unsafe.rs` | `Unsafe` field offsets, volatile read/write, memory allocation |
| `file_output_stream.rs` | `FileOutputStream.writeBytes()` |
| `system_props.rs` | `vmProperties()` - VM identity (version 0.1.0, vendor "infernap12") |
| `CDS.rs` | Class Data Sharing stubs |
| `signal.rs` | `Signal.handle0()` stub |
| `scoped_memory_access.rs` | `ScopedMemoryAccess` registration |
## Bridge Pattern
Each native function follows this pattern:
1. **Extract VmThread** from `JNIEnv.reserved0` using the `get_thread()` helper
2. **Resolve References** - Convert JNI handles (jobject) to internal references:
   - `resolve_object()` - gets an ObjectReference
   - `resolve_array()` - gets an ArrayReference
   - `resolve_reference()` - gets a generic ReferenceKind
3. **Perform Operation** via core VM APIs
4. **Return Result** in JNI-compatible format
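
Step 1 relies on the VM storing its thread pointer in the `reserved0` slot of the per-thread function table. The sketch below shows what a `get_thread()`-style helper could look like. It assumes `reserved0` holds a raw `*const VmThread`; RoastVM may store a different pointer type there. `VmThread` here is a hypothetical stand-in.

```rust
use std::ffi::c_void;

/// Hypothetical per-thread VM state.
struct VmThread {
    name: String,
}

/// Truncated JNI function table: only the reserved slots matter here.
#[repr(C)]
struct JNINativeInterface {
    reserved0: *mut c_void, // VM-private: assumed to point at the VmThread
    reserved1: *mut c_void,
    reserved2: *mut c_void,
    reserved3: *mut c_void,
}

type JNIEnv = *const JNINativeInterface;

/// Recovers the VmThread from a JNIEnv*.
/// SAFETY: `env` must be valid and its table's reserved0 must point at a live VmThread.
unsafe fn get_thread<'a>(env: *mut JNIEnv) -> &'a VmThread {
    unsafe { &*((**env).reserved0 as *const VmThread) }
}
```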
## Example Native Implementation
```rust
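use jni::JNIEnv;
use jni::objects::{JClass, JString};
use std::io::Write; // brings `write_all` into scope

#[unsafe(no_mangle)]
pub extern "system" fn Java_org_example_MockIO_print(
    env: JNIEnv,
    _jclass: JClass,
    input: JString,
) {
    // SAFETY: `input` must be a valid, non-null java.lang.String reference
    let input: String = unsafe { env.get_string_unchecked(&input) }
        .expect("Couldn't get java string!")
        .into();
    std::io::stdout().write_all(input.as_bytes()).ok();
}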
```
## File Structure
17 modules organized by Java class:
- `lib.rs` - Core helpers, test functions (`MockIO.print()`, `Main.getTime()`)
- `runtime.rs`, `thread.rs`, `class.rs` - Core VM operations
- `object.rs`, `string.rs`, `reflection.rs`, `reflect/` - Object/class introspection
- `system.rs`, `file_output_stream.rs` - System I/O
- `misc_unsafe.rs` - Unsafe memory operations (largest implementation, ~626 lines)
- `CDS.rs`, `signal.rs`, `system_props.rs`, `scoped_memory_access.rs` - Stubs/properties
## GC Interaction
Native methods access the heap through the thread's `gc` handle:
- `thread.gc.read()` / `thread.gc.write()` for object access
- Object creation, cloning, and array operations go through GC
- Field access via `thread.gc` or direct field references
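
The sketch below shows the read/write split, assuming `gc` is an `Arc<RwLock<...>>` around the heap. `Heap`, `class_name_of`, and `allocate` are hypothetical. With `std::sync::RwLock`, the guards come back wrapped in a `LockResult`; a `parking_lot` lock would return them directly.

```rust
use std::collections::HashMap;
use std::sync::{Arc, RwLock};

/// Hypothetical heap standing in for whatever `thread.gc` wraps.
#[derive(Default)]
struct Heap {
    objects: HashMap<u32, String>, // object ID -> class name
}

struct VmThread {
    gc: Arc<RwLock<Heap>>,
}

/// Read path: many native calls can inspect the heap concurrently.
fn class_name_of(thread: &VmThread, id: u32) -> Option<String> {
    thread.gc.read().unwrap().objects.get(&id).cloned()
}

/// Write path: allocation needs exclusive access.
fn allocate(thread: &VmThread, id: u32, class: &str) {
    thread.gc.write().unwrap().objects.insert(id, class.to_string());
}
```

Avoid calling `write()` while the same thread still holds a `read()` guard: `std::sync::RwLock` may deadlock or panic in that case.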
## Error Handling
Mixed approach:
- Some methods panic on errors
- Some return null/default values
- TODO comments mark places where a Java exception should be thrown but is not yet