# Class File Parsing
**Location**: `crates/core/src/class_file/`

The class file parser uses the **deku** library for declarative binary deserialization with automatic validation.
## Components
| File | Purpose |
|------|---------|
| `class_file.rs` | Main ClassFile struct with version, constant pool, fields, methods, attributes |
| `constant_pool.rs` | ConstantPoolGet/ConstantPoolExt traits for pool access and resolution |
| `attributes.rs` | Attribute parsing (Code, LineNumberTable, LocalVariableTable, BootstrapMethods) |
| `mod.rs` | Access flag definitions (ClassFlags, MethodFlags, FieldFlags) |
## Key Types
```rust
pub struct ClassFile {
    pub minor_version: u16,
    pub major_version: u16,
    pub constant_pool: Arc<ConstantPoolOwned>,
    pub access_flags: u16,
    pub this_class: u16,
    pub super_class: u16,
    pub interfaces: Vec<u16>,
    pub fields: Vec<FieldInfo>,
    pub methods: Vec<MethodInfo>,
    pub attributes: Vec<AttributeInfo>,
}

pub struct FieldInfo {
    pub access_flags: u16,
    pub name_index: u16,
    pub descriptor_index: u16,
    pub attributes: Vec<AttributeInfo>,
}

pub struct MethodInfo {
    pub access_flags: u16,
    pub name_index: u16,
    pub descriptor_index: u16,
    pub attributes: Vec<AttributeInfo>,
}
```
## Constant Pool
Constant pool access is split across two trait layers:
### ConstantPoolGet Trait
Low-level accessors:
- `get_constant()`: Look up an entry by index, accounting for `Long` and `Double` entries, which each occupy two pool slots
- Type-specific getters: `get_i32()`, `get_utf8_info()`, `get_class_info()`, `get_method_ref()`, etc.
- Implemented via `pool_get_impl!` macro
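
The index handling described above can be sketched as follows. This is a minimal illustration, not the crate's implementation: `Entry`, `Pool`, `PoolError`, and the `Unusable` placeholder are hypothetical stand-ins, and the real `ConstantPoolGet` trait may track two-slot entries differently.

```rust
/// Simplified stand-in for the real entry type (hypothetical, illustration only).
#[derive(Debug, Clone, PartialEq)]
enum Entry {
    Utf8(String),
    Integer(i32),
    Long(i64),
    /// Placeholder for the unusable slot that follows an 8-byte constant.
    Unusable,
}

#[derive(Debug, PartialEq)]
enum PoolError {
    OutOfBounds(u16),
    UnusableSlot(u16),
    WrongType { index: u16, expected: &'static str },
}

struct Pool {
    /// entries[0] corresponds to constant pool index 1.
    entries: Vec<Entry>,
}

impl Pool {
    /// Inserts a placeholder after every 8-byte constant so that Vec
    /// positions line up with class-file indices.
    fn from_constants(constants: Vec<Entry>) -> Self {
        let mut entries = Vec::new();
        for constant in constants {
            let wide = matches!(constant, Entry::Long(_));
            entries.push(constant);
            if wide {
                entries.push(Entry::Unusable);
            }
        }
        Pool { entries }
    }

    /// Bounds-checked lookup; index 0 is reserved, valid indices start at 1.
    fn get_constant(&self, index: u16) -> Result<&Entry, PoolError> {
        let slot = index.checked_sub(1).ok_or(PoolError::OutOfBounds(index))?;
        match self.entries.get(slot as usize) {
            None => Err(PoolError::OutOfBounds(index)),
            Some(Entry::Unusable) => Err(PoolError::UnusableSlot(index)),
            Some(entry) => Ok(entry),
        }
    }

    /// Type-specific getter: fails if the entry is not an Integer.
    fn get_i32(&self, index: u16) -> Result<i32, PoolError> {
        match self.get_constant(index)? {
            Entry::Integer(value) => Ok(*value),
            _ => Err(PoolError::WrongType { index, expected: "Integer" }),
        }
    }
}
```

With a pool built from a `Long`, an `Integer`, and a `Utf8` constant, the `Integer` lands at index 3 rather than 2, because the `Long` consumes indices 1 and 2.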
### ConstantPoolExt Trait
High-level operations:
- `get_string()`: Fetch `Utf8` entries as strings, decoding Java's CESU-8-based modified UTF-8
- `resolve_class_name()`: Trace class references through constant pool
- `resolve_method_ref()` / `resolve_interface_method_ref()`: Resolve method references
- `resolve_field()`: Resolve field references with type descriptors
- `parse_attribute()`: Convert raw attribute bytes to typed Attribute enum
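
To make the "tracing" in these resolvers concrete, here is a rough sketch of how a method reference can be followed through the pool. It assumes a simplified `Entry` enum, free functions instead of trait methods, and `String` errors; `ResolvedMethod` is a hypothetical result type, not one taken from the crate.

```rust
#[derive(Debug)]
enum Entry {
    Utf8(String),
    Class { name_index: u16 },
    NameAndType { name_index: u16, descriptor_index: u16 },
    MethodRef { class_index: u16, name_and_type_index: u16 },
}

/// Hypothetical result type holding the fully resolved strings.
#[derive(Debug, PartialEq)]
struct ResolvedMethod {
    class: String,
    name: String,
    descriptor: String,
}

/// 1-based, bounds-checked lookup.
fn entry(pool: &[Entry], index: u16) -> Result<&Entry, String> {
    (index as usize)
        .checked_sub(1)
        .and_then(|slot| pool.get(slot))
        .ok_or_else(|| format!("index {index} is out of bounds"))
}

fn get_string(pool: &[Entry], index: u16) -> Result<&str, String> {
    match entry(pool, index)? {
        Entry::Utf8(s) => Ok(s.as_str()),
        other => Err(format!("expected Utf8 at {index}, found {other:?}")),
    }
}

/// Class -> name_index -> Utf8
fn resolve_class_name(pool: &[Entry], index: u16) -> Result<&str, String> {
    match entry(pool, index)? {
        Entry::Class { name_index } => get_string(pool, *name_index),
        other => Err(format!("expected Class at {index}, found {other:?}")),
    }
}

/// MethodRef -> (Class, NameAndType) -> three Utf8 strings
fn resolve_method_ref(pool: &[Entry], index: u16) -> Result<ResolvedMethod, String> {
    let (class_index, nat_index) = match entry(pool, index)? {
        Entry::MethodRef { class_index, name_and_type_index } => {
            (*class_index, *name_and_type_index)
        }
        other => return Err(format!("expected MethodRef at {index}, found {other:?}")),
    };
    let (name_index, descriptor_index) = match entry(pool, nat_index)? {
        Entry::NameAndType { name_index, descriptor_index } => (*name_index, *descriptor_index),
        other => return Err(format!("expected NameAndType at {nat_index}, found {other:?}")),
    };
    Ok(ResolvedMethod {
        class: resolve_class_name(pool, class_index)?.to_string(),
        name: get_string(pool, name_index)?.to_string(),
        descriptor: get_string(pool, descriptor_index)?.to_string(),
    })
}
```

Each hop checks the entry type before following the next index, so a malformed pool surfaces as an error rather than a wrong string.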
### Constant Pool Entry Types (17 types)
```rust
pub enum ConstantPoolEntry {
    Utf8(ConstantUtf8Info),
    Integer(i32),
    Float(f32),
    Long(i64),
    Double(f64),
    Class(ConstantClassInfo),
    String(ConstantStringInfo),
    FieldRef(ConstantFieldrefInfo),
    MethodRef(ConstantMethodrefInfo),
    InterfaceMethodRef(ConstantInterfaceMethodrefInfo),
    NameAndType(ConstantNameAndTypeInfo),
    MethodHandle(ConstantMethodHandleInfo),
    MethodType(ConstantMethodTypeInfo),
    Dynamic(ConstantDynamicInfo),
    InvokeDynamic(ConstantInvokeDynamicInfo),
    Module(ConstantModuleInfo),
    Package(ConstantPackageInfo),
}
```
## Attributes
Recursive attribute parsing with support for:
- **Code**: Method bytecode with max_stack, max_locals, exception tables, nested attributes
- **LineNumberTable**: Maps bytecode offsets to source line numbers
- **LocalVariableTable**: Local variable debugging info (name, descriptor, PC range)
- **BootstrapMethods**: Dynamic invocation bootstrap method references
- **StackMapTable**, **Exceptions**, **InnerClasses**: Parsed as raw byte vectors
- **SourceFile**, **Signature**: Index-based attribute data
- **Unknown**: Fallback for unrecognized attributes
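
The dispatch from attribute name to typed variant might look roughly like the sketch below. The `Attribute` enum here is a cut-down, hypothetical version with only three variants, and `parse_attribute` is written as a free function that receives the already-resolved name; the crate's own signature and error type are likely different.

```rust
#[derive(Debug, PartialEq)]
enum Attribute {
    SourceFile { sourcefile_index: u16 },
    /// Pairs of (start_pc, line_number).
    LineNumberTable(Vec<(u16, u16)>),
    Unknown { name: String, info: Vec<u8> },
}

/// Reads a big-endian u16 at `pos`, or None if the slice is too short.
fn read_u16(bytes: &[u8], pos: usize) -> Option<u16> {
    let b = bytes.get(pos..pos + 2)?;
    Some(u16::from_be_bytes([b[0], b[1]]))
}

/// Converts raw attribute bytes into a typed value based on the attribute name.
/// Returns None when a known attribute is malformed.
fn parse_attribute(name: &str, info: &[u8]) -> Option<Attribute> {
    match name {
        "SourceFile" => {
            if info.len() != 2 {
                return None;
            }
            Some(Attribute::SourceFile { sourcefile_index: read_u16(info, 0)? })
        }
        "LineNumberTable" => {
            let count = read_u16(info, 0)? as usize;
            let mut entries = Vec::with_capacity(count);
            for i in 0..count {
                let base = 2 + i * 4;
                entries.push((read_u16(info, base)?, read_u16(info, base + 2)?));
            }
            Some(Attribute::LineNumberTable(entries))
        }
        // Unrecognized names are preserved verbatim instead of failing the parse.
        _ => Some(Attribute::Unknown { name: name.to_string(), info: info.to_vec() }),
    }
}
```

Keeping an `Unknown` fallback means class files with vendor-specific or newer attributes still load.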
### Code Attribute Structure
```rust
pub struct CodeAttribute {
    pub max_stack: u16,
    pub max_locals: u16,
    pub code_length: u32,
    pub code: Vec<u8>,
    pub exception_table: Vec<ExceptionTableEntry>,
    pub attributes: Vec<AttributeInfo>, // Recursive
}
```
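For readers who want to see the byte layout behind this struct, the following hand-written sketch reads the fixed part of a `Code` attribute body. The crate derives this with deku, so the `Reader` helper, the `CodeHeader` type, and the fields of `ExceptionTableEntry` shown here are illustrative assumptions based on the class file format, and nested attributes are left unparsed.

```rust
#[derive(Debug, PartialEq)]
struct ExceptionTableEntry {
    start_pc: u16,
    end_pc: u16,
    handler_pc: u16,
    catch_type: u16,
}

/// Hypothetical cut-down result: everything up to the nested attributes.
#[derive(Debug, PartialEq)]
struct CodeHeader {
    max_stack: u16,
    max_locals: u16,
    code: Vec<u8>,
    exception_table: Vec<ExceptionTableEntry>,
}

/// Minimal big-endian cursor over a byte slice.
struct Reader<'a> {
    bytes: &'a [u8],
    pos: usize,
}

impl<'a> Reader<'a> {
    fn take(&mut self, n: usize) -> Option<&'a [u8]> {
        let bytes: &'a [u8] = self.bytes;
        let end = self.pos.checked_add(n)?;
        let slice = bytes.get(self.pos..end)?;
        self.pos = end;
        Some(slice)
    }

    fn read_u16(&mut self) -> Option<u16> {
        let b = self.take(2)?;
        Some(u16::from_be_bytes([b[0], b[1]]))
    }

    fn read_u32(&mut self) -> Option<u32> {
        let b = self.take(4)?;
        Some(u32::from_be_bytes([b[0], b[1], b[2], b[3]]))
    }
}

/// Returns None if the input is truncated.
fn parse_code(info: &[u8]) -> Option<CodeHeader> {
    let mut r = Reader { bytes: info, pos: 0 };
    let max_stack = r.read_u16()?;
    let max_locals = r.read_u16()?;
    let code_length = r.read_u32()? as usize;
    let code = r.take(code_length)?.to_vec();
    let table_len = r.read_u16()?;
    let mut exception_table = Vec::new();
    for _ in 0..table_len {
        exception_table.push(ExceptionTableEntry {
            start_pc: r.read_u16()?,
            end_pc: r.read_u16()?,
            handler_pc: r.read_u16()?,
            catch_type: r.read_u16()?,
        });
    }
    // attributes_count and the nested attribute_info entries would follow here.
    Some(CodeHeader { max_stack, max_locals, code, exception_table })
}
```

The nested attributes are where `LineNumberTable` and `LocalVariableTable` usually appear, which is why the real struct is recursive.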
## Access Flags
Bitfield structures for parsing JVM access flags:
- **ClassFlags**: PUBLIC, FINAL, INTERFACE, ABSTRACT, SYNTHETIC, ANNOTATION, ENUM, MODULE
- **FieldFlags**: PUBLIC, PRIVATE, PROTECTED, STATIC, FINAL, VOLATILE, TRANSIENT, SYNTHETIC, ENUM
- **MethodFlags**: PUBLIC, PRIVATE, PROTECTED, STATIC, FINAL, SYNCHRONIZED, BRIDGE, VARARGS, NATIVE, ABSTRACT, STRICT, SYNTHETIC
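
As an illustration of how such a bitfield is queried, here is a small sketch for method flags. The bit values are the ones defined by the JVM specification; the `MethodFlags` newtype and its `contains` and `names` helpers are hypothetical and may not match the crate's actual flag types.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct MethodFlags(u16);

impl MethodFlags {
    const PUBLIC: u16 = 0x0001;
    const PRIVATE: u16 = 0x0002;
    const PROTECTED: u16 = 0x0004;
    const STATIC: u16 = 0x0008;
    const FINAL: u16 = 0x0010;
    const SYNCHRONIZED: u16 = 0x0020;
    const BRIDGE: u16 = 0x0040;
    const VARARGS: u16 = 0x0080;
    const NATIVE: u16 = 0x0100;
    const ABSTRACT: u16 = 0x0400;
    const STRICT: u16 = 0x0800;
    const SYNTHETIC: u16 = 0x1000;

    /// True if every bit in `mask` is set.
    fn contains(self, mask: u16) -> bool {
        (self.0 & mask) == mask
    }

    /// Names of all set flags, in ascending bit order.
    fn names(self) -> Vec<&'static str> {
        let table = [
            (Self::PUBLIC, "PUBLIC"),
            (Self::PRIVATE, "PRIVATE"),
            (Self::PROTECTED, "PROTECTED"),
            (Self::STATIC, "STATIC"),
            (Self::FINAL, "FINAL"),
            (Self::SYNCHRONIZED, "SYNCHRONIZED"),
            (Self::BRIDGE, "BRIDGE"),
            (Self::VARARGS, "VARARGS"),
            (Self::NATIVE, "NATIVE"),
            (Self::ABSTRACT, "ABSTRACT"),
            (Self::STRICT, "STRICT"),
            (Self::SYNTHETIC, "SYNTHETIC"),
        ];
        let mut out = Vec::new();
        for (bit, name) in table {
            if self.contains(bit) {
                out.push(name);
            }
        }
        out
    }
}
```

The same bit can mean different things depending on context (for example, `0x0040` is `BRIDGE` on a method but `VOLATILE` on a field), which is one reason to keep separate flag types per structure.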
## Validation
Validation occurs at multiple levels:
1. **Binary Format** (automatic via deku):
   - Magic number (`0xCAFEBABE`)
   - Big-endian byte order
   - Type-safe parsing with error propagation
2. **Constant Pool**:
   - Index bounds checking in `get_constant()`
   - Type validation: each accessor checks that the entry type matches the expected type
   - CESU-8 decoding errors are caught when reading Java-style UTF-8 strings
3. **Class Structure** (ClassLoader):
   - Debug assertion that the `Object` class has `super_class = 0`
   - Non-`Object` classes must have a valid super class reference
   - Interfaces must declare `Object` as their super class
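
Two of the checks listed above, the magic number and the super class rule, can be sketched by hand as follows. In the crate the first is handled by deku and the second lives in the class loader, so `parse_header`, `check_super_class`, and `HeaderError` are hypothetical names used only for illustration.

```rust
const MAGIC: u32 = 0xCAFE_BABE;

#[derive(Debug, PartialEq)]
enum HeaderError {
    Truncated,
    BadMagic(u32),
}

/// Validates the magic number and returns (minor_version, major_version).
/// All multi-byte values in a class file are big-endian.
fn parse_header(bytes: &[u8]) -> Result<(u16, u16), HeaderError> {
    let header = bytes.get(..8).ok_or(HeaderError::Truncated)?;
    let magic = u32::from_be_bytes([header[0], header[1], header[2], header[3]]);
    if magic != MAGIC {
        return Err(HeaderError::BadMagic(magic));
    }
    let minor = u16::from_be_bytes([header[4], header[5]]);
    let major = u16::from_be_bytes([header[6], header[7]]);
    Ok((minor, major))
}

/// Only java/lang/Object may have super_class == 0; every other class or
/// interface needs a non-zero constant pool index there.
fn check_super_class(class_name: &str, super_class: u16) -> Result<(), String> {
    match (class_name == "java/lang/Object", super_class) {
        (true, 0) => Ok(()),
        (true, index) => Err(format!("Object must not have a super class (found index {index})")),
        (false, 0) => Err(format!("{class_name} is missing a super class")),
        (false, _) => Ok(()),
    }
}
```

A full check would also resolve the non-zero index and confirm it names a class, which requires the constant pool.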
## Descriptor Parsing
```rust
// Method descriptor: (II)I -> takes two ints, returns int
let add = MethodDescriptor::parse("(II)I")?;
// Field descriptor: Ljava/lang/String; -> the String class type
let string_ty = FieldType::parse("Ljava/lang/String;")?;
// Array descriptor: [[I -> 2D int array
let matrix_ty = FieldType::parse("[[I")?;
```
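The descriptor grammar itself is small enough to parse recursively. The sketch below is one possible approach, not the crate's parser: this `FieldType` enum, the free functions `parse_field` and `parse_method`, and the `String` error type are all assumptions made for the example.

```rust
#[derive(Debug, Clone, PartialEq)]
enum FieldType {
    Byte,
    Char,
    Double,
    Float,
    Int,
    Long,
    Short,
    Boolean,
    Object(String),
    Array(Box<FieldType>),
}

/// Parses one field type from the front of `input` and returns the remainder.
fn parse_field_type(input: &str) -> Result<(FieldType, &str), String> {
    let mut chars = input.chars();
    let first = chars.next().ok_or_else(|| "empty descriptor".to_string())?;
    let rest = chars.as_str();
    let ty = match first {
        'B' => FieldType::Byte,
        'C' => FieldType::Char,
        'D' => FieldType::Double,
        'F' => FieldType::Float,
        'I' => FieldType::Int,
        'J' => FieldType::Long,
        'S' => FieldType::Short,
        'Z' => FieldType::Boolean,
        'L' => {
            // Class names run up to the terminating ';'.
            let end = rest
                .find(';')
                .ok_or_else(|| format!("unterminated class name in {input:?}"))?;
            return Ok((FieldType::Object(rest[..end].to_string()), &rest[end + 1..]));
        }
        '[' => {
            // Each '[' wraps the element type in one more array dimension.
            let (element, tail) = parse_field_type(rest)?;
            return Ok((FieldType::Array(Box::new(element)), tail));
        }
        other => return Err(format!("unexpected character {other:?}")),
    };
    Ok((ty, rest))
}

/// Parses a complete field descriptor, rejecting trailing input.
fn parse_field(input: &str) -> Result<FieldType, String> {
    let (ty, tail) = parse_field_type(input)?;
    if tail.is_empty() {
        Ok(ty)
    } else {
        Err(format!("trailing input {tail:?}"))
    }
}

/// Parses a method descriptor into (parameters, return type); None means void.
fn parse_method(input: &str) -> Result<(Vec<FieldType>, Option<FieldType>), String> {
    let mut rest = input
        .strip_prefix('(')
        .ok_or_else(|| "missing '('".to_string())?;
    let mut params = Vec::new();
    while !rest.starts_with(')') {
        let (ty, tail) = parse_field_type(rest)?;
        params.push(ty);
        rest = tail;
    }
    let ret = &rest[1..];
    if ret == "V" {
        Ok((params, None))
    } else {
        Ok((params, Some(parse_field(ret)?)))
    }
}
```

Returning the unparsed remainder from `parse_field_type` lets the method parser reuse it for each parameter in turn.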
## Error Handling
- `DekuError`: Binary parsing failures
- `ConstantPoolError`: Pool access failures, with `Generic`, `DescriptorParseError`, and `Cesu8DecodingError` variants
- `VmError`: Higher-level VM-specific errors
- `DescParseError`: Invalid method/field descriptor syntax
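
One common way to layer error types like these is with `From` conversions, so that `?` lifts a low-level error into the next layer up. The sketch below assumes that pattern. The type names come from the list above, but the payloads, the `VmError::ConstantPool` variant, and the helper functions `check_descriptor`, `lookup_field`, and `load` are hypothetical, and the `Cesu8DecodingError` variant is omitted for brevity.

```rust
#[derive(Debug, PartialEq)]
struct DescParseError(String);

#[derive(Debug, PartialEq)]
enum ConstantPoolError {
    Generic(String),
    DescriptorParseError(DescParseError),
}

#[derive(Debug, PartialEq)]
enum VmError {
    ConstantPool(ConstantPoolError),
}

impl From<DescParseError> for ConstantPoolError {
    fn from(err: DescParseError) -> Self {
        ConstantPoolError::DescriptorParseError(err)
    }
}

impl From<ConstantPoolError> for VmError {
    fn from(err: ConstantPoolError) -> Self {
        VmError::ConstantPool(err)
    }
}

/// Lowest layer: descriptor syntax.
fn check_descriptor(descriptor: &str) -> Result<(), DescParseError> {
    if descriptor.is_empty() {
        return Err(DescParseError("empty descriptor".to_string()));
    }
    Ok(())
}

/// Middle layer: `?` converts DescParseError into ConstantPoolError.
fn lookup_field(name: &str, descriptor: &str) -> Result<(), ConstantPoolError> {
    if name.is_empty() {
        return Err(ConstantPoolError::Generic("empty field name".to_string()));
    }
    check_descriptor(descriptor)?;
    Ok(())
}

/// Top layer: `?` converts ConstantPoolError into VmError.
fn load(name: &str, descriptor: &str) -> Result<(), VmError> {
    lookup_field(name, descriptor)?;
    Ok(())
}
```

Callers at the VM level then match on a single error type while the original cause stays available inside it.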