# Class File Parsing

**Location**: `crates/core/src/class_file/`

The class file parser uses the **deku** library for declarative binary deserialization with automatic validation.

## Components

| File | Purpose |
|------|---------|
| `class_file.rs` | Main `ClassFile` struct with version, constant pool, fields, methods, attributes |
| `constant_pool.rs` | `ConstantPoolGet`/`ConstantPoolExt` traits for pool access and resolution |
| `attributes.rs` | Attribute parsing (Code, LineNumberTable, LocalVariableTable, BootstrapMethods) |
| `mod.rs` | Access flag definitions (`ClassFlags`, `MethodFlags`, `FieldFlags`) |

## Key Types

```rust
pub struct ClassFile {
    pub minor_version: u16,
    pub major_version: u16,
    pub constant_pool: Arc<ConstantPoolOwned>,
    pub access_flags: u16,      // raw bit mask, see Access Flags
    pub this_class: u16,        // constant pool index of this class
    pub super_class: u16,       // constant pool index; 0 for the Object class
    pub interfaces: Vec<u16>,   // constant pool indices of implemented interfaces
    pub fields: Vec<FieldInfo>,
    pub methods: Vec<MethodInfo>,
    pub attributes: Vec<AttributeInfo>,
}

pub struct FieldInfo {
    pub access_flags: u16,
    pub name_index: u16,        // constant pool index of the field name
    pub descriptor_index: u16,  // constant pool index of the field descriptor
    pub attributes: Vec<AttributeInfo>,
}

pub struct MethodInfo {
    pub access_flags: u16,
    pub name_index: u16,        // constant pool index of the method name
    pub descriptor_index: u16,  // constant pool index of the method descriptor
    pub attributes: Vec<AttributeInfo>,
}
```

## Constant Pool

The constant pool API is trait-based, with two layers:

### ConstantPoolGet Trait

Low-level accessors:

- `get_constant()`: Resolves an entry by index, accounting for 64-bit entries (`Long` and `Double` occupy two slots)
- Type-specific getters: `get_i32()`, `get_utf8_info()`, `get_class_info()`, `get_method_ref()`, etc.
- Implemented via the `pool_get_impl!` macro
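
The index handling for 64-bit entries can be sketched as follows. This is a minimal illustration, not the crate's implementation: the `Entry` enum is a simplified stand-in for `ConstantPoolEntry`, and the compact storage layout (one `Vec` element per entry, with slots counted during lookup) is an assumption.

```rust
/// Simplified, hypothetical stand-in for the crate's `ConstantPoolEntry`.
#[derive(Debug, Clone, PartialEq)]
pub enum Entry {
    Utf8(String),
    Integer(i32),
    Long(i64),
    Double(f64),
}

impl Entry {
    /// Number of constant pool slots the entry occupies (JVMS §4.4.5).
    fn slots(&self) -> usize {
        match self {
            Entry::Long(_) | Entry::Double(_) => 2,
            _ => 1,
        }
    }
}

/// Look up a 1-based constant pool index in a compactly stored entry list.
/// Returns `None` for index 0, out-of-range indices, and the unusable
/// second slot of a `Long` or `Double`.
pub fn get_constant(entries: &[Entry], index: u16) -> Option<&Entry> {
    let target = index as usize;
    let mut slot = 1; // constant pool indices start at 1
    for entry in entries {
        if slot == target {
            return Some(entry);
        }
        if slot > target {
            return None; // index 0, or the second slot of a two-slot entry
        }
        slot += entry.slots();
    }
    None
}
```

With the pool `[Utf8, Long, Integer]`, the valid indices are 1, 2, and 4; index 3 is the unusable second slot of the `Long`.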

### ConstantPoolExt Trait

High-level operations:

- `get_string()`: Fetches UTF-8 strings with CESU-8 decoding
- `resolve_class_name()`: Follows a class reference through the constant pool to its name
- `resolve_method_ref()` / `resolve_interface_method_ref()`: Resolve method references
- `resolve_field()`: Resolves field references with type descriptors
- `parse_attribute()`: Converts raw attribute bytes to the typed `Attribute` enum
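
As an illustration of how a class reference is followed through the pool, here is a minimal sketch of a `resolve_class_name`-style lookup with bounds and type checks. The `Entry` and `PoolError` types and the slot-per-index storage are simplified assumptions made for this example, not the crate's actual definitions.

```rust
/// Simplified, hypothetical pool entries: just enough to resolve a class name.
#[derive(Debug)]
pub enum Entry {
    Utf8(String),
    Class { name_index: u16 },
}

/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq)]
pub enum PoolError {
    OutOfBounds(u16),
    WrongType { index: u16, expected: &'static str },
}

/// Bounds-checked access. Assumes `entries[0]` holds pool index 1.
fn get(entries: &[Entry], index: u16) -> Result<&Entry, PoolError> {
    index
        .checked_sub(1)
        .and_then(|i| entries.get(i as usize))
        .ok_or(PoolError::OutOfBounds(index))
}

/// Follow Class -> Utf8 and return the class name.
pub fn resolve_class_name(entries: &[Entry], index: u16) -> Result<&str, PoolError> {
    // Step 1: the index must point at a Class entry.
    let name_index = match get(entries, index)? {
        Entry::Class { name_index } => *name_index,
        _ => return Err(PoolError::WrongType { index, expected: "Class" }),
    };
    // Step 2: the Class entry must point at a Utf8 entry holding the name.
    match get(entries, name_index)? {
        Entry::Utf8(name) => Ok(name.as_str()),
        _ => Err(PoolError::WrongType { index: name_index, expected: "Utf8" }),
    }
}
```

Each step fails with a distinct error, so a caller can tell a bad index apart from an entry of the wrong kind.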

### Constant Pool Entry Types (17 types)

```rust
pub enum ConstantPoolEntry {
    Utf8(ConstantUtf8Info),
    Integer(i32), Float(f32), Long(i64), Double(f64),
    Class(ConstantClassInfo),
    String(ConstantStringInfo),
    FieldRef(ConstantFieldrefInfo),
    MethodRef(ConstantMethodrefInfo),
    InterfaceMethodRef(ConstantInterfaceMethodrefInfo),
    NameAndType(ConstantNameAndTypeInfo),
    MethodHandle(ConstantMethodHandleInfo),
    MethodType(ConstantMethodTypeInfo),
    Dynamic(ConstantDynamicInfo),
    InvokeDynamic(ConstantInvokeDynamicInfo),
    Module(ConstantModuleInfo),
    Package(ConstantPackageInfo),
}
```

## Attributes

Recursive attribute parsing with support for:

- **Code**: Method bytecode with max_stack, max_locals, exception tables, nested attributes
- **LineNumberTable**: Maps bytecode offsets to source line numbers
- **LocalVariableTable**: Local variable debugging info (name, descriptor, PC range)
- **BootstrapMethods**: Dynamic invocation bootstrap method references
- **StackMapTable**, **Exceptions**, **InnerClasses**: Parsed as raw byte vectors
- **SourceFile**, **Signature**: Index-based attribute data
- **Unknown**: Fallback for unrecognized attributes
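
To make the LineNumberTable mapping concrete, the sketch below looks up the source line for a bytecode offset. `LineNumberEntry` and `line_for_pc` are hypothetical names for this example. The field names follow the JVM specification's `line_number_table` layout, but the crate's own types may differ.

```rust
/// One row of a LineNumberTable attribute (JVMS §4.7.12).
#[derive(Debug, Clone, Copy)]
pub struct LineNumberEntry {
    pub start_pc: u16,    // bytecode offset where the line begins
    pub line_number: u16, // corresponding source line
}

/// Return the source line covering `pc`: the entry with the greatest
/// `start_pc` that is still <= `pc`. Returns `None` when no entry applies.
pub fn line_for_pc(table: &[LineNumberEntry], pc: u16) -> Option<u16> {
    table
        .iter()
        .filter(|entry| entry.start_pc <= pc)
        .max_by_key(|entry| entry.start_pc)
        .map(|entry| entry.line_number)
}
```

The lookup does not assume the table is sorted, since the specification allows entries in any order.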

### Code Attribute Structure

```rust
pub struct CodeAttribute {
    pub max_stack: u16,
    pub max_locals: u16,
    pub code_length: u32,
    pub code: Vec<u8>,
    pub exception_table: Vec<ExceptionTableEntry>,
    pub attributes: Vec<AttributeInfo>, // Recursive
}
```

## Access Flags

Bitfield structures for parsing JVM access flags:

- **ClassFlags**: PUBLIC, FINAL, INTERFACE, ABSTRACT, SYNTHETIC, ANNOTATION, ENUM, MODULE
- **FieldFlags**: PUBLIC, PRIVATE, PROTECTED, STATIC, FINAL, VOLATILE, TRANSIENT, SYNTHETIC, ENUM
- **MethodFlags**: PUBLIC, PRIVATE, PROTECTED, STATIC, FINAL, SYNCHRONIZED, BRIDGE, VARARGS, NATIVE, ABSTRACT, STRICT, SYNTHETIC
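
The following sketch shows how a raw `access_flags` value decodes into the class-level flags listed above. The bit values come from the JVM specification (§4.1). The `CLASS_FLAGS` table and `class_flag_names` function are hypothetical and do not reflect how the crate's `ClassFlags` bitfield is defined.

```rust
/// Class-level access flag bits (JVMS §4.1), limited to the flags listed
/// in this document. ACC_SUPER (0x0020) is deliberately left out.
pub const CLASS_FLAGS: [(u16, &str); 8] = [
    (0x0001, "PUBLIC"),
    (0x0010, "FINAL"),
    (0x0200, "INTERFACE"),
    (0x0400, "ABSTRACT"),
    (0x1000, "SYNTHETIC"),
    (0x2000, "ANNOTATION"),
    (0x4000, "ENUM"),
    (0x8000, "MODULE"),
];

/// Return the names of all known flags set in `access_flags`,
/// in ascending bit order. Unknown bits are ignored.
pub fn class_flag_names(access_flags: u16) -> Vec<&'static str> {
    let mut names = Vec::new();
    for &(bit, name) in CLASS_FLAGS.iter() {
        if (access_flags & bit) != 0 {
            names.push(name);
        }
    }
    names
}
```

For example, `0x0601` decodes to PUBLIC, INTERFACE, and ABSTRACT, the usual combination for a public interface.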

## Validation

Validation occurs at three levels:

1. **Binary Format** (automatic via deku):
   - Magic number (`0xCAFEBABE`)
   - Big-endian byte order
   - Type-safe parsing with error propagation

2. **Constant Pool**:
   - Index bounds checking in `get_constant()`
   - Type validation: each accessor checks that the entry type matches the expected type
   - CESU-8 decoding errors are caught when decoding Java-style UTF-8 strings

3. **Class Structure** (ClassLoader):
   - Debug assertions check that the `Object` class has `super_class = 0`
   - Non-`Object` classes must have a valid super class reference
   - Interfaces must have `Object` as their super class
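
For reference, the binary-format checks that deku performs declaratively look roughly like this when written by hand. This is a standard-library sketch only: `Header`, `HeaderError`, and `parse_header` are hypothetical names, and the crate itself relies on deku rather than code like this.

```rust
/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq)]
pub enum HeaderError {
    Truncated,
    BadMagic(u32),
}

/// The version fields that follow the magic number.
#[derive(Debug, PartialEq)]
pub struct Header {
    pub minor_version: u16,
    pub major_version: u16,
}

/// Check the magic number and read the version fields.
/// All multi-byte values in a class file are big-endian.
pub fn parse_header(bytes: &[u8]) -> Result<Header, HeaderError> {
    if bytes.len() < 8 {
        return Err(HeaderError::Truncated);
    }
    let magic = u32::from_be_bytes([bytes[0], bytes[1], bytes[2], bytes[3]]);
    if magic != 0xCAFE_BABE {
        return Err(HeaderError::BadMagic(magic));
    }
    Ok(Header {
        minor_version: u16::from_be_bytes([bytes[4], bytes[5]]),
        major_version: u16::from_be_bytes([bytes[6], bytes[7]]),
    })
}
```

A valid header for Java 8 is `CA FE BA BE 00 00 00 34`, which yields minor version 0 and major version 52.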

## Descriptor Parsing

```rust
// Method descriptor: (II)I -> two int parameters, returns int
let method = MethodDescriptor::parse("(II)I")?;

// Field descriptor: Ljava/lang/String; -> String class type
let string_type = FieldType::parse("Ljava/lang/String;")?;

// Array descriptor: [[I -> 2D int array
let int_matrix = FieldType::parse("[[I")?;
```
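
The descriptor grammar behind these calls can be sketched with a small recursive parser. The enum below reuses the name `FieldType` for readability, but its variants, the free functions `parse_field_type` and `parse_method_descriptor`, and the `Option` return types are all assumptions for this example. The crate's `FieldType::parse` and `MethodDescriptor::parse` may be shaped differently.

```rust
/// Hypothetical field type representation for this sketch.
#[derive(Debug, Clone, PartialEq)]
pub enum FieldType {
    Byte, Char, Double, Float, Int, Long, Short, Boolean,
    Object(String),        // e.g. "java/lang/String"
    Array(Box<FieldType>), // one dimension per leading '['
}

/// Parse one field type from the front of `s`; return it with the unparsed rest.
fn parse_prefix(s: &str) -> Option<(FieldType, &str)> {
    let mut chars = s.chars();
    let first = chars.next()?;
    let rest = chars.as_str();
    let ty = match first {
        'B' => FieldType::Byte,
        'C' => FieldType::Char,
        'D' => FieldType::Double,
        'F' => FieldType::Float,
        'I' => FieldType::Int,
        'J' => FieldType::Long,
        'S' => FieldType::Short,
        'Z' => FieldType::Boolean,
        'L' => {
            // Class type: L<binary name>;
            let end = rest.find(';')?;
            if end == 0 {
                return None; // empty class name
            }
            return Some((FieldType::Object(rest[..end].to_string()), &rest[end + 1..]));
        }
        '[' => {
            // Array type: '[' followed by the element type
            let (elem, rest) = parse_prefix(rest)?;
            return Some((FieldType::Array(Box::new(elem)), rest));
        }
        _ => return None,
    };
    Some((ty, rest))
}

/// Parse a complete field descriptor; trailing input is rejected.
pub fn parse_field_type(s: &str) -> Option<FieldType> {
    match parse_prefix(s)? {
        (ty, "") => Some(ty),
        _ => None,
    }
}

/// Parse a method descriptor into (parameters, return type); `None` return = void.
pub fn parse_method_descriptor(s: &str) -> Option<(Vec<FieldType>, Option<FieldType>)> {
    let mut rest = s.strip_prefix('(')?;
    let mut params = Vec::new();
    while !rest.starts_with(')') {
        let (ty, tail) = parse_prefix(rest)?;
        params.push(ty);
        rest = tail;
    }
    let ret = &rest[1..]; // skip ')'
    if ret == "V" {
        return Some((params, None));
    }
    Some((params, Some(parse_field_type(ret)?)))
}
```

Under these assumptions, `"[[I"` parses to an array of arrays of `Int`, and `"(II)I"` parses to two `Int` parameters with an `Int` return type.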

## Error Handling

- `DekuError`: Binary parsing failures
- `ConstantPoolError`: Constant pool access failures, with `Generic`, `DescriptorParseError`, and `Cesu8DecodingError` variants
- `VmError`: Higher-level VM-specific errors
- `DescParseError`: Invalid method/field descriptor syntax
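
One common way to layer such errors in Rust is a `From` conversion, so that `?` lifts a lower-level error into a higher-level variant. The sketch below assumes that shape. The type and variant names are taken from the list above, but the variant payloads, the helper functions, and the conversion itself are hypothetical and may not match the crate.

```rust
/// Hypothetical payload: the offending descriptor text.
#[derive(Debug, PartialEq)]
pub struct DescParseError(pub String);

/// Two of the variants named above, with assumed payloads.
#[derive(Debug, PartialEq)]
pub enum ConstantPoolError {
    Generic(String),
    DescriptorParseError(DescParseError),
}

impl From<DescParseError> for ConstantPoolError {
    fn from(err: DescParseError) -> Self {
        ConstantPoolError::DescriptorParseError(err)
    }
}

/// Hypothetical check: a method descriptor must start with '('.
fn check_method_descriptor(desc: &str) -> Result<(), DescParseError> {
    if desc.starts_with('(') {
        Ok(())
    } else {
        Err(DescParseError(desc.to_string()))
    }
}

/// Hypothetical lookup of a method descriptor string by 1-based index.
pub fn method_descriptor_at<'a>(
    pool: &[&'a str],
    index: usize,
) -> Result<&'a str, ConstantPoolError> {
    let desc = index
        .checked_sub(1)
        .and_then(|i| pool.get(i).copied())
        .ok_or_else(|| ConstantPoolError::Generic(format!("index {} out of bounds", index)))?;
    // `?` converts DescParseError into ConstantPoolError through `From`.
    check_method_descriptor(desc)?;
    Ok(desc)
}
```

The caller only sees `ConstantPoolError`, while the original `DescParseError` stays available inside the variant for diagnostics.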