Class File Parsing

Location: crates/core/src/class_file/

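The class file parser uses the deku crate for declarative binary deserialization. The struct definitions describe the layout, and fixed parts of the format (such as the magic number) are validated automatically while reading.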

Components

File              Purpose
class_file.rs     Main ClassFile struct with version, constant pool, fields, methods, attributes
constant_pool.rs  ConstantPoolGet/ConstantPoolExt traits for pool access and resolution
attributes.rs     Attribute parsing (Code, LineNumberTable, LocalVariableTable, BootstrapMethods)
mod.rs            Access flag definitions (ClassFlags, MethodFlags, FieldFlags)

Key Types

pub struct ClassFile {
    pub minor_version: u16,
    pub major_version: u16,
    pub constant_pool: Arc<ConstantPoolOwned>,
    pub access_flags: u16,
    pub this_class: u16,
    pub super_class: u16,
    pub interfaces: Vec<u16>,
    pub fields: Vec<FieldInfo>,
    pub methods: Vec<MethodInfo>,
    pub attributes: Vec<AttributeInfo>,
}

pub struct FieldInfo {
    pub access_flags: u16,
    pub name_index: u16,
    pub descriptor_index: u16,
    pub attributes: Vec<AttributeInfo>,
}

pub struct MethodInfo {
    pub access_flags: u16,
    pub name_index: u16,
    pub descriptor_index: u16,
    pub attributes: Vec<AttributeInfo>,
}
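
As a rough illustration of what the declarative parser does for the first fields of ClassFile, the sketch below reads the magic number and the two version fields by hand using only the standard library. It is not the deku-based implementation; the Header and HeaderError types and the parse_header function are hypothetical names introduced for this example.

```rust
/// Hypothetical stand-in for the leading fields of ClassFile.
#[derive(Debug, PartialEq)]
pub struct Header {
    pub minor_version: u16,
    pub major_version: u16,
}

/// Hypothetical error type for this sketch only.
#[derive(Debug, PartialEq)]
pub enum HeaderError {
    Truncated,
    BadMagic(u32),
}

/// Reads magic (u32), minor_version (u16), major_version (u16), all big-endian.
pub fn parse_header(bytes: &[u8]) -> Result<Header, HeaderError> {
    if bytes.len() < 8 {
        return Err(HeaderError::Truncated);
    }
    let magic = u32::from_be_bytes([bytes[0], bytes[1], bytes[2], bytes[3]]);
    if magic != 0xCAFE_BABE {
        return Err(HeaderError::BadMagic(magic));
    }
    Ok(Header {
        minor_version: u16::from_be_bytes([bytes[4], bytes[5]]),
        major_version: u16::from_be_bytes([bytes[6], bytes[7]]),
    })
}
```

Note that minor_version precedes major_version on disk, which matches the field order of ClassFile above.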

Constant Pool

Trait-based architecture with two layers:

ConstantPoolGet Trait

Low-level accessors:

  • get_constant(): Resolve an entry by index (accounts for 64-bit Long and Double entries, which occupy two index slots)
  • Type-specific getters: get_i32(), get_utf8_info(), get_class_info(), get_method_ref(), etc.
  • Implemented via pool_get_impl! macro
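
The accessors above can be sketched as follows. This is a simplified model, not the crate's code: it assumes the pool is stored with a placeholder in the second slot of each 64-bit entry, and the Entry, Pool, and PoolError types are hypothetical. Only the get_constant() and get_i32() names come from the list above.

```rust
/// Hypothetical, reduced entry type for this sketch.
#[derive(Debug, Clone, PartialEq)]
pub enum Entry {
    Utf8(String),
    Integer(i32),
    Long(i64),
    /// Placeholder for the second slot taken by a 64-bit entry.
    Unusable,
}

#[derive(Debug, PartialEq)]
pub enum PoolError {
    OutOfBounds(u16),
    WrongType { index: u16, expected: &'static str },
}

pub struct Pool {
    /// slots[0] holds pool index 1; index 0 is never valid.
    slots: Vec<Entry>,
}

impl Pool {
    /// Builds the slot table, inserting a placeholder after each Long.
    pub fn from_entries(entries: Vec<Entry>) -> Self {
        let mut slots = Vec::new();
        for entry in entries {
            let wide = matches!(entry, Entry::Long(_));
            slots.push(entry);
            if wide {
                slots.push(Entry::Unusable);
            }
        }
        Pool { slots }
    }

    /// Bounds-checked lookup by 1-based pool index.
    pub fn get_constant(&self, index: u16) -> Result<&Entry, PoolError> {
        let slot = index.checked_sub(1).ok_or(PoolError::OutOfBounds(index))?;
        match self.slots.get(slot as usize) {
            Some(Entry::Unusable) | None => Err(PoolError::OutOfBounds(index)),
            Some(entry) => Ok(entry),
        }
    }

    /// Type-specific getter: fails if the entry is not an Integer.
    pub fn get_i32(&self, index: u16) -> Result<i32, PoolError> {
        match self.get_constant(index)? {
            Entry::Integer(value) => Ok(*value),
            _ => Err(PoolError::WrongType { index, expected: "Integer" }),
        }
    }
}
```

With a pool built from a Long, an Integer, and a Utf8 entry, the Integer sits at index 3 rather than 2, because the Long consumes indices 1 and 2.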

ConstantPoolExt Trait

High-level operations:

  • get_string(): Fetch a Utf8 entry as a string, decoding Java's CESU-8-style modified UTF-8
  • resolve_class_name(): Trace class references through constant pool
  • resolve_method_ref() / resolve_interface_method_ref(): Resolve method references
  • resolve_field(): Resolve field references with type descriptors
  • parse_attribute(): Convert raw attribute bytes to typed Attribute enum
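
To make the "trace through the constant pool" step concrete, here is a minimal sketch of class name resolution: a Class entry holds the index of a Utf8 entry, which holds the name. The slice-based pool, the reduced Entry enum, and ResolveError are assumptions made for this example and ignore two-slot entries. Only the resolve_class_name() name is taken from the list above.

```rust
/// Hypothetical, reduced entry type for this sketch.
#[derive(Debug, PartialEq)]
pub enum Entry {
    Utf8(String),
    Class { name_index: u16 },
}

#[derive(Debug, PartialEq)]
pub enum ResolveError {
    BadIndex(u16),
    NotAClass(u16),
    NotUtf8(u16),
}

/// Looks up a 1-based pool index in a plain slice.
fn entry(pool: &[Entry], index: u16) -> Result<&Entry, ResolveError> {
    (index as usize)
        .checked_sub(1)
        .and_then(|i| pool.get(i))
        .ok_or(ResolveError::BadIndex(index))
}

/// Follows Class -> Utf8 and returns the internal class name.
pub fn resolve_class_name(pool: &[Entry], class_index: u16) -> Result<&str, ResolveError> {
    let name_index = match entry(pool, class_index)? {
        Entry::Class { name_index } => *name_index,
        _ => return Err(ResolveError::NotAClass(class_index)),
    };
    match entry(pool, name_index)? {
        Entry::Utf8(name) => Ok(name.as_str()),
        _ => Err(ResolveError::NotUtf8(name_index)),
    }
}
```

Method and field resolution follow the same pattern with one more hop: the reference entry points at a Class entry and a NameAndType entry, each of which is resolved to Utf8 strings in turn.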

Constant Pool Entry Types (17 types)

pub enum ConstantPoolEntry {
    Utf8(ConstantUtf8Info),
    Integer(i32), Float(f32), Long(i64), Double(f64),
    Class(ConstantClassInfo),
    String(ConstantStringInfo),
    FieldRef(ConstantFieldrefInfo),
    MethodRef(ConstantMethodrefInfo),
    InterfaceMethodRef(ConstantInterfaceMethodrefInfo),
    NameAndType(ConstantNameAndTypeInfo),
    MethodHandle(ConstantMethodHandleInfo),
    MethodType(ConstantMethodTypeInfo),
    Dynamic(ConstantDynamicInfo),
    InvokeDynamic(ConstantInvokeDynamicInfo),
    Module(ConstantModuleInfo),
    Package(ConstantPackageInfo),
}
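
In the class file, each entry starts with a one-byte tag that selects the variant. The 17 variants listed above use the tag values 1 through 20 defined by the JVM specification, with gaps at 2, 13, and 14. The sketch below maps tags to variant names; tag_name and slots_for_tag are hypothetical helpers, not functions from this crate.

```rust
/// Maps a constant pool tag byte to the matching variant name.
/// Tag values follow the JVM specification; the function itself is illustrative.
pub fn tag_name(tag: u8) -> Option<&'static str> {
    let name = match tag {
        1 => "Utf8",
        3 => "Integer",
        4 => "Float",
        5 => "Long",
        6 => "Double",
        7 => "Class",
        8 => "String",
        9 => "FieldRef",
        10 => "MethodRef",
        11 => "InterfaceMethodRef",
        12 => "NameAndType",
        15 => "MethodHandle",
        16 => "MethodType",
        17 => "Dynamic",
        18 => "InvokeDynamic",
        19 => "Module",
        20 => "Package",
        _ => return None,
    };
    Some(name)
}

/// Long (5) and Double (6) entries take two pool slots; all others take one.
pub fn slots_for_tag(tag: u8) -> usize {
    if tag == 5 || tag == 6 { 2 } else { 1 }
}
```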

Attributes

Recursive attribute parsing with support for:

  • Code: Method bytecode with max_stack, max_locals, exception tables, nested attributes
  • LineNumberTable: Maps bytecode offsets to source line numbers
  • LocalVariableTable: Local variable debugging info (name, descriptor, PC range)
  • BootstrapMethods: Dynamic invocation bootstrap method references
  • StackMapTable, Exceptions, InnerClasses: Parsed as raw byte vectors
  • SourceFile, Signature: Index-based attribute data
  • Unknown: Fallback for unrecognized attributes

Code Attribute Structure

pub struct CodeAttribute {
    pub max_stack: u16,
    pub max_locals: u16,
    pub code_length: u32,
    pub code: Vec<u8>,
    pub exception_table: Vec<ExceptionTableEntry>,
    pub attributes: Vec<AttributeInfo>,  // Recursive
}
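
The byte layout behind this struct can be sketched with a small hand-written reader. This is an illustration of the on-disk format (all integers big-endian, each table preceded by its count), not the deku-derived parser. The Reader and RawAttribute types and the parse_code function are hypothetical, and nested attributes are left as raw bytes here.

```rust
#[derive(Debug, PartialEq)]
pub struct ExceptionTableEntry {
    pub start_pc: u16,
    pub end_pc: u16,
    pub handler_pc: u16,
    pub catch_type: u16,
}

/// Hypothetical unparsed attribute: name index plus raw payload.
#[derive(Debug, PartialEq)]
pub struct RawAttribute {
    pub name_index: u16,
    pub info: Vec<u8>,
}

#[derive(Debug, PartialEq)]
pub struct CodeAttribute {
    pub max_stack: u16,
    pub max_locals: u16,
    pub code: Vec<u8>,
    pub exception_table: Vec<ExceptionTableEntry>,
    pub attributes: Vec<RawAttribute>,
}

/// Minimal big-endian cursor over a byte slice.
struct Reader<'a> {
    buf: &'a [u8],
}

impl<'a> Reader<'a> {
    fn take(&mut self, n: usize) -> Option<&'a [u8]> {
        if self.buf.len() < n {
            return None;
        }
        let (head, tail) = self.buf.split_at(n);
        self.buf = tail;
        Some(head)
    }
    fn read_u16(&mut self) -> Option<u16> {
        let b = self.take(2)?;
        Some(u16::from_be_bytes([b[0], b[1]]))
    }
    fn read_u32(&mut self) -> Option<u32> {
        let b = self.take(4)?;
        Some(u32::from_be_bytes([b[0], b[1], b[2], b[3]]))
    }
}

/// Parses the body of a Code attribute; returns None on truncated input.
pub fn parse_code(info: &[u8]) -> Option<CodeAttribute> {
    let mut r = Reader { buf: info };
    let max_stack = r.read_u16()?;
    let max_locals = r.read_u16()?;
    let code_length = r.read_u32()? as usize;
    let code = r.take(code_length)?.to_vec();

    let mut exception_table = Vec::new();
    for _ in 0..r.read_u16()? {
        exception_table.push(ExceptionTableEntry {
            start_pc: r.read_u16()?,
            end_pc: r.read_u16()?,
            handler_pc: r.read_u16()?,
            catch_type: r.read_u16()?,
        });
    }

    let mut attributes = Vec::new();
    for _ in 0..r.read_u16()? {
        let name_index = r.read_u16()?;
        let length = r.read_u32()? as usize;
        attributes.push(RawAttribute { name_index, info: r.take(length)?.to_vec() });
    }
    Some(CodeAttribute { max_stack, max_locals, code, exception_table, attributes })
}
```

The nested attributes are where the recursion comes in: each raw payload would be handed back to the attribute parser, which is how LineNumberTable and LocalVariableTable end up inside a Code attribute.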

Access Flags

Bitfield structures for parsing JVM access flags:

  • ClassFlags: PUBLIC, FINAL, INTERFACE, ABSTRACT, SYNTHETIC, ANNOTATION, ENUM, MODULE
  • FieldFlags: PUBLIC, PRIVATE, PROTECTED, STATIC, FINAL, VOLATILE, TRANSIENT, SYNTHETIC, ENUM
  • MethodFlags: PUBLIC, PRIVATE, PROTECTED, STATIC, FINAL, SYNCHRONIZED, BRIDGE, VARARGS, NATIVE, ABSTRACT, STRICT, SYNTHETIC
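
As an illustration of how such a bitfield decodes, the sketch below wraps the raw u16 and tests individual bits. The mask values are the ones the JVM specification assigns to method access flags. The wrapper itself, including its contains() and names() methods, is a hypothetical simplification and not the crate's MethodFlags definition.

```rust
/// Illustrative wrapper around the raw access_flags value of a method.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct MethodFlags(pub u16);

impl MethodFlags {
    pub const PUBLIC: u16 = 0x0001;
    pub const PRIVATE: u16 = 0x0002;
    pub const PROTECTED: u16 = 0x0004;
    pub const STATIC: u16 = 0x0008;
    pub const FINAL: u16 = 0x0010;
    pub const SYNCHRONIZED: u16 = 0x0020;
    pub const BRIDGE: u16 = 0x0040;
    pub const VARARGS: u16 = 0x0080;
    pub const NATIVE: u16 = 0x0100;
    pub const ABSTRACT: u16 = 0x0400;
    pub const STRICT: u16 = 0x0800;
    pub const SYNTHETIC: u16 = 0x1000;

    /// True if every bit in `mask` is set.
    pub fn contains(self, mask: u16) -> bool {
        (self.0 & mask) == mask
    }

    /// Names of all set flags, in ascending bit order.
    pub fn names(self) -> Vec<&'static str> {
        const TABLE: [(u16, &str); 12] = [
            (MethodFlags::PUBLIC, "PUBLIC"),
            (MethodFlags::PRIVATE, "PRIVATE"),
            (MethodFlags::PROTECTED, "PROTECTED"),
            (MethodFlags::STATIC, "STATIC"),
            (MethodFlags::FINAL, "FINAL"),
            (MethodFlags::SYNCHRONIZED, "SYNCHRONIZED"),
            (MethodFlags::BRIDGE, "BRIDGE"),
            (MethodFlags::VARARGS, "VARARGS"),
            (MethodFlags::NATIVE, "NATIVE"),
            (MethodFlags::ABSTRACT, "ABSTRACT"),
            (MethodFlags::STRICT, "STRICT"),
            (MethodFlags::SYNTHETIC, "SYNTHETIC"),
        ];
        TABLE
            .iter()
            .filter(|(mask, _)| self.contains(*mask))
            .map(|(_, name)| *name)
            .collect()
    }
}
```

Several bit positions are reused across the three flag sets (for example, 0x0040 is VOLATILE on a field and BRIDGE on a method), which is one reason to keep separate types for classes, fields, and methods.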

Validation

Validation occurs at multiple levels:

  1. Binary Format (Automatic via Deku):

    • Magic number (0xCAFEBABE)
    • Big-endian byte order
    • Type-safe parsing with error propagation

  2. Constant Pool:

    • Index bounds checking in get_constant()
    • Type validation: each accessor checks that the entry type matches the expected type
    • Decoding failures in Java-style (CESU-8) UTF-8 strings are caught and surfaced as errors

  3. Class Structure (ClassLoader):

    • Debug assertions that the Object class has super_class = 0
    • Non-Object classes must have a valid super class reference
    • Interfaces must name java/lang/Object as their super class
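
The class structure rules in the third group can be sketched as a single check. This is a hypothetical helper, not the ClassLoader code: it takes already-resolved names, models super_class = 0 as None, and returns an error for every violation, whereas the source describes the Object case as a debug assertion.

```rust
/// Hypothetical error type for this sketch only.
#[derive(Debug, PartialEq)]
pub enum StructureError {
    ObjectHasSuper,
    MissingSuper,
    InterfaceSuperNotObject,
}

const OBJECT: &str = "java/lang/Object";

/// `super_name` is None when the class file has super_class = 0.
pub fn check_super(
    this_name: &str,
    super_name: Option<&str>,
    is_interface: bool,
) -> Result<(), StructureError> {
    match (this_name, super_name) {
        // Only java/lang/Object may omit its super class.
        (OBJECT, None) => Ok(()),
        (OBJECT, Some(_)) => Err(StructureError::ObjectHasSuper),
        (_, None) => Err(StructureError::MissingSuper),
        // An interface must name java/lang/Object as its super class.
        (_, Some(sup)) if is_interface && sup != OBJECT => {
            Err(StructureError::InterfaceSuperNotObject)
        }
        _ => Ok(()),
    }
}
```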

Descriptor Parsing

// Method descriptor: (II)I  ->  two int parameters, int return type
let add = MethodDescriptor::parse("(II)I")?;

// Field descriptor: Ljava/lang/String;  ->  String class type
let string_type = FieldType::parse("Ljava/lang/String;")?;

// Array descriptor: [[I  ->  2D int array
let matrix_type = FieldType::parse("[[I")?;
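
For readers unfamiliar with the descriptor grammar, the following is a minimal recursive-descent sketch of it. The type names mirror the ones used above, but the definitions here (variant names, the params and ret fields, a unit-struct DescParseError) are assumptions made for illustration and may differ from the crate's actual types.

```rust
#[derive(Debug, Clone, PartialEq)]
pub enum FieldType {
    Byte, Char, Double, Float, Int, Long, Short, Boolean,
    Object(String),
    Array(Box<FieldType>),
}

#[derive(Debug, PartialEq)]
pub struct MethodDescriptor {
    pub params: Vec<FieldType>,
    /// None stands for a void return type ("V").
    pub ret: Option<FieldType>,
}

#[derive(Debug, PartialEq)]
pub struct DescParseError;

/// Parses one field type from the front of `s`; returns it with the unparsed rest.
fn parse_field(s: &str) -> Result<(FieldType, &str), DescParseError> {
    let mut chars = s.chars();
    let first = chars.next().ok_or(DescParseError)?;
    let rest = chars.as_str();
    let ty = match first {
        'B' => FieldType::Byte,
        'C' => FieldType::Char,
        'D' => FieldType::Double,
        'F' => FieldType::Float,
        'I' => FieldType::Int,
        'J' => FieldType::Long,
        'S' => FieldType::Short,
        'Z' => FieldType::Boolean,
        'L' => {
            // Class type: L<internal name>;
            let end = rest.find(';').ok_or(DescParseError)?;
            if end == 0 {
                return Err(DescParseError);
            }
            return Ok((FieldType::Object(rest[..end].to_string()), &rest[end + 1..]));
        }
        '[' => {
            // Array type: one '[' per dimension, then the element type.
            let (elem, tail) = parse_field(rest)?;
            return Ok((FieldType::Array(Box::new(elem)), tail));
        }
        _ => return Err(DescParseError),
    };
    Ok((ty, rest))
}

impl FieldType {
    pub fn parse(s: &str) -> Result<FieldType, DescParseError> {
        let (ty, tail) = parse_field(s)?;
        if tail.is_empty() { Ok(ty) } else { Err(DescParseError) }
    }
}

impl MethodDescriptor {
    pub fn parse(s: &str) -> Result<MethodDescriptor, DescParseError> {
        let mut rest = s.strip_prefix('(').ok_or(DescParseError)?;
        let mut params = Vec::new();
        loop {
            if let Some(after) = rest.strip_prefix(')') {
                rest = after;
                break;
            }
            let (ty, tail) = parse_field(rest)?;
            params.push(ty);
            rest = tail;
        }
        let ret = if rest == "V" { None } else { Some(FieldType::parse(rest)?) };
        Ok(MethodDescriptor { params, ret })
    }
}
```

Parameters are concatenated without separators, so the parser has to consume one complete field type at a time. That is why parse_field returns the unparsed remainder alongside the type.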

Error Handling

  • DekuError: Binary parsing failures
  • ConstantPoolError: Pool access failures, with Generic, DescriptorParseError, and Cesu8DecodingError variants
  • VmError: Higher-level VM-specific errors
  • DescParseError: Invalid method/field descriptor syntax
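
One plausible way these error types layer is through From conversions, so that `?` lifts a low-level error into the next level up. The sketch below assumes that arrangement. The variant payloads, the VmError::ConstantPool variant, and the three functions are hypothetical; only the type and variant names listed above come from the source.

```rust
#[derive(Debug, PartialEq)]
pub struct DescParseError(pub String);

#[derive(Debug, PartialEq)]
pub enum ConstantPoolError {
    Generic(String),
    DescriptorParseError(DescParseError),
    Cesu8DecodingError,
}

#[derive(Debug, PartialEq)]
pub enum VmError {
    // Assumed wrapper variant; the real VmError may be shaped differently.
    ConstantPool(ConstantPoolError),
}

impl From<DescParseError> for ConstantPoolError {
    fn from(err: DescParseError) -> Self {
        ConstantPoolError::DescriptorParseError(err)
    }
}

impl From<ConstantPoolError> for VmError {
    fn from(err: ConstantPoolError) -> Self {
        VmError::ConstantPool(err)
    }
}

/// Lowest layer: a deliberately crude descriptor shape check.
pub fn check_descriptor(desc: &str) -> Result<(), DescParseError> {
    if desc.starts_with('(') && desc.contains(')') {
        Ok(())
    } else {
        Err(DescParseError(desc.to_string()))
    }
}

/// Middle layer: `?` converts DescParseError into ConstantPoolError.
pub fn resolve_method(desc: &str) -> Result<(), ConstantPoolError> {
    check_descriptor(desc)?;
    Ok(())
}

/// Top layer: `?` converts ConstantPoolError into VmError.
pub fn link(desc: &str) -> Result<(), VmError> {
    resolve_method(desc)?;
    Ok(())
}
```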