PoC - A JVM in pure Rust

Rust is an interesting and exciting language featuring a great ecosystem. In search of a toy project to program in Rust, I decided to look into Java classfiles and the Java virtual machine (JVM).

In this post, I will not cover the basics of the Java system, such as the virtual machine, how Java bytecode is created, etc. I will focus purely on the steps taken in rust-jvm so far.

Goal

Initially, the goal was to parse Java classfiles in Rust. As this was very straightforward and fun, the logical next step was to attempt to interpret the parsed classes.

So the new goal was to run a simple “Hello World” Java program like this:

package hello;

public class HelloWorld1 {

  public static void main(String[] args) {
    System.out.println("Hello World!");
  }
}

To keep things simple, the initial PoC should only support Java 5. Also, the aim of this project is not to create a serious JVM. Instead, the aim is to have fun, learn about the JVM and write Rust.

Classfile Parsing

This project started with classfile parsing. The Java® Virtual Machine Specification is great and was a constant companion during the implementation. (This link is for the Java 8 specification, although rust-jvm targets only Java 5. The specification for Java 5 is not available anymore, but the Java 8 specification works equally well.)

All classfile parsing-related code is contained in the folder classfile, see https://github.com/Pfarrer/rust-jvm/tree/c811289aa0531c611f290cdffc0f2392431fe160/src/classfile.

The entry point of classfile parsing, load_file, can be found in the main mod.rs file. It creates a new Classfile object containing the information parsed from the given classfile.

pub fn load_file(filename: &String) -> Classfile {
    let file = File::open(filename).unwrap();
    let mut reader = BufReader::new(file);

    let version = version::read(&mut reader);
    let constants = constants::read(&mut reader);
    let class_info = class_info::read(&mut reader);
    let fields = fields::read(&mut reader, &constants);
    let methods = methods::read(&mut reader, &constants);
    let attributes = attributes::read(&mut reader, &constants);

    Classfile {
        version,
        constants,
        class_info,
        fields,
        methods,
        attributes,
    }
}

It is interesting to note that the first three values (version, constants and class_info) are read independently of each other. The following three values (fields, methods and attributes) depend on the previously parsed constants.
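To give an idea of what the first of these steps involves, here is a minimal sketch of reading the classfile header. It is not the code from the repository: the names Version, read_u16, read_u32 and read_version are made up for this illustration, and unlike load_file it returns an io::Result instead of unwrapping. The layout itself (a 4-byte magic number 0xCAFEBABE followed by the minor and major version, all big-endian) comes from the specification.

```rust
use std::io::{self, Read};

/// Classfile format version, stored right after the magic number.
/// (Hypothetical type for this sketch, not the one used in rust-jvm.)
#[derive(Debug, PartialEq)]
pub struct Version {
    pub minor: u16,
    pub major: u16,
}

/// Reads a big-endian u16, the byte order used throughout the classfile format.
fn read_u16<R: Read>(reader: &mut R) -> io::Result<u16> {
    let mut buf = [0u8; 2];
    reader.read_exact(&mut buf)?;
    Ok(u16::from_be_bytes(buf))
}

/// Reads a big-endian u32.
fn read_u32<R: Read>(reader: &mut R) -> io::Result<u32> {
    let mut buf = [0u8; 4];
    reader.read_exact(&mut buf)?;
    Ok(u32::from_be_bytes(buf))
}

/// Checks the magic number and reads the minor and major version.
pub fn read_version<R: Read>(reader: &mut R) -> io::Result<Version> {
    let magic = read_u32(reader)?;
    if magic != 0xCAFE_BABE {
        return Err(io::Error::new(
            io::ErrorKind::InvalidData,
            "not a classfile: bad magic number",
        ));
    }
    // The minor version comes first in the file, then the major version.
    let minor = read_u16(reader)?;
    let major = read_u16(reader)?;
    Ok(Version { minor, major })
}
```

A classfile compiled for Java 5 carries major version 49, so passing a reader over the bytes CA FE BA BE 00 00 00 31 to read_version should yield minor 0 and major 49.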

My favorite part of the specification is this side note found in chapter 4, section 4.4.5:

In retrospect, making 8-byte constants take two constant pool entries was a poor choice.
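The practical consequence of that design choice is that a parser cannot simply number constant pool entries one after another. The following sketch shows the index bookkeeping only; it is not taken from rust-jvm, and the function name assign_indices is an assumption made for this example. The tag values 5 (long) and 6 (double) and the fact that indices start at 1 are from the specification.

```rust
/// Constant pool tags of the two 8-byte constant kinds (JVM spec, section 4.4.5).
const TAG_LONG: u8 = 5;
const TAG_DOUBLE: u8 = 6;

/// Given the tags of the constant pool entries in file order, returns the
/// constant pool index assigned to each entry.
/// (Hypothetical helper for illustration, not part of rust-jvm.)
pub fn assign_indices(tags: &[u8]) -> Vec<u16> {
    let mut indices = Vec::with_capacity(tags.len());
    // Constant pool indices start at 1; index 0 is not a valid entry.
    let mut next_index: u16 = 1;
    for &tag in tags {
        indices.push(next_index);
        // Long and double constants occupy two slots, so the index after
        // them is skipped and never refers to a usable entry.
        next_index += if tag == TAG_LONG || tag == TAG_DOUBLE { 2 } else { 1 };
    }
    indices
}
```

For a pool whose entries have the tags 1 (Utf8), 5 (Long), 7 (Class), 6 (Double) and 1 (Utf8), this assigns the indices 1, 2, 4, 5 and 7: indices 3 and 6 are swallowed by the long and the double.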

Virtual Machine

The virtual machine is the main part. It depends on the parsed Classfile objects from the previous step.

impl Vm {
    pub fn new(class_paths: Vec<String>) -> Vm {
        let classloader = Classloader::new(class_paths);
        let class_hierarchy = ClassHierarchy::new();
        let frame_stack = Vec::new();
        let class_statics = HashMap::new();
        let string_pool = StringPool::new();
        let memory_pool = MemoryPool::new();

        let mut vm = Vm {
            initialized: false,
            classloader,
            class_hierarchy,
            frame_stack,
            class_statics,
            string_pool,
            memory_pool,
        };

        // Initialize VM by loading System class
        let system_class_path = "java/lang/System".to_string();
        vm.load_and_clinit_class(&system_class_path);
        vm.initialized = true;

        // Create root frame
        // Add args array
        let args = Array::new_complex(0, "java/lang/String".to_string());
        let rc_args = Rc::new(RefCell::new(args));

        let mut frame = Frame::new(0, "<root_frame>".to_string(), "<root_frame>".to_string(), "<root_frame>".to_string());
        frame.stack_push(Primitive::Arrayref(rc_args));
        vm.frame_stack.push(frame);
        
        // Invoke System.initializeSystemClass
        utils::invoke_method(&mut vm, &system_class_path, &"initializeSystemClass".to_string(), &"()V".to_string(), false);

        vm
    }

    // ... (remaining methods omitted)
}

The VM-related code could use some refactoring. This has been postponed in favor of progress. Refactoring and modularization might be a goal for the future.

Initialization

Did you ever wonder how the static fields inside the fundamental Java runtime classes are initialized? I did, and finally found the answer: in Java 5, the java.lang.System class has a private static method, initializeSystemClass, that cannot be invoked from the Java world. Only the JVM can invoke it, and it does so after setting up the virtual machine, as the final step of initializing the runtime. Here is the source code of that method for Java 7 (which is essentially the same as the method in Java 5).

This method is invoked as the last step of creating a new Vm object, see the snippet in the previous section.

Result

This is an asciinema recording of rust-jvm being compiled and running the simple “Hello World” class shown in the Goal section (using Rust 1.64 on an ARM-based Apple M1, recorded in October 2022):

As shown in the video, the standalone executable has a size of only 1.9 MB! To run the “Hello World” class, the Java SE 5 runtime is also required. All files add up to about 70 MB (unzipped).

The first commit was created on 17.10.2017, but work on this likely started some days or weeks earlier. The Proof-of-Concept “Hello World” milestone was achieved 16 months later.

The entire code is split into 131 files totaling 3926 lines of Rust code (excluding blank lines and comments). Here is a GitHub link to the repository at exactly the commit described in this post: https://github.com/Pfarrer/rust-jvm/tree/c811289aa0531c611f290cdffc0f2392431fe160

Outlook

Since this has been an exciting experience so far, and I appreciate what I have learned, I am considering continuing to work on this project. One of the next steps should include some refactoring and modularization of the existing code. Supporting Java 8 or even later versions might be an interesting task. So, stay tuned.