4. How Java Works
Programming Project 2022/23

4.2. Java Architecture

Compilation and interpretation in Java

Java is a mixed language: it combines compilation and interpretation. This means:

  1. We write Java code.
  2. We compile Java code to bytecode using the Java compiler.
  3. We run our bytecode using the Java Virtual Machine.
  4. The JVM interprets the bytecode, carrying out its instructions on the host machine.
  5. In some cases, the JVM also compiles the bytecode to native machine code using the Just-In-Time (JIT) compiler.

[Figure: Java workflow]

Compile-time errors

When we pass our source code to the compiler to translate it into bytecode, we may encounter:

  • syntax errors
  • type-checking errors

If the compiler succeeds, what do we know?

  • That the program is well formed, that is, a meaningful program in the language.
  • That it is possible to start running the program, because it is now in a machine-readable form. This does not mean that the program is correct: it might fail immediately, but at least we can try.
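Both kinds of compile-time error can be observed without leaving Java. The minimal sketch below uses the JDK's compiler API (javax.tools) to compile small source strings held in memory and collect the compiler's error messages. The class names CompileErrors and InMemorySource are hypothetical. The sketch needs a JDK rather than a bare JRE, and the exact wording of the messages may differ between JDK versions.

```java
import java.net.URI;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;
import javax.tools.Diagnostic;
import javax.tools.DiagnosticCollector;
import javax.tools.JavaCompiler;
import javax.tools.JavaFileObject;
import javax.tools.SimpleJavaFileObject;
import javax.tools.ToolProvider;

class CompileErrors {
    // A source file kept in memory instead of on disk.
    static class InMemorySource extends SimpleJavaFileObject {
        private final String code;

        InMemorySource(String className, String code) {
            super(URI.create("string:///" + className + ".java"), Kind.SOURCE);
            this.code = code;
        }

        @Override
        public CharSequence getCharContent(boolean ignoreEncodingErrors) {
            return code;
        }
    }

    // Compiles one class and returns the compiler's error messages (an empty list means success).
    static List<String> errors(String className, String code) {
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        if (compiler == null) {
            throw new IllegalStateException("No system compiler: run this on a JDK, not a JRE");
        }
        DiagnosticCollector<JavaFileObject> diagnostics = new DiagnosticCollector<>();
        // Write any .class files to the temporary directory, not the working directory.
        List<String> options = List.of("-d", System.getProperty("java.io.tmpdir"));
        compiler.getTask(null, null, diagnostics, options, null,
                List.of(new InMemorySource(className, code))).call();
        return diagnostics.getDiagnostics().stream()
                .filter(d -> d.getKind() == Diagnostic.Kind.ERROR)
                .map(d -> d.getMessage(Locale.ENGLISH))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(errors("Ok", "class Ok { int x = 5; }"));                // []
        System.out.println(errors("Syntax", "class Syntax { int x = 5 }"));         // syntax error: missing ';'
        System.out.println(errors("Types", "class Types { int x = \"five\"; }"));   // type-checking error
    }
}
```

The first string compiles cleanly. The second fails with a syntax error (a missing semicolon), and the third fails with a type-checking error (a String assigned to an int).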

Run-time errors

When the JVM runs our bytecode, we may encounter run-time errors, which our application may or may not handle.

These may be caused by, for example,

  • division by zero,
  • running out of memory,
  • trying to open a file that isn’t there, or
  • trying to find a web page and discovering that the URL is not well formed.
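The following minimal sketch (a hypothetical class RuntimeErrors, Java 11 or later) triggers three of these errors; running out of memory is left out. It uses java.net.URI to parse the address. Each method compiles without complaint, and the problem only shows up when the code runs. Here the application handles each error by catching the exception. An uncaught exception would instead stop the program with a stack trace. The file name and addresses are made-up examples.

```java
import java.io.IOException;
import java.net.URI;
import java.net.URISyntaxException;
import java.nio.file.Files;
import java.nio.file.Path;

class RuntimeErrors {
    // Integer division by zero compiles fine but throws ArithmeticException at run time.
    static String divide(int a, int b) {
        try {
            return String.valueOf(a / b);
        } catch (ArithmeticException e) {
            return "ArithmeticException: " + e.getMessage();
        }
    }

    // Reading a file that isn't there throws an IOException (NoSuchFileException) at run time.
    static String readFile(String fileName) {
        try {
            return Files.readString(Path.of(fileName));
        } catch (IOException e) {
            return e.getClass().getSimpleName();
        }
    }

    // Parsing a badly formed address throws URISyntaxException at run time.
    static String parseUri(String address) {
        try {
            return new URI(address).getHost();
        } catch (URISyntaxException e) {
            return "URISyntaxException";
        }
    }

    public static void main(String[] args) {
        System.out.println(divide(10, 2));                   // 5
        System.out.println(divide(1, 0));                    // ArithmeticException: / by zero
        System.out.println(readFile("missing.txt"));         // NoSuchFileException, if the file does not exist
        System.out.println(parseUri("http://exa mple.com")); // URISyntaxException: spaces are not allowed
    }
}
```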

Java bytecode

Java bytecode is the instruction set of the Java Virtual Machine.

If you think of the JVM as "a computer inside a computer", then bytecode is:

  • the "machine code for the CPU of the internal computer" or
  • the "machine code for an abstract processor."

Each bytecode instruction is composed of:

  • 1 byte, the opcode, that identifies the instruction, and
  • 0 or more bytes that represent its parameters (operands).
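The toy interpreter below makes the "abstract processor" idea concrete by handling a handful of real JVM opcodes. It is a deliberately simplified sketch and not how a real JVM is implemented. The class ToyJvm is hypothetical, it supports only seven instructions, and the bytes it runs are written by hand rather than produced by javac. Note how bipush is followed by one operand byte, while the other instructions have none.

```java
import java.util.ArrayDeque;
import java.util.Deque;

class ToyJvm {
    // Opcode values from the JVM instruction set.
    static final int ICONST_5 = 0x08; // push the int constant 5
    static final int BIPUSH = 0x10;   // push the next (signed) byte as an int: 1 operand byte
    static final int ILOAD_1 = 0x1b;  // push local variable 1
    static final int ISTORE_1 = 0x3c; // pop into local variable 1
    static final int IADD = 0x60;     // pop two ints, push their sum
    static final int IMUL = 0x68;     // pop two ints, push their product
    static final int IRETURN = 0xac;  // pop an int and return it

    // Executes the bytecode one instruction at a time, like a (very small) interpreter.
    static int run(byte[] code) {
        Deque<Integer> stack = new ArrayDeque<>(); // the operand stack
        int[] locals = new int[4];                 // local variable slots
        int pc = 0;                                // program counter: byte offset of the next instruction
        while (true) {
            int opcode = code[pc] & 0xff;          // opcodes are unsigned bytes
            switch (opcode) {
                case ICONST_5 -> { stack.push(5); pc += 1; }
                case BIPUSH -> { stack.push((int) code[pc + 1]); pc += 2; }
                case ILOAD_1 -> { stack.push(locals[1]); pc += 1; }
                case ISTORE_1 -> { locals[1] = stack.pop(); pc += 1; }
                case IADD -> { int b = stack.pop(); int a = stack.pop(); stack.push(a + b); pc += 1; }
                case IMUL -> { int b = stack.pop(); int a = stack.pop(); stack.push(a * b); pc += 1; }
                case IRETURN -> { return stack.pop(); }
                default -> throw new IllegalStateException("Unsupported opcode 0x" + Integer.toHexString(opcode));
            }
        }
    }

    public static void main(String[] args) {
        // Hand-written bytecode for: int x = 5; return x * 10 + x;
        byte[] code = {
            ICONST_5,      // offset 0
            ISTORE_1,      // offset 1
            ILOAD_1,       // offset 2
            BIPUSH, 10,    // offsets 3-4: opcode plus 1 operand byte
            IMUL,          // offset 5
            ILOAD_1,       // offset 6
            IADD,          // offset 7
            (byte) IRETURN // offset 8: 0xac needs a cast to fit in a signed byte
        };
        System.out.println(run(code)); // prints 55
    }
}
```

Running main prints 55, which is x * 10 + x with x = 5.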

Here is an example.

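The main method below, written in a class Main that also declares a static nested class Person with an int field age...

public static void main(String[] args) {
  int x = 5;
  System.out.println(x);
  Person person = new Person();
  System.out.println(person.age);
}

...is compiled into this bytecode. The number on each line is the instruction's byte offset. For example, getstatic at offset 2 occupies three bytes: one opcode byte plus a two-byte index into the constant pool. The next instruction therefore starts at offset 5.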

 0 iconst_5
 1 istore_1
 2 getstatic #7 <java/lang/System.out>
 5 iload_1
 6 invokevirtual #13 <java/io/PrintStream.println>
 9 new #19 <Main$Person>
12 dup
13 invokespecial #21 <Main$Person.<init>>
16 astore_2
17 getstatic #7 <java/lang/System.out>
20 aload_2
21 getfield #22 <Main$Person.age>
24 invokevirtual #13 <java/io/PrintStream.println>
27 return

javac

The javac command compiles Java programs from a command prompt. It reads Java source code from a text file and creates a compiled class file for each class the source file defines.

The basic form of the javac command is:

javac [options] filename

For example, to compile a program named HelloWorld.java, use this command:

javac HelloWorld.java

javac compiles the file that you specify on the command line. You can, however, use javac to compile more than one file at a time.

  1. If the Java file you specify on the command line contains a reference to another Java class that’s defined by a Java file in the same folder, the Java compiler automatically compiles that class, too.

  2. You can list more than one filename in the javac command.

    The following command compiles three files:

    javac TestProgram1.java TestProgram2.java TestProgram3.java
  3. You can use a wildcard to compile all the files in a folder, like this:

    javac *.java
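The same compiler can also be invoked from Java code through the javax.tools API. The sketch below (a hypothetical class JavacFromJava, which requires a JDK) writes a source file to a temporary folder and passes the file name to the system compiler, just as it would be passed to javac on the command line. It then checks that a .class file was produced. The javac helper accepts several files, mirroring the multi-file form of the command.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.tools.JavaCompiler;
import javax.tools.ToolProvider;

class JavacFromJava {
    // Runs the JDK's compiler on the given files, much like "javac File1.java File2.java".
    // Returns the compiler's exit code: 0 on success, non-zero if there were errors.
    static int javac(Path... sourceFiles) {
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        if (compiler == null) {
            throw new IllegalStateException("No system compiler: run this on a JDK, not a JRE");
        }
        String[] args = new String[sourceFiles.length];
        for (int i = 0; i < sourceFiles.length; i++) {
            args[i] = sourceFiles[i].toString();
        }
        // null streams mean: use System.in, System.out and System.err.
        return compiler.run(null, null, null, args);
    }

    // Writes a source file to a fresh temporary folder, compiles it, and reports whether a .class file appeared.
    static boolean compiles(String className, String sourceCode) {
        try {
            Path dir = Files.createTempDirectory("javac-demo");
            Path source = dir.resolve(className + ".java");
            Files.writeString(source, sourceCode);
            // Without the -d option, javac writes the .class file next to the source file.
            return javac(source) == 0 && Files.exists(dir.resolve(className + ".class"));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(compiles("HelloWorld",
                "public class HelloWorld { public static void main(String[] a) { System.out.println(\"Hello\"); } }"));
    }
}
```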

Compiling your Java class

  1. Let us create a Java program that prints all even numbers from 1 to 10.

      public class Main {
        public static void main(String[] args) {
          for (int i = 1; i <= 10; i++)
            if (i % 2 == 0)
              System.out.println(i + " is even.");
        }
      }
  2. Then we open a terminal and compile Main.java using javac.

    javac Main.java

    This should generate our compiled file: Main.class

  3. To execute Main.class, we run the java command with the class name, without the .class extension:

    java Main

Java Virtual Machine (JVM)

The Java Virtual Machine (JVM) is software that provides an environment for running Java programs.

It executes bytecode on the host machine, either by interpreting it or by compiling it to native machine code, and it interacts with the operating system to run the program.

The JVM provides one of Java's key features: platform independence.

Java is portable because the same Java program can be executed on multiple platforms without any change to the source code.

You compile the Java code once, and the same bytecode runs on any platform that has a JVM.

JVM and platform independence

Each platform has its own JVM.

All JVMs can:

  • interpret bytecode,
  • convert bytecode into the machine code of its own platform, and
  • interact with the operating system to run the program.

This makes Java programs platform independent and portable.
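One way to observe this is with a program that reports the platform it runs on. The hypothetical class PlatformInfo below reads standard system properties. If you compile it once, the same PlatformInfo.class file prints different values on, say, Linux and Windows, because each platform's JVM supplies its own values.

```java
class PlatformInfo {
    // Describes the operating system, CPU architecture and JVM running this (unchanged) bytecode.
    static String describe() {
        return System.getProperty("os.name") + " / "
                + System.getProperty("os.arch") + " / "
                + System.getProperty("java.vm.name") + " "
                + System.getProperty("java.version");
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```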

[Figure: multiple JVMs]

What's inside the JVM?

The JVM is not a monolith, but a complex application composed of several components, including

  • classloader,
  • bytecode verifier,
  • interpreter, and
  • just-in-time compiler.

[Figure: JVM internals]

To understand the role of these components, let us compile and run a slightly more complex Java program.

  1. Write a class Person that contains two instance variables, String name and int age.
    public class Person {
      String name;
      int age;
    
      public Person(String name, int age) {
        this.name = name;
        this.age = age;
      }
    
      @Override
      public String toString() {
        return "Person{" +
                "name='" + name + '\'' +
                ", age=" + age +
                '}';
      }
    }
  2. Write a class Runner that creates two instances of Person and prints their data to the console.
    public class Runner {
      public static void main(String[] args) {
          Person john = new Person("John", 31);
          Person jane = new Person("Jane", 1);
    
          System.out.println(john);
          System.out.println(jane);
      }
    }
  3. Compile both classes.
    javac Person.java Runner.java
  4. Execute the Runner class.
    java Runner
    Person{name='John', age=31}
    Person{name='Jane', age=1}
  5. Move your Person.class to a subfolder called subdir.
    mkdir subdir 
    mv Person.class subdir/Person.class
  6. Execute the Runner class again.
    java Runner
    Exception in thread "main" java.lang.NoClassDefFoundError: Person
          at Runner.main(Runner.java:3)
    Caused by: java.lang.ClassNotFoundException: Person
          at java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:606)
          at java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:168)
          at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:522)
    How could we get our program running again?

How does the JVM find the classes in our program?

To fix this error, let us first look at how the Java launcher and the JVM find classes.

The java command is called the Java launcher because we use it to launch Java applications.

When the Java launcher is invoked, it

  1. gathers input from the user and the user’s environment,
  2. interfaces with the JVM, and
  3. starts some bootstrapping.

The Java Virtual Machine (JVM) does the rest of the work.

Java Class Loader

Java Class Loaders provide the points of entry for code into the JVM.

They are responsible for loading all classes into the JVM's memory.

The JVM uses several class loaders.

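  • The bootstrap class loader loads the core Java libraries from the run-time image (<JAVA_HOME>/lib/modules) in Java 9 and later, or from <JAVA_HOME>/jre/lib in Java 8 and below. This class loader is part of the core JVM and is written in native code.
  • The extensions class loader loads the code in the extensions directories (<JAVA_HOME>/jre/lib/ext, or any other directory specified by the java.ext.dirs system property). It exists only in Java 8 and below: Java 9 removed the extension mechanism and replaced this loader with the platform class loader.
  • The system (or application) class loader loads the classes found on the classpath, which is stored in the java.class.path system property. Its value comes from the -cp/-classpath option, from the CLASSPATH environment variable, or defaults to the current directory.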
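On Java 9 and later, the sketch below (a hypothetical class LoaderDemo) asks which loader defined a few classes. Core classes such as String report null, which is how the bootstrap loader is represented. java.sql.Connection is defined by the platform class loader, the Java 9+ successor of the extensions loader. When LoaderDemo is compiled with javac and run with java, it is loaded by the system class loader itself. Other launch modes may use a different loader.

```java
class LoaderDemo {
    // Names the class loader that defined the given class.
    static String describe(Class<?> type) {
        ClassLoader loader = type.getClassLoader();
        if (loader == null) {
            return "bootstrap"; // core classes report a null loader
        }
        if (loader == ClassLoader.getPlatformClassLoader()) {
            return "platform";
        }
        if (loader == ClassLoader.getSystemClassLoader()) {
            return "system";
        }
        return "other (" + loader.getClass().getName() + ")";
    }

    public static void main(String[] args) {
        System.out.println("java.lang.String    -> " + describe(String.class));
        System.out.println("java.sql.Connection -> " + describe(java.sql.Connection.class));
        System.out.println("LoaderDemo          -> " + describe(LoaderDemo.class));

        // Walk the parent chain, starting from the system class loader.
        for (ClassLoader l = ClassLoader.getSystemClassLoader(); l != null; l = l.getParent()) {
            System.out.println("loader: " + l.getName());
        }
        System.out.println("loader: bootstrap (represented as null)");
    }
}
```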

Classpath

Our classes are found by looking in the directories and archives listed in

  • the -cp (or -classpath) option of the command line, or
  • the CLASSPATH environment variable.

The classpath is a list of directories, JAR files, and ZIP files that contain class files.

A class file has a subpath name that reflects the fully-qualified name of the class. For example, if the class com.mypackage.MyClass is stored under /myclasses, then /myclasses must be in the user classpath, and the full path to the class file must be

  • /myclasses/com/mypackage/MyClass.class on Linux/Mac and
  • \myclasses\com\mypackage\MyClass.class on Windows.

If the class is stored in an archive named myclasses.jar, then myclasses.jar must be in the user classpath, and the class file must be stored in the archive as com/mypackage/MyClass.class. Entries inside JAR and ZIP archives always use forward slashes, on Windows as well as on Linux/Mac.

The user classpath is specified as a string, with

  • a colon (:) to separate the classpath entries on UNIX-based systems (Linux, Mac), or
  • a semicolon (;) to separate the entries on Windows systems.
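The sketch below (a hypothetical class ClasspathDemo) mirrors these rules. It splits a classpath string with File.pathSeparator, which is ":" or ";" depending on the platform. It also turns a fully-qualified class name into the relative path of its class file: on disk it uses File.separatorChar, and inside a JAR it uses "/". For simplicity it ignores nested classes such as Main$Person.

```java
import java.io.File;
import java.util.Arrays;
import java.util.List;

class ClasspathDemo {
    // Splits a classpath string with this platform's separator (':' on Linux/Mac, ';' on Windows).
    static List<String> entries(String classpath) {
        return Arrays.asList(classpath.split(File.pathSeparator));
    }

    // Relative path of a class file inside a classpath directory, using this platform's file separator.
    static String classFilePath(String fullyQualifiedName) {
        return fullyQualifiedName.replace('.', File.separatorChar) + ".class";
    }

    // Entry name of the same class file inside a JAR archive, which always uses '/'.
    static String jarEntryName(String fullyQualifiedName) {
        return fullyQualifiedName.replace('.', '/') + ".class";
    }

    public static void main(String[] args) {
        String classpath = System.getProperty("java.class.path");
        System.out.println("java.class.path = " + classpath);
        for (String entry : entries(classpath)) {
            System.out.println("  entry: " + entry);
        }
        System.out.println(classFilePath("com.mypackage.MyClass"));
        System.out.println(jarEntryName("com.mypackage.MyClass"));
    }
}
```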

The value of classpath is determined by the following.

  • The default value, ".", which indicates the current directory.
  • The value of the CLASSPATH environment variable, which overrides the default value.
  • The value of the -cp or -classpath command-line option, which overrides both the default and the CLASSPATH values.
  • The JAR archive specified by the -jar option, which overrides all other values. In this case, user classes are loaded only from that archive and from any entries listed in the Class-Path attribute of its manifest.

Solving our missing class problem...

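Use the -cp option of the java command to put both subdir (where Person.class now is) and the current directory (where Runner.class is) on the classpath. On Windows, separate the two entries with a semicolon (subdir;.) instead of a colon.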

java -cp subdir/:. Runner
Person{name='John', age=31}
Person{name='Jane', age=1}

Bytecode verifier

[Figure: JVM internals]

The bytecode verifier checks the bytecode of each loaded class to make sure it is safe to run.

It ensures that

  1. the code follows the JVM specification,
  2. there is no unauthorized access to memory,
  3. the code does not overflow or underflow the operand stack, and
  4. there are no illegal type conversions in the code, such as treating a float as an object reference.

Once the code is verified, the JVM can start executing it, either by interpreting the bytecode or by compiling it to machine code.

Just-In-Time compiler

When a Java program is executed, its bytecode is first interpreted by the JVM, which can be slow.

To overcome this issue, JVMs include a Just-In-Time (JIT) compiler.

Rather than compiling everything up front, the JVM typically watches which code runs often. When a piece of bytecode, usually a method, becomes "hot", the JIT compiler compiles it into native machine code, which then runs faster than the interpreted bytecode.

The compilation happens at run time, just before the compiled code is needed, hence the name.

The compiled machine code is cached and reused, so the main performance gain from the JIT compiler appears when the same code is executed again and again.
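The JIT compiler's effect can be observed with a method that is called many times. In the sketch below (a hypothetical class HotLoop), square is likely to become hot after repeated calls. On the HotSpot JVM, running the program with java -XX:+PrintCompilation HotLoop prints a line each time a method is JIT-compiled. The later rounds usually run faster than the first. The exact output and timings vary between machines and JVM versions.

```java
class HotLoop {
    // A tiny method that becomes "hot" when called millions of times.
    static long square(long x) {
        return x * x;
    }

    // Sums 1^2 + 2^2 + ... + n^2 by calling square repeatedly.
    static long sumOfSquares(int n) {
        long total = 0;
        for (int i = 1; i <= n; i++) {
            total += square(i);
        }
        return total;
    }

    public static void main(String[] args) {
        // Repeat the same work: early rounds run (partly) interpreted, later rounds can use JIT-compiled code.
        for (int round = 1; round <= 5; round++) {
            long start = System.nanoTime();
            long result = sumOfSquares(1_000_000);
            long micros = (System.nanoTime() - start) / 1_000;
            System.out.println("round " + round + ": " + result + " in " + micros + " us");
        }
    }
}
```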