Java Virtual Machine

  a 32-bit machine implemented in software that runs the byte code files

  operations are performed primarily on the stack, so there are only a few
  registers, which makes for a simpler implementation

  virtual hardware
    - registers (each storing a 32-bit address)
    - stack
    - garbage-collected heap (32-bit word aligned)
    - method area (byte aligned)
    stack, heap, and method area all lie within the addressable memory
    (32-bit addresses -> 4 GB of memory)

datatypes

  small number of primitive datatypes

  integral types (signed)
    byte  --  8 bit
    short -- 16 bit
    int   -- 32 bit
    long  -- 64 bit
  floating point (signed)
    float  -- 32 bit
    double -- 64 bit
  character
    char -- 16 bit
  reference types
    class type, array type, interface type -- 32-bit address of an object on the heap
  null
    not really a reference type, rather of type null (but can be converted
    to any reference type); its only value is null

  only int, long, float, double, reference, and returnAddress are used internally

    actual type      internal type      cat
    ----------------------------------------
    boolean          int (i)            1
    byte             int                1
    char             int                1
    short            int                1
    int              int                1
    float            float (f)          1
    reference        reference          1
    returnAddress    returnAddress      1
    long             long (l)           2
    double           double (d)         2

  the boolean datatype does not exist in the VM; int is used instead
  returnAddress values are pointers to opcodes

  all datatype checking needs to be done by the compiler; the VM assumes that
  all values have been type checked beforehand. values on the stack thus do
  not carry type information; instead there are different opcodes for
  different types.

runtime data areas

  some data areas are created at startup and destroyed at exit,
  others are per-thread data areas

  program counter, pc (per-thread)
    every thread has its own pc register, pointing to the instruction
    currently being executed, provided the method being executed is not /native/.
  java VM stack (per-thread)
    stores frames
    the stack is never operated on directly (except for push/pop); also, the
    stack need not be a contiguous section in memory.
  heap (global)
    where referenced objects reside
  method area (global)
    stores compiled byte code
  runtime constant pool (per-class / per-interface)
    representation of the ''constant_pool'' table in the .class file,
    serves as symbol table
  native method stack (per-thread)
    holds the stacks of native methods (methods written in languages other
    than Java), also called "C stacks"

registers
  - program counter (pc)
  - optop register
  - frame register
  - vars register

frames (per-thread)

  each method invocation has its own frame for as long as the method executes.
  a frame consists of three sections
    1, local variable array (vars register), fixed in size at compile time
    2, reference to the runtime constant pool (frame register)
    3, operand stack holding operands and results (the optop register points
       to the top of this stack)

  every thread has exactly one method that is currently being executed (current
  method), which has its own frame (current frame) and is part of exactly one
  class (current class).
  a frame ceases to be current if its method exits (returns or throws an
  uncaught exception) or calls another method, which causes a new frame to be
  created that becomes the current frame.
  the current operand stack is always the topmost stack section; the optop
  register therefore always points to the top of the entire java stack.

  local variable array
    any single local variable (inside the local variable array) can hold a
    /boolean/, /byte/, /char/, /short/, /int/, /float/, /reference/, or
    /returnAddress/. a /long/ or /double/ requires a pair of local variables.
    variables inside the local variable array are indexed using integers
    ranging from 0 to length-1. long and double items are addressed using the
    lower index of the pair only.
    for instance methods, the first element (index 0) is a reference to the
    instance the method was invoked on, the this pointer.
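The indexing rule for the local variable array can be sketched in Java as follows. This is only an illustration under stated assumptions: the class `LocalSlots` and its method are made up for this example, parameter types are given as single descriptor characters (J = long, D = double, anything else counts as one slot), and index 0 is assumed to hold the this reference for instance methods.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not part of any JVM API.
class LocalSlots {
    /** Returns the local variable index of each parameter, one type char per parameter. */
    static List<Integer> indices(boolean isStatic, char... types) {
        List<Integer> result = new ArrayList<>();
        int next = isStatic ? 0 : 1;      // index 0 holds 'this' for instance methods
        for (char t : types) {
            result.add(next);             // long/double are addressed by the lower index
            next += (t == 'J' || t == 'D') ? 2 : 1;   // category 2 types take a pair
        }
        return result;
    }

    public static void main(String[] args) {
        // instance method with parameters (int, long, double, reference)
        System.out.println(indices(false, 'I', 'J', 'D', 'L'));   // prints [1, 2, 4, 6]
    }
}
```

The same counting is presumably how a compiler would arrive at max_locals for a method, though this sketch only covers parameters and ignores locals declared in the method body.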
  operand stack
    is used to pass values to and from operators (instructions)
    the instructions pop what they require from the stack and push their
    results back onto it
    a single element on the stack may hold any value, including long and double

---

Interpretation of byte code

  an instruction is an opcode (single byte) + optional operands.
  interpretation basically works as follows:

    do {
      fetch opcode;
      if (operands) fetch operands;
      execute the action defined by opcode;
    } while (opcodes following);

  instructions
    load: local variable -> operand stack
      Tload, Tload_N
    store: operand stack value -> local variable
      Tstore, Tstore_N
    load constant: constant -> operand stack
      bipush, sipush, ldc, ldc_w, ldc2_w
      aconst_null, iconst_m1, Tconst_N
    large number of arithmetic instructions
      operate on integers, longs, floats, doubles
    widening type conversion
      int   -> long, float, double
      long  -> float, double
      float -> double
    narrowing type conversion
      int    -> byte, short, char (i2b, i2s, i2c)
      long   -> int (l2i)
      float  -> int, long
      double -> int, long, float

---

.class File Format

  sequence of 8-bit bytes. multi-byte entities are stored big-endian
  (most significant byte first)

  u1, u2, u4 -- unsigned 1, 2, 4 byte values

  the general structure is as shown here

  #Class File Structure
  ClassFile {
    u4 magic = 0xCAFEBABE;
    u2 minor_version;        //class file version
    u2 major_version;        //class file version
    u2 constant_pool_count;  //number of cp_info elems (1 = empty pool), indexed by 1..N-1
    cp_info constant_pool[constant_pool_count-1];
    u2 access_flags;
    u2 this_class;           //valid index to CONSTANT_Class_info elem in constant_pool
    u2 super_class;          //like this_class, or zero (but then this class must be the class Object)
    u2 interfaces_count;     //number of super interfaces, indexed by 0..N-1
    u2 interfaces[interfaces_count];
    u2 fields_count;         //number of class/instance variables (not inherited ones)
    field_info fields[fields_count];
    u2 methods_count;        //number of methods (not inherited ones)
    method_info methods[methods_count];
    u2 attributes_count;     //number of attributes (SourceFile, Deprecated, ...)
    attribute_info attributes[attributes_count];
  }

  access flags
    ACC_PUBLIC    -- 0x0001, may be accessed from outside the package
    ACC_FINAL     -- 0x0010, no subclasses allowed
    ACC_SUPER     -- 0x0020, treat superclass methods specially when invoked
                     with /invokespecial/ (should be set for new implementations)
    ACC_INTERFACE -- 0x0200, an interface, not a class
    ACC_ABSTRACT  -- 0x0400, may not be instantiated

    ACC_INTERFACE requires ACC_ABSTRACT. ACC_PUBLIC may be added. all other
    flags cannot be used together with ACC_INTERFACE
    ACC_FINAL and ACC_ABSTRACT exclude each other (a class that cannot be
    subclassed could never be given an implementation)

  class and interface names
    fully qualified names, stored in a CONSTANT_Utf8_info structure
    ''java.lang.Object'' becomes ''java/lang/Object''

  field descriptors
    describe the type of a field
      B -- byte
      C -- char
      D -- double
      F -- float
      I -- int
      J -- long
      Ljava/lang/Object; -- reference type (an instance of java.lang.Object)
      S -- short
      Z -- boolean
      [ -- reference (one array dimension, 255 is the maximum allowed)
    ''[[[D'' thus is an array of arrays of arrays of doubles, or:
      double var[][][];

  method descriptors
    describe method signatures. all field descriptors may be used,
    additionally there is
      V -- void return type
    the params (max 255) precede the return type and are enclosed in parentheses.
      Object method(int i, double d, Thread t)
    has the method descriptor
      (IDLjava/lang/Thread;)Ljava/lang/Object;
    the implicit this reference to the instance the method belongs to (for
    non-static methods, that is instance methods) is /not/ reflected in the
    param list.
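To make the descriptor grammar more concrete, here is a small Java sketch that turns a method descriptor back into a readable signature. The class `DescriptorParser` and its method names are made up for this example, and the code assumes a well-formed descriptor (it performs no real validation).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; assumes the descriptor is well-formed.
class DescriptorParser {
    private final String s;
    private int pos = 0;

    private DescriptorParser(String s) { this.s = s; }

    /** Turns "(IDLjava/lang/Thread;)Ljava/lang/Object;" into "java.lang.Object (int, double, java.lang.Thread)". */
    static String describe(String descriptor) {
        DescriptorParser p = new DescriptorParser(descriptor);
        if (p.s.charAt(p.pos++) != '(') throw new IllegalArgumentException("missing '('");
        List<String> params = new ArrayList<>();
        while (p.s.charAt(p.pos) != ')') params.add(p.type());
        p.pos++;                                   // skip ')'
        String ret = p.type();                     // return type follows the params
        return ret + " (" + String.join(", ", params) + ")";
    }

    /** Reads one field descriptor (or V) starting at pos. */
    private String type() {
        char c = s.charAt(pos++);
        switch (c) {
            case 'B': return "byte";
            case 'C': return "char";
            case 'D': return "double";
            case 'F': return "float";
            case 'I': return "int";
            case 'J': return "long";
            case 'S': return "short";
            case 'Z': return "boolean";
            case 'V': return "void";
            case 'L': {                            // Lfully/qualified/Name;
                int end = s.indexOf(';', pos);
                String name = s.substring(pos, end).replace('/', '.');
                pos = end + 1;
                return name;
            }
            case '[': return type() + "[]";        // one array dimension per '['
            default: throw new IllegalArgumentException("bad type char: " + c);
        }
    }
}
```

For example, `DescriptorParser.describe("([[[D)V")` yields `void (double[][][])`.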
The Constant Pool

  instructions refer to classes, fields, methods, and constant values
  through entries in the constant_pool

  the general structure of an entry is:

  #constant_pool entry
  cp_info {
    u1 tag;     //giving the type of entry
    u1 info[];  //2+ bytes
  }

  these are the values for the tag byte
    CONSTANT_Utf8                1
    CONSTANT_Integer             3
    CONSTANT_Float               4
    CONSTANT_Long                5
    CONSTANT_Double              6
    CONSTANT_Class               7
    CONSTANT_String              8
    CONSTANT_Fieldref            9
    CONSTANT_Methodref          10
    CONSTANT_InterfaceMethodref 11
    CONSTANT_NameAndType        12

  the structures in detail:

  Class
    CONSTANT_Class_info {
      u1 7;
      u2 name_index;  // "java/lang/Object" or "[[I"
    }

  Fieldref, Methodref, InterfaceMethodref
    CONSTANT_XXXref_info {     //XXX = {Field|Method|InterfaceMethod}
      u1 tag;                  //9, 10, or 11
      u2 class_index;          //CONSTANT_Class_info in cp
      u2 name_and_type_index;  //CONSTANT_NameAndType_info in cp
    }

  String
    CONSTANT_String_info {
      u1 8;
      u2 string_index;  //CONSTANT_Utf8_info in cp
    }

  Integer, Float
    CONSTANT_Integer_info {  //or Float
      u1 tag;    //3 (int), or 4 (float)
      u4 bytes;  //big-endian bytes determining value
    }
    the float value is computed from the int ''bits'' as follows:
      - 0x7f800000 is +INF
      - 0xff800000 is -INF
      - 0x7f800001..0x7fffffff is NaN
      - 0xff800001..0xffffffff is NaN
      - all other cases:
          int sign = ((bits >> 31) == 0) ? 1 : -1;
          int expo = ((bits >> 23) & 0xff);
          int mant = (expo == 0) ? (bits & 0x7fffff) << 1
                                 : (bits & 0x7fffff) | 0x800000;
          float value = sign * mant * 2^(expo - 150)   (^ denotes exponentiation)

  Long, Double
    CONSTANT_Long_info {  //or Double
      u1 tag;  //5 (long), 6 (double)
      u4 high_bytes;
      u4 low_bytes;
    }
    the double value is computed from the long ''bits'' as follows:
      - 0x7ff0000000000000L is +INF
      - 0xfff0000000000000L is -INF
      - 0x7ff0000000000001L..0x7fffffffffffffffL is NaN
      - 0xfff0000000000001L..0xffffffffffffffffL is NaN
      - all other cases:
          int sign = ((bits >> 63) == 0) ? 1 : -1;
          int expo = (int)((bits >> 52) & 0x7ffL);
          long mant = (expo == 0) ? (bits & 0xfffffffffffffL) << 1
                                  : (bits & 0xfffffffffffffL) | 0x10000000000000L;
          double value = sign * mant * 2^(expo - 1075)   (^ denotes exponentiation)

  NameAndType
    CONSTANT_NameAndType_info {
      u1 12;
      u2 name_index;
      u2 descriptor_index;
    }
    name_index       -- CONSTANT_Utf8_info in cp (valid field/method name or <init>)
    descriptor_index -- CONSTANT_Utf8_info in cp (valid field/method descriptor)

  Utf8
    used for constant string values. character sequences that contain only
    non-null ASCII characters are represented using 1 byte per character. all
    other characters up to 16 bit are represented using 2 or 3 bytes.

    0x0001 .. 0x007F is represented using a single byte.
      0ccccccc
    0x0000 and characters in the range 0x0080 .. 0x07FF are represented by a
    pair of bytes X and Y
      X: 110ccccc  Y: 10dddddd
    the character value is then ((x & 0x1f) << 6) + (y & 0x3f), or cccccdddddd
    all other characters (0x0800 .. 0xFFFF) are represented using 3 bytes
      X: 1110cccc  Y: 10dddddd  Z: 10eeeeee
    the character value is then
    ((x & 0xf) << 12) + ((y & 0x3f) << 6) + (z & 0x3f), or ccccddddddeeeeee
    in the class file the bytes are stored in big-endian order (X, Y, Z)

    there are two deviations from standard UTF-8: the character 0x0000 is
    encoded using 2 bytes, and no encodings longer than 3 bytes are used.
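The three byte patterns described above can be decoded with a few lines of Java. This is a minimal sketch: the class `ModifiedUtf8` is a made-up name, and the code assumes the input is complete and well-formed (continuation bytes are not checked).

```java
// Hypothetical decoder for illustration; assumes well-formed input.
class ModifiedUtf8 {
    static String decode(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        int i = 0;
        while (i < bytes.length) {
            int x = bytes[i++] & 0xff;
            if ((x & 0x80) == 0) {                 // 0ccccccc
                sb.append((char) x);
            } else if ((x & 0xe0) == 0xc0) {       // 110ccccc 10dddddd
                int y = bytes[i++] & 0xff;
                sb.append((char) (((x & 0x1f) << 6) + (y & 0x3f)));
            } else if ((x & 0xf0) == 0xe0) {       // 1110cccc 10dddddd 10eeeeee
                int y = bytes[i++] & 0xff;
                int z = bytes[i++] & 0xff;
                sb.append((char) (((x & 0xf) << 12) + ((y & 0x3f) << 6) + (z & 0x3f)));
            } else {
                throw new IllegalArgumentException("invalid lead byte at index " + (i - 1));
            }
        }
        return sb.toString();
    }
}
```

With this, the two bytes 0xC0 0x80 decode to the single character 0x0000, which is the 2-byte null encoding mentioned above.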
    CONSTANT_Utf8_info {
      u1 1;
      u2 bytes_length;         //number of bytes (not null-terminated)
      u1 bytes[bytes_length];
    }

Fields

  each field in a class is defined by a field_info structure

  field_info {
    u2 access_flags;      //access permission
    u2 name_index;        //CONSTANT_Utf8_info (name) in cp
    u2 descriptor_index;  //CONSTANT_Utf8_info (field descriptor) in cp
    u2 attributes_count;
    attribute_info attributes[attributes_count];
  }

  access flags
    ACC_PUBLIC    -- 0x0001, accessed from outside package
    ACC_PRIVATE   -- 0x0002, only within defining class
    ACC_PROTECTED -- 0x0004, within subclasses
    ACC_STATIC    -- 0x0008, static (class variable)
    ACC_FINAL     -- 0x0010, no further assignment after initialization
    ACC_VOLATILE  -- 0x0040, cannot be cached
    ACC_TRANSIENT -- 0x0080, not written/read by persistent object manager

    ACC_FINAL and ACC_VOLATILE exclude each other
    interface fields must be public, static, final and nothing else

Methods

  each method (including <init>(...) and static <clinit>(void)) is described
  by a method_info structure

  method_info {
    u2 access_flags;      //access permission
    u2 name_index;        //CONSTANT_Utf8_info (name, <init>, <clinit>) in cp
    u2 descriptor_index;  //CONSTANT_Utf8_info (method descriptor) in cp
    u2 attributes_count;
    attribute_info attributes[attributes_count];
  }

  access flags
    ACC_PUBLIC       -- 0x0001, accessed from outside package
    ACC_PRIVATE      -- 0x0002, only within defining class
    ACC_PROTECTED    -- 0x0004, within subclasses
    ACC_STATIC       -- 0x0008, static (class method)
    ACC_FINAL        -- 0x0010, may not be overridden
    ACC_SYNCHRONIZED -- 0x0020, invocation is wrapped in monitor lock
    ACC_NATIVE       -- 0x0100, implemented in a language other than Java
    ACC_ABSTRACT     -- 0x0400, no implementation
    ACC_STRICT       -- 0x0800, floating-point mode is strict FP

Attributes

  used to attach additional information to classes, methods, and fields.
  attribute_info {
    u2 attribute_name_index;    //CONSTANT_Utf8_info (name) in cp
    u4 attribute_length;        //length in bytes
    u1 info[attribute_length];  //array of attribute bytes
  }

  Code, ConstantValue, and Exceptions are implemented by every VM

  ConstantValue Attribute
    used to initialize static fields; if given for a non-static field, the
    attribute must be ignored

    ConstantValue_attribute {
      u2 attribute_name_index;  //CONSTANT_Utf8_info ("ConstantValue") in cp
      u4 attribute_length;      //2
      u2 constantvalue_index;   //CONSTANT_XXX (the value) in cp
    }

    constantvalue_index can point to any of the following
      CONSTANT_Long    -- long
      CONSTANT_Float   -- float
      CONSTANT_Double  -- double
      CONSTANT_Integer -- int, short, char, byte, boolean
      CONSTANT_String  -- String

  Code Attribute
    required for all methods that have neither ACC_NATIVE nor ACC_ABSTRACT
    set, used to store the byte code instructions

    Code_attribute {
      u2 attribute_name_index;    //CONSTANT_Utf8_info ("Code") in cp
      u4 attribute_length;        //length without initial 6 bytes
      u2 max_stack;               //max depth of stack at any point
      u2 max_locals;              //number of locals and params
      u4 code_length;             //>0 bytes in code array
      u1 code[code_length];       //byte code instructions
      u2 exception_table_length;  //number of entries in exception table
      {
        u2 start_pc;    //starting program counter (valid index in code)
        u2 end_pc;      //first pc value where exception handler is inactive (>start_pc, max code_length)
        u2 handler_pc;  //index in code, specifying the entry point of the exception handler
        u2 catch_type;  //zero or CONSTANT_Class_info in cp giving the exception type
      } exception_table[exception_table_length];
      u2 attributes_count;        //attributes of code
      attribute_info attributes[attributes_count];
    }

  Exceptions Attribute
    used to denote which exceptions a method may throw. a method may have at
    most one Exceptions attribute.
    Exceptions_attribute {
      u2 attribute_name_index;  //CONSTANT_Utf8_info ("Exceptions") in cp
      u4 attribute_length;      //length without initial 6 bytes
      u2 number_of_exceptions;
      u2 exception_index_table[number_of_exceptions];
    }

    each exception_index_table entry must refer to a CONSTANT_Class_info
    structure specifying a class derived from Throwable

  Other Attributes
    InnerClasses       -- required for inner classes
    Synthetic          -- class members not appearing in source code
    SourceFile         -- optional, giving the source file name
    LineNumberTable    -- optional, for debugging
    LocalVariableTable -- optional, for debugging
    Deprecated         -- optional, for superseded items

Verification of .class files (Bytecode Verifier)

  is performed before execution to ensure integrity. it is ensured that
    - there are no operand stack overflows/underflows
    - all local variable uses and stores are valid
    - the arguments to all JVM instructions are of valid types

  the verifier ensures that all static and structural constraints are met.

  the verifier operates in four passes:
    - basic format checks: magic number, valid lengths (during loading)
    - all non-code checks: final classes are not subclassed, all classes
      other than java.lang.Object have a superclass, cp indices and elements
      are well-formed (during linking)
    - code checks: data-flow analysis on each method for the data types
      involved in any operation and the number of args for method
      invocations (during linking)
    - code checks that are only required when the code is actually run (for
      efficiency): type matching, access restrictions (during execution)

  Constraints checked by Verifier
    static constraints .
    structural constraints .
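
As a small illustration of the first verifier pass (basic format checks), the following Java sketch reads the first three ClassFile items and rejects input with a wrong magic number. The class `ClassHeader` is a made-up name for this example; a real verifier checks far more than this, and the returned "major.minor" string is only a convenient way of showing the result.

```java
import java.nio.ByteBuffer;

// Hypothetical helper for illustration; covers only magic and version fields.
class ClassHeader {
    /** Returns "major.minor" for the given class file bytes, or throws if the magic number is wrong. */
    static String version(byte[] classFile) {
        ByteBuffer in = ByteBuffer.wrap(classFile);   // ByteBuffer is big-endian by default
        int magic = in.getInt();                      // u4 magic
        if (magic != 0xCAFEBABE) {
            throw new IllegalArgumentException("bad magic number: 0x" + Integer.toHexString(magic));
        }
        int minor = in.getShort() & 0xffff;           // u2 minor_version (unsigned)
        int major = in.getShort() & 0xffff;           // u2 major_version (unsigned)
        return major + "." + minor;
    }
}
```

Input shorter than 8 bytes makes `ByteBuffer` throw a `BufferUnderflowException`, which here stands in for the "valid lengths" part of the pass.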