java – Why does the JVM need string constants?

Question:

I am writing a small JVM (not following the specification very strictly) along with a compiler for it, and I ran into a question. According to the specification, the class file's constant pool has a constant of type Utf8 which, as I understand it, is used to represent strings. It is not entirely clear to me what it is for, given the following chain of reasoning.

There are no strings in the JVM; they can only be represented as int[]. So that the end user does not have to deal with arrays, the String class was created, which takes exactly this int[] as input (at the Java level it is char[]). And an expression like:

String str = "abc";

is expanded by the compiler like this:

char[] t = new char[3];
t[0] = 'a';
t[1] = 'b';
t[2] = 'c';
String str = new String(t);

The string also ends up in the constant pool, but as constants of type int (since a char holding a Unicode character expands into a short, every short is an int at the VM level, and any int > 127 goes into the constant pool, while values up to 127 are pushed onto the stack directly via the bipush instruction).

Question – why are Utf8 constants needed? (Perhaps I misunderstand the mechanics of how strings work.)

P.S. If strings are added to the constant pool the way I described, the space savings look enormous: a string of, say, 1000 Cyrillic characters needs 2 KB in the constant pool as Utf8, while storing it as int constants needs only 2 * 4 * the number of distinct characters in the string. That is, if only letters of the Russian alphabet are used, the entire string takes just 66 bytes in the pool.
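The two figures this estimate relies on can be checked with a small sketch. The class and method names here are made up for illustration, and the sketch assumes that for text without NUL or supplementary characters the byte count of standard UTF-8 matches the encoding used by Utf8 constants. It measures only the encoded size and the number of distinct characters; it does not model the bytecode needed to fill the array.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of any JVM API.
class Utf8SizeCheck {
    // Number of bytes the character data occupies when encoded as UTF-8.
    static int utf8Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // Number of distinct UTF-16 code units in the string.
    static long distinctChars(String s) {
        return s.chars().distinct().count();
    }

    public static void main(String[] args) {
        // 1000 copies of the Cyrillic letter U+044F (requires Java 11+ for repeat).
        String text = "\u044F".repeat(1000);
        System.out.println(utf8Bytes(text));      // 2 bytes per Cyrillic letter
        System.out.println(distinctChars(text));  // only one distinct character here
    }
}
```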

Answer:

Well, as usual, I asked a question and then figured it out myself. Utf8 constants are needed to store the various metadata the JVM needs in order to function. One example is the new instruction: its operand is an index into the constant pool, and the entry at that index leads to a Utf8 constant holding the name of the class. For example, for:

new String();

the compiler generates a new instruction whose operand refers to a CONSTANT_Class entry in the constant pool; that entry in turn points to a Utf8 constant with the content java/lang/String.
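To make the lookup concrete, here is a minimal sketch of a two-entry constant pool and of how a class reference is resolved to its name. It is not a full class file parser: the class and method names are invented for illustration, only the Utf8 (tag 1) and Class (tag 7) entry kinds are handled, and it leans on the fact that DataOutputStream.writeUTF produces the same "u2 length + modified UTF-8 bytes" layout that a Utf8 entry uses after its tag byte.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical sketch; handles only the two entry kinds discussed here.
class ConstantPoolSketch {
    static final int CONSTANT_Utf8 = 1;   // tag values from the class file format
    static final int CONSTANT_Class = 7;

    // Builds a pool with two entries: #1 Utf8 "java/lang/String", #2 Class -> #1.
    static byte[] buildPool() {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeShort(3);                 // constant_pool_count = entries + 1
            out.writeByte(CONSTANT_Utf8);
            out.writeUTF("java/lang/String");  // u2 length + modified UTF-8 bytes
            out.writeByte(CONSTANT_Class);
            out.writeShort(1);                 // name_index points at entry #1
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Follows a Class entry to the Utf8 entry that holds the class name.
    static String resolveClassName(byte[] pool, int classIndex) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(pool));
            int count = in.readUnsignedShort();
            Object[] entries = new Object[count];  // index 0 is unused
            for (int i = 1; i < count; i++) {
                int tag = in.readUnsignedByte();
                if (tag == CONSTANT_Utf8) {
                    entries[i] = in.readUTF();            // stored as String
                } else if (tag == CONSTANT_Class) {
                    entries[i] = in.readUnsignedShort();  // stored as Integer name_index
                } else {
                    throw new IOException("tag not handled in this sketch: " + tag);
                }
            }
            int nameIndex = (Integer) entries[classIndex];
            return (String) entries[nameIndex];
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // What a "new #2" instruction would resolve to in this toy pool.
        System.out.println(resolveClassName(buildPool(), 2));
    }
}
```

In a real class file the same Utf8 entries also carry field and method names and type descriptors, which is why the pool needs them regardless of how string literals are handled.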
