Question:
I am writing a small JVM (one that does not follow the specification very strictly) and a compiler for it, and I have run into the following question. According to the specification, the constant pool of a class file can contain constants of type Utf8, which, as I understand it, are used to represent strings. It is not entirely clear to me what they are for, given the following line of reasoning.
There are no strings in the JVM; they can only be represented as int[]. To keep the end user from fiddling with arrays, a String class was made that takes this int[] as input (at the Java level, a char[]). So an expression like:
String str = "abc";
is expanded by the compiler as follows:
char[] t = new char[3];
t[0] = 'a';
t[1] = 'b';
t[2] = 'c';
String str = new String(t);
So the strings do end up in the constant pool after all, but as constants of type int: a char holding a UTF character widens to a short, every short is an int at the VM level, and any int > 127 goes into the constant pool (values up to 127 are pushed onto the stack directly with the bipush instruction).
So the question is: why do we need Utf8 constants? (Perhaps I misunderstand how strings work?)
P.S. Moreover, if characters are put into the constant pool the way I described, the space savings are enormous. A string of, say, 1000 Cyrillic characters needs 2 KB of the constant pool as a Utf8 constant, whereas storing its characters as int constants takes only 2 * 4 * (number of distinct characters in the string). For example, if only letters of the Russian alphabet are used, the entire string takes only 66 bytes in the pool.
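For comparison with the bipush rule above, here is a small sketch of how a compiler might pick the push instruction for an int constant. `pushInstructionFor` is a hypothetical helper, not a real API; it follows the operand ranges in the JVM specification: iconst_m1..iconst_5 for -1..5, bipush for a signed byte, sipush for a signed 16-bit value, and ldc with a CONSTANT_Integer entry for anything wider. Under these ranges most Cyrillic characters (U+0410..U+044F) would be pushed with sipush rather than loaded from the pool.

```java
// Hypothetical helper: picks the narrowest push instruction for an int constant,
// following the operand ranges of iconst_<i>, bipush and sipush in the JVM spec.
public class PushSelector {

    static String pushInstructionFor(int value) {
        if (value >= -1 && value <= 5) {
            // iconst_m1 .. iconst_5: the value is encoded in the opcode itself
            return value == -1 ? "iconst_m1" : "iconst_" + value;
        }
        if (value >= Byte.MIN_VALUE && value <= Byte.MAX_VALUE) {
            // bipush: one signed byte operand (-128..127)
            return "bipush";
        }
        if (value >= Short.MIN_VALUE && value <= Short.MAX_VALUE) {
            // sipush: one signed 16-bit operand (-32768..32767)
            return "sipush";
        }
        // anything wider needs a CONSTANT_Integer pool entry, loaded with ldc
        // (or ldc_w when the pool index does not fit in one byte)
        return "ldc";
    }

    public static void main(String[] args) {
        for (char c : new char[] {'a', '\u0436', '\uFFFD'}) {
            System.out.println((int) c + " -> " + pushInstructionFor(c));
        }
    }
}
```

Running main shows the choice for 'a' (97), U+0436 (1078) and U+FFFD (65533): bipush, sipush and ldc respectively.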
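The 2 KB figure can be checked directly. A CONSTANT_Utf8 entry stores its text in "modified UTF-8", the same encoding that `DataOutputStream.writeUTF` produces after a 2-byte length. The sketch below uses hypothetical helper names (`utf8ConstantBytes`, `cyrillicSample`) and assumes the sample uses only the 32 letters U+0430..U+044F (ё, U+0451, is left out for simplicity). It measures the Utf8 payload of 1000 such characters and counts how many distinct characters they contain.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical helpers measuring the size of a string as a CONSTANT_Utf8 payload.
// DataOutputStream.writeUTF emits modified UTF-8 prefixed by a 2-byte length,
// mirroring the u2 length + bytes layout of CONSTANT_Utf8_info.
public class Utf8Size {

    static int utf8ConstantBytes(String s) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            // throws UTFDataFormatException if the encoding exceeds 65535 bytes
            out.writeUTF(s);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.size() - 2; // drop the 2-byte length prefix
    }

    static long distinctChars(String s) {
        return s.chars().distinct().count();
    }

    // n characters cycling through U+0430..U+044F (32 lowercase Cyrillic letters)
    static String cyrillicSample(int n) {
        StringBuilder sb = new StringBuilder(n);
        for (int i = 0; i < n; i++) {
            sb.append((char) (0x0430 + i % 32));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String text = cyrillicSample(1000);
        System.out.println("Utf8 payload bytes: " + utf8ConstantBytes(text));
        System.out.println("distinct characters: " + distinctChars(text));
    }
}
```

Every character in that range encodes to two bytes in modified UTF-8, so main should report a 2000-byte payload built from 32 distinct characters.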
Answer:
Well, as usual, I asked the question and then figured it out myself. Utf8 constants store various metadata that the JVM needs in order to function. One example is the new instruction: its operand is an index into the constant pool, and the entry at that index (a CONSTANT_Class) refers to a Utf8 constant holding the internal name of the class. For example, for:
new String();
the compiler generates a new instruction whose operand refers, through a CONSTANT_Class entry, to the Utf8 constant java/lang/String.
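The chain from the new operand to the class name can be modeled with a toy constant pool. This is a hypothetical model, not a class-file parser: the names `ConstantPoolDemo`, `Utf8`, `ClassRef` and `classNameForNew` are mine; only the tag values (1 for CONSTANT_Utf8, 7 for CONSTANT_Class) and the 1-based indexing come from the JVM specification.

```java
import java.util.HashMap;
import java.util.Map;

// A toy constant pool (hypothetical model, not a class-file parser) showing how the
// operand of a `new` instruction reaches a class name: the u2 operand indexes a
// CONSTANT_Class entry, whose name_index points at a CONSTANT_Utf8 entry.
public class ConstantPoolDemo {

    // Tag values as defined in the JVM specification
    static final int CONSTANT_Utf8 = 1;
    static final int CONSTANT_Class = 7;

    interface Entry { int tag(); }
    record Utf8(String value) implements Entry { public int tag() { return CONSTANT_Utf8; } }
    record ClassRef(int nameIndex) implements Entry { public int tag() { return CONSTANT_Class; } }

    // Index 0 is unused in a real constant pool, so entries start at 1
    private final Map<Integer, Entry> pool = new HashMap<>();
    private int next = 1;

    int add(Entry e) {
        pool.put(next, e);
        return next++;
    }

    // Resolves the operand of `new` (a pool index) to the internal class name it denotes
    String classNameForNew(int index) {
        if (!(pool.get(index) instanceof ClassRef ref)) {
            throw new IllegalArgumentException("#" + index + " is not a CONSTANT_Class entry");
        }
        if (!(pool.get(ref.nameIndex()) instanceof Utf8 name)) {
            throw new IllegalStateException("name_index does not point at a CONSTANT_Utf8 entry");
        }
        return name.value();
    }

    public static void main(String[] args) {
        ConstantPoolDemo cp = new ConstantPoolDemo();
        int nameIdx = cp.add(new Utf8("java/lang/String")); // #1
        int classIdx = cp.add(new ClassRef(nameIdx));        // #2
        // the bytecode for `new String()` would start with: new #2
        System.out.println("new #" + classIdx + " -> " + cp.classNameForNew(classIdx));
    }
}
```

A real compiler would follow new with dup and an invokespecial of the constructor; that instruction's CONSTANT_Methodref also ends in Utf8 entries (the name `<init>` and the descriptor `()V`), which is more of the same metadata.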