compiler – Assembly equivalent of defining a new type?


class MyClass {
    int a;
    int b;
    void MyMethod();
};

I've always wondered what the parser of a high-level language compiler does when it sees a construct like this. Does it write a description of the type, with the addresses of the methods and the sizes and types of the variables, into some table inside the program file? And what happens when you create an object of the class on the stack?

MyClass Object;

Are the sizes and types of the variables read from that table and addresses assigned to them, then the address of the constructor placed on the call stack, and then initialization performed according to what is written in the constructor? Where can I read more about this process?


In general, everything is very compiler-dependent. Some clever compilers might even throw out the class itself if not needed. But there are still general principles.

Let's start

When the compiler sees the definition of a class, it just parses it. The fun begins when you need to create an instance of the class. Looking at the class description, the compiler calculates how much memory an object needs. For the class in the question that is at least 8 bytes (here and below I am talking about 32-bit x86 platforms). It may actually take more: for example, if the class has virtual methods, 4 more bytes go to the pointer to the table of virtual functions. So, in general terms, new for a class is simply alloc + memset (plus a call to the constructor if the constructor is not trivial).

For classes without virtual methods, memory is usually laid out like a structure with the corresponding fields. Classes with virtual methods get at least one more hidden field.
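A quick way to check this on your own compiler is to compare object sizes. This is only a minimal sketch: the names Plain, WithVirtual and virtual_overhead are made up for illustration, and the exact numbers depend on the compiler and platform.

```cpp
#include <cassert>
#include <cstddef>

// Same data as MyClass, no virtual methods: laid out like a plain struct.
struct Plain {
    int a;
    int b;
    void MyMethod() {}          // non-virtual methods take no space in the object
};

// Same data plus a virtual method: most compilers add a hidden pointer
// to the virtual function table to every object.
struct WithVirtual {
    int a;
    int b;
    virtual void MyMethod() {}
};

// Extra bytes per object caused by the virtual method (hypothetical helper).
std::size_t virtual_overhead() {
    return sizeof(WithVirtual) - sizeof(Plain);
}
```

On mainstream compilers the overhead is the size of one pointer, possibly rounded up by padding.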

But what are "methods"? Are they functions of the class?

They are perfectly ordinary functions that simply take one extra parameter (nothing forbids the compiler from adding more, but usually it is just one), which is a pointer to the previously allocated memory. Inside the method, this parameter is visible as this.
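To make the hidden parameter visible, here is a small sketch that writes the same logic twice: once as a method and once as the free function the compiler's output roughly resembles. The method Sum and the function MyClass_Sum are hypothetical names, not something the compiler actually generates under that name.

```cpp
#include <cassert>

struct MyClass {
    int a;
    int b;
    // Written as a method: "this" is passed implicitly.
    int Sum() const { return this->a + this->b; }
};

// Roughly what the method looks like after compilation:
// an ordinary function whose first parameter is the object pointer.
int MyClass_Sum(const MyClass* self) {
    return self->a + self->b;
}
```

Calling m.Sum() and MyClass_Sum(&m) gives the same result for any object m.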

How the fields are accessed:

For each field, the compiler calculates an offset relative to this. For example, we have:

MyClass m;
m.a = 10;
m.b = 20;     

In pseudocode it looks like this:

mov [this+0], 10
mov [this+4], 20

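The offset is +4 because the size of int is 4. The compiler may also insert padding for alignment, so a field can end up at a larger offset than the sum of the preceding field sizes; in that case the second field may appear, for example, at offset 8.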
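The offsets the compiler picked can be inspected with the standard offsetof macro. A minimal sketch, assuming a typical platform; the struct Padded and the helper functions are invented here just to show alignment padding.

```cpp
#include <cassert>
#include <cstddef>

struct MyClass {
    int a;
    int b;
};

// A char followed by an int: the compiler usually pads after c
// so that i starts at an address suitable for an int.
struct Padded {
    char c;
    int  i;
};

std::size_t offset_of_b() { return offsetof(MyClass, b); }
std::size_t offset_of_i() { return offsetof(Padded, i); }
```

Here b sits right after a, while i is pushed out to the alignment of int rather than to offset 1.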

Calling methods:

This works the same as calling ordinary functions. The only difference, as I wrote above, is that one more parameter is added: a pointer to the object. The compiler knows the address of the method.

Calling virtual methods:

These are more interesting. They use a method table (it seems nobody has come up with anything better yet; you can read more about it here). The compiler assigns each virtual function an index in the table based on its name. When a call is needed in the code, it looks like this:

mov eax, [this+8]                ; address of the method table
mov eax, [eax + method_index*4]  ; load the method's address from its slot
push parameter
push this
call eax                         ; call the function at that address

But the compiler can cheat. If it can determine exactly which method will be called, it can insert a direct call. Moreover, the compiler may not even pass the this parameter if it is not used inside the method.

Note that virtual method tables are created one per class, not one per object.
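The mechanism can be imitated by hand with a table of function pointers. This is a sketch of the idea, not what any particular compiler emits: Shape, ShapeVTable, rect_area, tri_area and call_area are all hypothetical names.

```cpp
#include <cassert>

struct Shape;  // forward declaration so the table can mention it

// One table per "class": an array of function pointers.
struct ShapeVTable {
    int (*area)(const Shape* self);
};

struct Shape {
    const ShapeVTable* vtable;  // the hidden field the compiler would add
    int w;
    int h;
};

int rect_area(const Shape* self) { return self->w * self->h; }
int tri_area(const Shape* self)  { return self->w * self->h / 2; }

// The tables exist once, no matter how many objects are created.
const ShapeVTable rect_vtable = { rect_area };
const ShapeVTable tri_vtable  = { tri_area };

// A "virtual call": load the table, load the slot, call it with "this".
int call_area(const Shape* s) {
    return s->vtable->area(s);
}
```

Two objects of the same "class" store the same table pointer, which is exactly the one-per-class property mentioned above.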

Creating objects on the stack

There is nothing special here. In the classic implementation, "allocating memory on the stack" just moves the pointer to the top of the stack. Since the stack grows downward, this is a subtraction of the object's size from the register that holds the top of the stack. Many C compilers even offer a function for this, alloca (in Visual Studio it is called _alloca), which works like malloc but allocates on the stack.
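The two steps, reserving bytes on the stack and then running the constructor on them, can be spelled out explicitly with placement new. This is only a sketch of what a declaration like "MyClass Object;" amounts to; the constructor and the function sum_on_stack are assumptions added for the example.

```cpp
#include <cassert>
#include <new>

struct MyClass {
    int a;
    int b;
    MyClass() : a(1), b(2) {}   // hypothetical non-trivial constructor
};

int sum_on_stack() {
    // Step 1: reserve raw, suitably aligned bytes in this function's stack frame.
    alignas(MyClass) unsigned char storage[sizeof(MyClass)];

    // Step 2: run the constructor on that memory; no allocation happens here.
    MyClass* obj = new (storage) MyClass();

    int result = obj->a + obj->b;

    // The destructor runs when the object goes out of scope; here it is explicit.
    obj->~MyClass();
    return result;
}
```

In normal code you would simply write "MyClass obj;" and let the compiler do both steps.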

Abstract methods

These methods have slots in the virtual method table, but the slots point to a special function that reports an error saying that such methods must not be called.
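In C++ terms an abstract method is a pure virtual function. With GCC and Clang the special function is __cxa_pure_virtual, and with MSVC it is _purecall. The sketch below does not trigger that function (doing so is an error); it only shows, with hypothetical types Base and Derived, that the slot is filled in by the derived class.

```cpp
#include <cassert>
#include <type_traits>

struct Base {
    virtual int Value() const = 0;   // abstract: no implementation in Base
    virtual ~Base() {}
};

struct Derived : Base {
    int Value() const override { return 42; }  // fills the table slot
};

// The compiler refuses to create objects of a class with unfilled slots.
static_assert(std::is_abstract<Base>::value, "Base has a pure virtual method");
static_assert(!std::is_abstract<Derived>::value, "Derived overrides it");

// The call goes through the table and reaches Derived::Value.
int call_through_base(const Base& b) {
    return b.Value();
}
```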

Odds and ends

The resulting code usually no longer contains any method or field names, only addresses and offsets. There are no types either. If the debugger needs to show the data to the user, it gets a special map file from the compiler where all of this is described. That is why, when you debug release code, the debugger often cannot even map the binary code back to the source code: it simply does not have this information, and guessing it is very difficult.

But sometimes compilers, especially when producing debug code, add extra fields to check that the code is not doing anything terrible. For example, they may store the real type of the object and compare it when necessary.

And sometimes a programmer wants to use RTTI; in that case some type data really does have to be added.
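In C++ that extra data is what typeid and dynamic_cast read at run time. A minimal sketch, assuming RTTI is enabled (the default in mainstream compilers); Animal, Dog, Cat and the two helper functions are hypothetical.

```cpp
#include <cassert>
#include <typeinfo>

struct Animal { virtual ~Animal() {} };  // at least one virtual method is required
struct Dog : Animal {};
struct Cat : Animal {};

// typeid looks up the object's real type through its hidden type data.
bool is_dog(const Animal& a) {
    return typeid(a) == typeid(Dog);
}

// dynamic_cast uses the same data and returns a null pointer on mismatch.
const Dog* as_dog(const Animal* a) {
    return dynamic_cast<const Dog*>(a);
}
```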
