Question:
I have the following program:
#include <stdio.h>

typedef struct
{
    int a;
    int b;
} Letter;

int main(void)
{
    Letter arr[2] = { {1, 101}, {2, 23} };
    Letter* p1 = arr;
    Letter** p2 = &p1;

    printf("%d\n", p2[0]->a); // --> prints 1
    printf("%d\n", p2[0]->b); // --> prints 101
    printf("%d\n", p2[1]->a); // --> the program stops here
    printf("%d\n", p2[1]->b);
    return 0;
}
The problem appears when this line of code is executed:
printf("%d\n", p2[1]->a);
The program stops working. I ran it under the debugger, which reports a segmentation fault, but I don't understand the reason for the error.
A curious detail is that when these two statements are executed:
printf("%d\n", p2[0]->a);
printf("%d\n", p2[0]->b);
no segmentation fault occurs, and the data is displayed correctly.
Now, the million dollar question: why does a segmentation fault not occur in the first case, but does occur in the second?
Answer:
p2 is a double pointer (Letter**) that points to the single variable p1, not to an array of pointers. To reach the data in the array we must first apply the indirection operator (*) to p2 and only then index the result. Indexing p2 itself with anything other than 0 is undefined behavior: the compiler is free to do whatever it wants with the code, which can cause strange behavior at runtime.
For example, when the compiler sees these expressions:
p2[0]->a
p2[0]->b
p2[1]->a
p2[1]->b
it effectively calculates the address of the data like this (written with byte addresses, not with C pointer arithmetic):
/* There are two indirection operators (*) because the data is reached indirectly. */
*(*( p2 + sizeof(Letter*) * indice) + offset_miembro)
The index is scaled by sizeof(Letter*) because p2 is a Letter**: each step moves over one pointer, not over one Letter structure. This means that for any index other than 0 the memory address of the data is not calculated properly.
Where:
- sizeof(Letter*): the size in bytes of a pointer to Letter. It is 8 bytes on a typical 64-bit system, which is what the walk-through below assumes, and 4 bytes on a typical 32-bit one.
- indice: the position that we want to access.
- offset_miembro: each member of a structure has an associated offset, which is added to the base address of the structure to reach the memory address of that member of the Letter structure.
I emphasize that the offset is not calculated by us, but by the compiler.
How do we get it?
One option is to use the offsetof macro.
Example:
// This program was compiled on a 32-bit machine (Windows).
#include <stdio.h>
#include <stddef.h>

typedef struct
{
    int a;
    int b;
} Letter;

int main(void)
{
    // offsetof yields a size_t, so it is printed with %zu
    printf("Offset of member a: %zu\n", offsetof(Letter, a));
    printf("Offset of member b: %zu\n", offsetof(Letter, b));
    return 0;
}
Screen result:
Offset of member a: 0
Offset of member b: 4
Now let's start with the deduction.
When the compiler sees this expression:
p2[0]->a
it will convert it to:
//*( *( p2 + sizeof(Letter*) * indice) + offset_miembro)
*( *( p2 + 8 * 0 ) + 0 )
Resulting in:
**p2
The code above does three things:
1.- It reads the content of p2, which is precisely the memory address of p1.
2.- Then it reads the content of the pointer p1, which is precisely the base address of the first structure of the array.
3.- Finally, it accesses the content of that address.
So the first expression will never give a segmentation fault, because with index 0 this subexpression is always 0 and the address that is read is exactly where p1 lives:
(sizeof(Letter*) * indice)
The same logic applies to this second expression:
p2[0]->b
The compiler converts it to:
//*( *( p2 + sizeof(Letter*) * indice) + offset_miembro)
*( *( p2 + 8 * 0 ) + 4 )
Resulting in:
*( *p2 + 4)
*p2 gives the base address of the first structure of the array, then 4 is added to it to reach the memory address of member b, and finally the data is accessed.
With all this explanation we can answer this question:
Why doesn't a segmentation fault occur in the first case?
The answer depends on how the compiler translates the expression; however, following our deduction, it is because this subexpression:
(sizeof(Letter*) * indice)
always yields 0 when the index is 0. So at no point would we be accessing a memory address that does not belong to the program.
However, for this third expression:
p2[1]->a
the compiler converts it to:
//*( *( p2 + sizeof(Letter*) * indice) + offset_miembro)
*( *( p2 + 8 * 1 ) + 0 )
Resulting in:
*( *( p2 + 8 ) )
First we take the content of p2, which is the memory address of p1, and add 8 to it. Then we read whatever is stored at that address as if it were a pointer to Letter, and finally we access the content of the address it holds.
There is the problem! This expression reads memory that lies past p1 and may already give a segmentation fault:
*( p2 + 8 )
However, it is also possible that this expression:
p2 + 8
yields the memory address of a variable that is part of the program.
Imagine that in memory we have the following:
0x08 -> memory address of p1
0x10 -> memory address of var1 (imagine that it holds the value 0x12)
So when evaluating this expression:
*(*( p2 + 8 ) )
It gives us as a result:
*(*( 0x08 + 8 ) )
*(*( 0x10 ) )
--> *( 0x12 )
Since the address 0x10 does belong to the program, reading its content succeeds and yields *(0x12). But therein lies the problem: next we access the address 0x12 (which was actually the value stored at address 0x10), and that is where a segmentation fault would occur (in our example, at least).
This is crazy! Never try this at home!
Answering the second question:
Why does a segmentation fault occur in the second case?
Because this subexpression:
(sizeof(Letter*) * indice)
is not 0, since the index is different from 0, so the address that gets read lies past p1.
What is the correct way to access the element?
Like this:
(*p2 + 1)->a
Here we have added the indirection operator (*) that was missing from the beginning. This way the index is applied to p1 (a Letter*) rather than to p2, so the compiler scales it by sizeof(Letter) and can correctly calculate the memory address of any member of the Letter structure.
The compiler effectively converts the above code to this (again in byte addresses):
//*( *p2 + sizeof(Letter) * indice + offset_miembro)
*( *p2 + 8 * 1 + 0)
Resulting in:
*( *p2 + 8)
Now *p2 returns the base address of the first structure in the array (that is, whatever p1 points to), and we add 8 to it. This gives the base address of the second structure of the array; finally we access that address and obtain the value stored in member a.
Recommendation:
Do not access the array of structures through a double pointer; it makes the syntax less readable. Instead, use a plain pointer.
Imagine that you have a function called llenarDatos whose only parameter is a double pointer.
Example:
#include <stdio.h>
#include <stdlib.h>

// Letter is the structure defined earlier
int llenarDatos(Letter** let)
{
    int len;
    // A plain pointer is used to access the array
    Letter* p;

    printf("Enter a length:\n");
    // Reject input that is not a positive number
    if (scanf("%d", &len) != 1 || len <= 0)
        return -1;

    *let = malloc(len * sizeof(Letter));
    if (*let == NULL)
    {
        printf("Error allocating memory!\n");
        return -1;
    }

    p = *let;
    for (int i = 0; i < len; ++i)
    {
        printf("Enter value of A:");
        scanf("%d", &p[i].a);
        printf("Enter value of B:");
        scanf("%d", &p[i].b);
    }
    return len;
}
Simple, isn't it? In the main function we can keep a plain pointer that holds the base address of the array of structures, and use it anywhere in main.
Discussion:
The deduction was based on the address arithmetic that is usually used to access an array of structures through a plain pointer (written in byte addresses):
//This:
pointer[index].myMember
//Is equivalent to:
*(pointer + sizeof(type_data_struct) * index + offset_member)
Where this expression:
pointer + sizeof(type_data_struct) * index
gives the base address of the structure at that index of the array, and adding offset_member to it gives the memory address of the chosen member of that structure.
Taking the above into account, we can deduce that the address arithmetic for accessing an array of structures through a double pointer would be:
//This:
(*pointer + index)->myMember
//Is equivalent to:
*( *pointer + sizeof(type_data_struct) * index + offset_member)
It is almost the same arithmetic that we have seen before; the difference is that we must make one more memory access (because pointer is a double pointer).
Taking into account the two previous formulas, we can arrive at this:
//Where "pointer" is a double pointer.
//This:
pointer[index]->myMember
//Is equivalent to:
*( *(pointer + sizeof(type_data_struct*) * index) + offset_member)
Why? Because with this arithmetic we can verify that p2[0]->b will not give a segmentation fault, since with index 0 the address computation always reduces to:
//Where "pointer" is a double pointer.
*( *pointer + offset_member)
On the other hand, when the position (or index) is different from 0, the arithmetic is kept as it is, and for that reason an address is calculated that does not hold a valid pointer into the array.
Conclusion:
This whole problem comes down to the way the compiler translates the expressions. We don't need to do this arithmetic by hand, since the compiler does all of it implicitly; however, understanding it helps to resolve doubts.
Sources:
- Structs and Alignment
- Accessing a Structure Member, Page 22
- Accessing Array Elements, Page 32