c++ – Data alignment

Question:

I came across the term Data Alignment in the article " Optimizing C ++ (C ++ 11) Code Using a Specific Task Example ". In that article, the code is optimized by reordering the variables in the structure.

Why is the code being optimized only due to the change in the order of the variables?

Answer:

To be honest, just ignore it. This nano-optimization won't be needed for most of your life. Most of the problems are solved not by rearranging the fields of the structure, but by thoughtful rewriting of algorithms and the choice of suitable data structures. Although of course you need to know about it.


So look what the matter is.

Computer memory works as follows. The program memory can be thought of as a large array of bytes, and the address is an index in this array.

[байт 0] [байт 1] [байт 2] [байт 3] [байт 4] [байт 5] [байт 6] [байт 7]...

(Well, or sometimes it's easier to imagine like this:

... [байт 7] [байт 6] [байт 5] [байт 4] [байт 3] [байт 2] [байт 1] [байт 0]

– you have yet to learn the difference between Big Endian and Little Endian.)

A byte, as the minimum addressable unit of memory, can be addressed as you like. But usually you work with larger units. For example, this is a machine word : the size of a processor register. For a 32-bit architecture, this is a four-byte structure.

Now, bytes are nicely organized into fours:

[байт 0] [байт 1] [байт 2] [байт 3] [байт 4] [байт 5] [байт 6] [байт 7]...
[        машинное слово 0         ] [        машинное слово 1         ]...

But note that, in principle, bytes could be combined into machine words in another way:

[байт 0] [байт 1] [байт 2] [байт 3] [байт 4] [байт 5] [байт 6] [байт 7] [байт 8]...
         [          машинное слово         ] [     ещё одно машинное слово     ]...

– a machine word can start at any address!

So, most computers are designed so that machine words combined into fours in the upper image are read quickly, but machine words of the second type (that is, those that are not obtained in steps of 4 from zero) are read slowly. (Although the alignment requirements may be more complicated, and not coincide with the multiplicity of a machine word.) Worse, very many architectures do not allow reading such words at all, and in order to read such a word:

[байт 0] [байт 1] [байт 2] [байт 3] [байт 4] [байт 5] [байт 6] [байт 7]...
         [          машинное слово         ]

have to read machine words at addresses 0 and 4

[байт 0] [байт 1] [байт 2] [байт 3] [байт 4] [байт 5] [байт 6] [байт 7]...
         [    нужно это машинное слово     ]
[     но приходится читать это    ] [              и это              ]

select the necessary bytes from them, shuffle and assemble "manually" into the desired machine word! It is clear that this is a complex and relatively expensive operation.

Therefore, compilers of the C programming language (and most others too) carry out the following optimization: insignificant bytes are inserted between the fields of the structure so that all fields have good alignment.

Example: for a structure like this:

struct
{
    char x; // 1 байт
    int y;  // 4 байта
    char z; // 1 байт
};

memory is allocated like this:

[    ] [    ] [    ] [    ] [    ] [    ] [    ] [    ] [    ] [    ] [    ] [    ]
[  x ] { потерянные байты } [             y           ] [  z ] { потерянные байты }

If the structure itself is aligned in memory (the compiler also takes care of this), then y will be aligned, and access to it will be fast (and on some platforms, let me remind you, in general, only in this case is possible).

But in this case, our structure takes up more memory than if the order of the variables were like this:

[    ] [    ] [    ] [    ] [    ] [    ] [    ] [    ]
[  x ] [  z ] {  потеряно } [             y           ]

A larger structure means more memory and more cost to copy, read that structure, and the like. Thus, by rearranging the fields of the structure, we can save.


Note that there is actually no universal optimization. For example, for different computer architectures, the size of a machine word will be different. In addition, the sizes of any complex data can also differ, which means that the calculated optimal order on one system may turn out to be very suboptimal on another system.


Additional reading on the topic (in English): http://www.catb.org/esr/structure-packing/

Scroll to Top