C – I don't understand this implementation of strlen


Hello, I dug up this implementation of the strlen function somewhere on the internet:

/* Magic numbers for the algorithm */
#if LONG_BIT == 32
static const unsigned long mask01 = 0x01010101;
static const unsigned long mask80 = 0x80808080;
#elif LONG_BIT == 64
static const unsigned long mask01 = 0x0101010101010101;
static const unsigned long mask80 = 0x8080808080808080;
#else
#error Unsupported word size
#endif

#define LONGPTR_MASK (sizeof(long) - 1)

/*
 * Helper macro to return string length if we caught the zero
 * byte.
 */
#define testbyte(x)             \
do {                    \
    if (p[x] == '\0')       \
        return (p - str + x);   \
} while (0)

size_t
strlen(const char *str)
{
    const char *p;
    const unsigned long *lp;

    /* Skip the first few bytes until we have an aligned p */
    for (p = str; (uintptr_t)p & LONGPTR_MASK; p++)
        if (*p == '\0')
            return (p - str);

    /* Scan the rest of the string using word sized operation */
    for (lp = (const unsigned long *)p; ; lp++)
        if ((*lp - mask01) & mask80) {
            p = (const char *)(lp);
            testbyte(0);
            testbyte(1);
            testbyte(2);
            testbyte(3);
#if (LONG_BIT >= 64)
            testbyte(4);
            testbyte(5);
            testbyte(6);
            testbyte(7);
#endif
        }

    /* NOTREACHED */
    return (0);
}

Can you comment on why it is written in such a convoluted way?


This is a strlen implementation that works not byte by byte but word by word. That is its whole point: for the sake of efficiency it reads and analyzes memory not in single bytes but in whole aligned words of 32 or 64 bits.

To achieve alignment, the initial part of the string (up to the nearest alignment boundary) is processed byte by byte, that is, in the usual way.

Then the string is read word by word. Each word read is checked with the following bit hack

if ((*lp - mask01) & mask80)

which analyzes the entire word (in effect, all of its bytes "in parallel") and approximately answers the question of whether the word *lp contains a zero byte. If this check fires, the code inside the if uses the testbyte macro to find out at exactly which position the zero byte sits.

The check is "approximate" in the sense that the if condition can sometimes fire even for a word without a zero byte, namely when the word contains bytes with the most significant bit set. That is, the check allows false positives and, as a result, causes an unnecessary byte-by-byte pass through testbyte, but this does not affect the correctness of the result. Strings made up of characters with codes below 128 are processed at full speed.

A word that actually contains a zero byte is guaranteed to be detected by this check.

A more elaborate version of the condition, based on the same masks,

if ((*lp - mask01) & ~*lp & mask80)

would detect the presence of a zero byte in a word exactly, with no false positives, but the authors evidently decided that their approximate version would run faster on average (most likely because the code is tuned for strings of plain ASCII characters).

Theoretically, such an implementation has one formal problem: when reading the string word by word, the last word read may "stick out" past the end of the string and of the memory allocated for it. In practice this causes no trouble, since memory is normally allocated and/or protected on boundaries aligned at least to a full word, so the overrunning read never crosses into an unmapped page. Such out-of-bounds reads can, however, trigger complaints from dynamic analysis tools such as valgrind.
