Question:
This follows up on the question about the bitwise representation of real numbers and my answer to it.
I want to determine programmatically, for any floating-point type, how many of its bits are allocated to the mantissa and how many to the exponent. To do this, I wrote the following code (the sign bit is counted separately from the mantissa, so the mantissa figures are 1 less):
https://ideone.com/YuIWNc – C code (float, double, long double)
https://ideone.com/342B4S – C++ code (float, double, long double)
https://ideone.com/VURQnw – C++ code (float, double, long double, __float128)
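#include <cstdio>

// Estimates the mantissa and exponent widths of a floating-point type.
template <typename typed> void count(unsigned *result_m, unsigned *result_e)
{
    typed x = 1, exp;
    unsigned res, e;
    // res: number of halvings needed to turn 1 into 0
    for (res = 0; x != 0; ++res) x /= 2;
    // exp: smallest power of two with exp*2 >= res; e: its base-2 logarithm
    for (exp = 1, e = 0; exp * 2 < res; ++e) exp *= 2;
    *result_e = e + 1;
    *result_m = res - exp + 1;
}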
int main(void)
{
    unsigned f_m, f_e, d_m, d_e, ld_m, ld_e, f128_m, f128_e;
    count<float>(&f_m, &f_e);
    count<double>(&d_m, &d_e);
    count<long double>(&ld_m, &ld_e);
    count<__float128>(&f128_m, &f128_e);
    // sizeof yields size_t, so cast to unsigned to match the %u conversions
    printf("             S   M  E  SZ\n");
    printf("float:       1 %3u %2u %3u\n", f_m, f_e, (unsigned)(8 * sizeof(float)));
    printf("double:      1 %3u %2u %3u\n", d_m, d_e, (unsigned)(8 * sizeof(double)));
    printf("long double: 1 %3u %2u %3u\n", ld_m, ld_e, (unsigned)(8 * sizeof(long double)));
    printf("__float128:  1 %3u %2u %3u\n", f128_m, f128_e, (unsigned)(8 * sizeof(__float128)));
}
It turns out like this:
             S   M  E  SZ
float:       1  23  8  32
double:      1  52 11  64
long double: 1  63 15 128
__float128:  1 112 15 128
For float, double, and even __float128 everything works out (Wikipedia, IEEE 754-2008).
But with long double, two problems arise:

1+63+15 = 79, that is, 79 bits instead of 80. Where is the missing bit?
long double represents 10-byte numbers, but sizeof returned 16. How can I get 10?
Answer:
One bit was lost because, on the x86 platform, the 80-bit floating-point format differs in one fundamental way from the 32-bit and 64-bit IEEE 754 formats (float and double).
float and double use an implicit leading 1 in the mantissa. That is, in a normalized value the most significant 1 of the mantissa is not stored, only implied. In the extended 80-bit type long double, however, this leading 1 is always stored explicitly.
This is where the discrepancy comes from.
For float and double, your first loop starts by stepping through the normalized representations of the number, in which the stored mantissa is always zero and the exponent field decreases from half its maximum value (127 for float) down to 1:
// For `float`
// Normalized representations: the mantissa is 0 and the exponent decreases from 127 to 1
0x3F800000
...
0x00800000 <- after 126 divisions
After that, your loop continues through the denormalized representations, in which the exponent is 0 and a lone 1 bit moves rightward through the mantissa. When this lone bit falls off the right edge of the mantissa, x becomes zero and the loop ends:
// Denormalized representations: the exponent is 0 and the mantissa consists
// of a single 1 bit moving to the right
0x00400000
0x00200000
...
0x00000001
0x00000000 <- after 150 divisions
Note that for float and double, a 1 bit first appears in the stored mantissa only at the very first denormalized value, and from there it passes through every bit of the mantissa. As a result, the number of nonzero denormalized values visited is equal to the number of bits in the mantissa.
With long double, however, the 1 in the most significant bit of the mantissa is explicitly present from the very beginning. When the exponent of the long double reaches zero in your loop and the loop starts counting denormalized long double values, the 1 does not appear "out of nowhere" in the top position of the mantissa (as it does for float and double); it is already sitting in the top position and simply starts moving from there. Because of this, the part of the loop that counts denormalized values performs one iteration fewer.
By the way, the odd approach of lumping two quantities into a single sum, res (half of the exponent range plus the width of the mantissa), is fragile. You then effectively compute log2 res and expect it to describe the number of bits in the exponent correctly. However, if some hypothetical floating-point type had a very wide mantissa, log2 res could come out wrong.