Question:
Suppose a variable of type double is cast to a variable of type float. If the double variable stores a NaN value, it is converted to NaN. If the double variable stores the value Inf, it is converted to Inf. If the double variable stores the value -1e+50, it is converted to -Inf.
Is this behavior guaranteed by the C++ standard, or by IEEE 754? Or is it implementation-defined behavior, or even undefined behavior, so that in general one should not count on -1e+50 being converted to -Inf?
Answer:
Strictly speaking, the standard does not define the sizes of float and double, but in my answer I will assume that they have different sizes and are 32-bit and 64-bit data types, respectively.
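The size assumption can be checked rather than taken on faith. Below is a minimal sketch; the helper name has_assumed_layout is hypothetical, and the significand widths 24 and 53 are the values for the IEEE 754 binary32 and binary64 formats.

```cpp
#include <climits>
#include <limits>

// Hypothetical helper: true when float and double have the 32-bit and
// 64-bit layouts that the answer assumes.
constexpr bool has_assumed_layout() {
    return sizeof(float) * CHAR_BIT == 32
        && sizeof(double) * CHAR_BIT == 64
        && std::numeric_limits<float>::digits == 24    // binary32 significand bits
        && std::numeric_limits<double>::digits == 53;  // binary64 significand bits
}
```

Because the function is constexpr, it can also be used in a static_assert so that the code refuses to compile on a platform where the assumption does not hold.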
Because float can hold magnitudes from ±1.18×10^(−38) to ±3.4×10^38, and you are trying to store a double value that cannot be represented in a float, you get undefined behavior. Therefore, you should not rely on getting -inf.
My original answer, stating that you get undefined behavior, is incorrect. Undefined behavior occurs only if we try to store in a float a value that lies outside the range of values the float can represent. Here is how it is described in the standard:
C++14 standard [conv.double]
A prvalue of floating point type can be converted to a prvalue of another floating point type. If the source value can be exactly represented in the destination type, the result of the conversion is that exact representation. If the source value is between two adjacent destination values, the result of the conversion is an implementation-defined choice of either of those values. Otherwise, the behavior is undefined.
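The cases in this paragraph can be told apart before converting. The following is only a sketch: the enum NarrowingCase and the function classify_narrowing are hypothetical names, and NaN is reported separately because the quoted wording does not address it. Whether the OutOfFiniteRange case is undefined or falls "between two adjacent destination values" depends on whether infinity counts as a destination value in the implementation at hand.

```cpp
#include <cmath>
#include <limits>

// Hypothetical classification of a double -> float conversion.
enum class NarrowingCase { Exact, BetweenAdjacent, OutOfFiniteRange, NotANumber };

inline NarrowingCase classify_narrowing(double d) {
    using lim = std::numeric_limits<float>;
    if (std::isnan(d)) {
        return NarrowingCase::NotANumber;
    }
    if (std::isinf(d)) {
        // Infinity is exactly representable only if float has an infinity.
        return lim::has_infinity ? NarrowingCase::Exact
                                 : NarrowingCase::OutOfFiniteRange;
    }
    // lowest() is the most negative finite float, max() the most positive.
    if (d < lim::lowest() || d > lim::max()) {
        return NarrowingCase::OutOfFiniteRange;
    }
    // Inside the finite range the conversion is always defined, so it is
    // safe to perform it and compare the round trip.
    const float f = static_cast<float>(d);
    return static_cast<double>(f) == d ? NarrowingCase::Exact
                                       : NarrowingCase::BetweenAdjacent;
}
```

For example, 0.5 is classified as Exact, 0.1 as BetweenAdjacent, and the -1e+50 from the question as OutOfFiniteRange.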
But that is all about an abstract implementation, whereas ours is quite specific (and this was my mistake: I applied the abstract description to a specific implementation and drew the wrong conclusion). Since I made an assumption about the range of valid float values, I had already picked a specific implementation, IEEE 754. And having picked that implementation, I need to reason from it, not from the abstraction.
According to IEEE 754, infinity is part of the type, and therefore the largest and smallest finite values are not the bounds of its range. Everything that lies outside the finite range is simply mapped to a representable value according to the rounding rules. So, by this rule, if you try to represent -1e+50 in a float, you get -INF under the default round-to-nearest mode; this is a normal float value, and it is guaranteed when std::numeric_limits<float>::is_iec559 is true*. That is, we are in this case from the standard: "the source value is between two adjacent destination values, the result of the conversion is an implementation-defined choice of either of those values".
If the above constant is false, the answer to the question may be different, but it must still be considered within the framework of a specific implementation, because the question contains a specific number, which we cannot reason about without details of the implementation.
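In code, this distinction can be made explicit by consulting the constant before converting. This is a sketch under the assumptions above, not a definitive implementation: narrow_checked is a hypothetical helper that converts whenever float is IEEE 754 and otherwise only when the value lies inside the finite range of float.

```cpp
#include <cmath>
#include <limits>

// Hypothetical helper: stores the converted value in `out` and returns true
// when the result of the conversion is pinned down; returns false otherwise.
inline bool narrow_checked(double d, float& out) {
    using lim = std::numeric_limits<float>;
    if (lim::is_iec559) {
        // IEEE 754 float: NaN, infinities and overflowing values all have a
        // defined result (overflow gives +-inf under round-to-nearest).
        out = static_cast<float>(d);
        return true;
    }
    // Unknown representation: convert only inside the finite range.
    if (std::isfinite(d) && d >= lim::lowest() && d <= lim::max()) {
        out = static_cast<float>(d);
        return true;
    }
    return false;
}
```

On an IEEE 754 platform with the default rounding mode, narrow_checked(-1e+50, f) returns true and leaves negative infinity in f.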
To sum up: whether such behavior is undefined or implementation-defined depends on the representation of the floating-point types and cannot be considered apart from it.
* I did not find the text of the document itself, but many documents that refer to it state this.
In C, by the way, this is stated more explicitly: if the environment supports infinity, then the result is defined and, for the example from the question, it is guaranteed to be -inf. This is described in the C11 standard [5.2.4.2.2/p5]:
The minimum range of representable values for a floating type is the most negative finite floating-point number representable in that type through the most positive finite floating-point number representable in that type. In addition, if negative infinity is representable in a type, the range of that type is extended to all negative real numbers; likewise, if positive infinity is representable in a type, the range of that type is extended to all positive real numbers.