Csaba's coding experiences: Floating Point Equivalence

Recently I've been working on creating a GPU-enabled version of an existing program implementing an image processing algorithm. The program was written in C/C++, and I decided that I could get the job done much faster and better if I rewrite the whole thing in Java. (I just love Java, love working with it, I do it very efficiently, plus the C program was not very big and required some restructuring for the GPU architecture, so I believe using it as a starting point and working with C code would have made my job much harder.)

In order to be able to validate the new code, it was an aim that it implements the same algorithm the exact same way. By comparing the data in the two programs at various steps, I could quickly see if I made a mistake. This led to the need to print exact double values in both C and Java. Then I could just print the data to a file and compare the two versions.

Using printf

In Java, whenever I need to see the "exact" value of a double, I use Double.toString. This method will produce as many digits in the fraction as necessary to differenciate my double value from any other doubles. That is, the string produces by this method is unique for each double, so if two such strings are the same, then the two doubles must also be exactly the same. However, I do not know of a similar function in C. Actually, the only C function I know of that can print doubles is printf. Fortunately, Java also has printf, their syntaxes seem pretty much the same to me, so let's use them!

For the examples below, I will use the number 1/3. It is a nice number which has infinite digits both in base-10 and base-2. Note that to produce this value in code, you need to write "double d=1.0/3;" or something similar; the expression "1/3" results in zero as it is a division between two integers, so the result is also an integer.

Now, if I write printf("%f",d) (or the Java equivalent System.out.printf((Locale)null, "%f",d)), the result is "0.333333" in both languages. Good news, we got the same result, but we should know that the format string "%f" has a parameter specifying the precision, namely the number of digits after the decimal point, and its default value is 6. This means that we lose information. It is quite possible that two doubles differ only in later digits, so to make sure we see everything there is to see, let's specify a sufficiently large precision (I do not know exactly how many digits are needed, but Double.toString usually returns about 15, so 20 should be enough).

printf("%.20f",d) -- C: "0.33333333333333331483" Java: "0.33333333333333330000"

Definitely not the same. Does this mean that "1.0/3" is a different double value in the two languages? No, actually the difference comes from the printf implementations. This realization was my main motivation for this blog post: the format string syntaxes are so similar that one would not expect different behaviour.

It seems that the C printf takes the double value, and returns the decimal number which is the closest to it among the decimals having the specified number of digits after the decimal point. This is in harmony with the manual page of this function:

f, F The double argument is rounded and converted to decimal notation in the style [-]ddd.ddd, where the number of digits after the decimal-point character is equal to the precision specification.

In Java, however, we only get a limited number valuable digits even if we specify a high precision. The documentation specifies the exact behaviour:

If the precision is less than the number of digits which would appear after the decimal point in the string returned by Float.toString(float) or Double.toString(double) respectively, then the value will be rounded using the round half up algorithm. Otherwise, zeros may be appended to reach the precision.

This unfortunately means that we cannot use "%f" for the purpose of comparing doubles. What to do then?

Seeing the bits

Well, the 64-bit floating point representation of doubles is standard, so if we could somehow print this representation directly, that would be reliable. (Note: Java's double is always 64-bit, but C's double does not have to be, at least in theory. But for our purposes let us assume it is.)

Java has a function for just this: Double.doubleToRawLongBits. The long value that this method returns will be represented by the exact same 64 bits as the original double. Of course its value will be something totally different, but nevertheless, the long value is a suitable, unique representation of the double, and it can also be printed precisely, with no shenanigans with rounding or zero-padding: System.out.printf("%d", Double.doubleToRawLongBits(d)); -> "4599676419421066581". If we can get the same 64-bit integral value in C, we're golden. Unfortunately, I do not know of a library function which does this.

I've found the solution here. The key concept is to change the type without numeric casts, and there are two ways to do it. One is through pointers. [Note: I use "long long" for the C data type. I believe it is the one which is most commonly 64-bit, although, once again, the specification does not require it to be.] We first wander into pointerville by taking the address of our double variable: &d; the type of this expression is double*. Then we cast this double pointer to a long long pointer: (long long*)&d. This cast retains the value stored in the variable d, and now we have a long long pointer to it, so all we need to do is dereference it: *((long long*)&d). It is not very pretty, but also not complicated, and it works.

So the solution: printf("%lld", *((long long*)&d)); mind you, "%d" does not work correctly, we need the "ll" length modifier to tell printf about the exact type of the argument, otherwise it will assume it is an int.

The other way is to use a union:

union {
double a;
long long b;
} u;
u.a=d;
printf("%lld", u.b);

The union achieves the same thing that the pointer casts did: it reinterprets an existing 64-bit value as a different type without changing it. It is a bit more verbose, so I just used the first solution myself, but it is worth mentioning, that Java actually implements Double.doubleToRawLongBits using the union method.

Conclusion

The main thing I've learned with this little adventure was that the string formatting facilities of C and Java differ, even though the syntax is similar. One has to be especially careful when trying to print the exact value of a floating point number.

Csaba's coding experiences

2010. október 26., kedd

Floating Point Equivalence - 1. Java vs C

Nincsenek megjegyzések:

Megjegyzés küldése