2010. október 29., péntek

Floating Point Equivalence - 2. GPU vs CPU

As I mentioned, I aimed to achieve equivalent execution between the original algorithm and the one running on GPU. Testing numerical equality between the two main languages was just the first step.

I use OpenCL as the platform for GPU programming. The code which is intended to be executed on the graphics hardware is based on C, so it is quite easy to use and understand. The OpenCL specification defines an optional extension for double precision floating point arithmetics, and this extension is supported by the NVIDIA driver on the GPU I use (GeForce GTX 285). So my plan was to read back partial results from the GPU to my code, and compare them to the corresponding data produced by the original program.

Without further ado, I will just say what I've found out after many hours of meticulous testing. Even when considering the basic arithmetic operations, the result of calculations are not necessarily the same. I got to a piece of code which calculated an expression like "(a-b)*(a-b)+(x-y)*(x-y)", and the result was one bit off when compared to the number produced by C or Java code. At first I thought that the GPU implemented floating point rounding incorrectly, that the result of the expression fell between two neighbouring double values, and the incorrect one was selected. It turned out that this wasn't exactly the case.

I've created a very simple C program to make sure that nothing irrelevant interfered with my test. The GPU code comprises of the following function:
#pragma OPENCL EXTENSION cl_khr_fp64 : enable
__kernel void test(
        double a, double b, double x, double y,
        __global double* c) {

    c[0]=a-b;
    c[1]=(a-b)*(a-b);
    c[2]=x-y;
    c[3]=(x-y)*(x-y);
    c[4]=(a-b)*(a-b)+(x-y)*(x-y);

}
I won't digress by explaining the details of this OpenCL C code. It takes four double arguments, and returns the results of five expressions in the buffer 'c'. The values of the parameters are declared in the C code as follows:
double a=22;
double b=12.020105095868775;
double x=201;
double y=210;
I compared the values returned by the GPU code with those that the C program itself computed, using the non-converting cast to long to ensure that even a single-bit difference becomes visible. These are the results:
a-b      CPU: 4621807799426188662, GPU: 4621807799426188662 -> 1
(a-b)^2  CPU: 4636709024391772619, GPU: 4636709024391772619 -> 1
x-y      CPU: -4602115869219225600, GPU: -4602115869219225600 -> 1
(x-y)^2  CPU: 4635400285215260672, GPU: 4635400285215260672 -> 1
sum      CPU: 4640558254430887142, GPU: 4640558254430887141 -> 0
The last column is the result of simple double equality. The value of each subexpression appears to be the same, except for the very last one. Very strange. Here were two equal numbers, (a-b)^2 and (x-y)^2, and adding them yields different results. This is when I thought that addition doesn't work correctly on the GPU side. To make sure that this is the case, I've planted the values of (a-b)^2 and (x-y)^2 as double literals into the OpenCL code, checked that they are the values that I wanted to add (by returning them in the output buffer, I saw the exact same values as in the table above), and added these. The result was the CPU version of the sum.

What can this mean? Only one thing: although these pairs of numbers appear to be the same when observed from CPU code, they are actually not. Their sums are different. My conclusion is that there are hidden details in these doubles on the GPU side. Internally, the numbers seem to be stored in a precision higher than 64 bits. It is quite possible that the same is true for the CPU, but the internal precisions have to be different. As the double data type is strictly 64-bit, this additional data cannot be observed directly, it is manifested in the results of further operations, when the numerical difference accumulates to an amount which becomes visible when rounding to the nearest floating point value representable in 64 bits.

Conclusion

Different hardware have different internal numerical precisions, even if the data types that they use are equivalent. So under these conditions, one cannot hope that performing the exact same calculations on the exact same input results in the exact same output. There is nothing that can be done about this.

2010. október 26., kedd

Floating Point Equivalence - 1. Java vs C

Recently I've been working on creating a GPU-enabled version of an existing program implementing an image processing algorithm. The program was written in C/C++, and I decided that I could get the job done much faster and better if I rewrite the whole thing in Java. (I just love Java, love working with it, I do it very efficiently, plus the C program was not very big and required some restructuring for the GPU architecture, so I believe using it as a starting point and working with C code would have made my job much harder.)

In order to be able to validate the new code, it was an aim that it implements the same algorithm the exact same way. By comparing the data in the two programs at various steps, I could quickly see if I made a mistake. This led to the need to print exact double values in both C and Java. Then I could just print the data to a file and compare the two versions.

Using printf

In Java, whenever I need to see the "exact" value of a double, I use Double.toString. This method will produce as many digits in the fraction as necessary to differenciate my double value from any other doubles. That is, the string produces by this method is unique for each double, so if two such strings are the same, then the two doubles must also be exactly the same. However, I do not know of a similar function in C. Actually, the only C function I know of that can print doubles is printf. Fortunately, Java also has printf, their syntaxes seem pretty much the same to me, so let's use them!

For the examples below, I will use the number 1/3. It is a nice number which has infinite digits both in base-10 and base-2. Note that to produce this value in code, you need to write "double d=1.0/3;" or something similar; the expression "1/3" results in zero as it is a division between two integers, so the result is also an integer.

Now, if I write printf("%f",d) (or the Java equivalent System.out.printf((Locale)null, "%f",d)), the result is "0.333333" in both languages. Good news, we got the same result, but we should know that the format string "%f" has a parameter specifying the precision, namely the number of digits after the decimal point, and its default value is 6. This means that we lose information. It is quite possible that two doubles differ only in later digits, so to make sure we see everything there is to see, let's specify a sufficiently large precision (I do not know exactly how many digits are needed, but Double.toString usually returns about 15, so 20 should be enough).

printf("%.20f",d) -- C: "0.33333333333333331483"    Java: "0.33333333333333330000"

Definitely not the same. Does this mean that "1.0/3" is a different double value in the two languages? No, actually the difference comes from the printf implementations. This realization was my main motivation for this blog post: the format string syntaxes are so similar that one would not expect different behaviour.

It seems that the C printf takes the double value, and returns the decimal number which is the closest to it among the decimals having the specified number of digits after the decimal point. This is in harmony with the manual page of this function:
f, F   The double argument is rounded and converted to decimal notation in the style [-]ddd.ddd, where the number of digits after the decimal-point character is equal to the precision specification.
In Java, however, we only get a limited number valuable digits even if we specify a high precision. The documentation specifies the exact behaviour:
If the precision is less than the number of digits which would appear after the decimal point in the string returned by Float.toString(float) or Double.toString(double) respectively, then the value will be rounded using the round half up algorithm. Otherwise, zeros may be appended to reach the precision.
This unfortunately means that we cannot use "%f" for the purpose of comparing doubles. What to do then?

Seeing the bits

Well, the 64-bit floating point representation of doubles is standard, so if we could somehow print this representation directly, that would be reliable. (Note: Java's double is always 64-bit, but C's double does not have to be, at least in theory. But for our purposes let us assume it is.)

Java has a function for just this: Double.doubleToRawLongBits. The long value that this method returns will be represented by the exact same 64 bits as the original double. Of course its value will be something totally different, but nevertheless, the long value is a suitable, unique representation of the double, and it can also be printed precisely, with no shenanigans with rounding or zero-padding: System.out.printf("%d", Double.doubleToRawLongBits(d)); -> "4599676419421066581". If we can get the same 64-bit integral value in C, we're golden. Unfortunately, I do not know of a library function which does this.

I've found the solution here. The key concept is to change the type without numeric casts, and there are two ways to do it. One is through pointers. [Note: I use "long long" for the C data type. I believe it is the one which is most commonly 64-bit, although, once again, the specification does not require it to be.] We first wander into pointerville by taking the address of our double variable: &d; the type of this expression is double*. Then we cast this double pointer to a long long pointer: (long long*)&d. This cast retains the value stored in the variable d, and now we have a long long pointer to it, so all we need to do is dereference it: *((long long*)&d). It is not very pretty, but also not complicated, and it works.

So the solution: printf("%lld", *((long long*)&d)); mind you, "%d" does not work correctly, we need the "ll" length modifier to tell printf about the exact type of the argument, otherwise it will assume it is an int.

The other way is to use a union:
union {
  double a;
  long long b;
} u;
u.a=d;
printf("%lld", u.b);
The union achieves the same thing that the pointer casts did: it reinterprets an existing 64-bit value as a different type without changing it. It is a bit more verbose, so I just used the first solution myself, but it is worth mentioning, that Java actually implements Double.doubleToRawLongBits using the union method.

Conclusion

The main thing I've learned with this little adventure was that the string formatting facilities of C and Java differ, even though the syntax is similar. One has to be especially careful when trying to print the exact value of a floating point number.

2010. október 23., szombat

Hello World

I've just spent about one day on a seemingly relatively simple software problem, and I've learnt a couple of things. I realized that it would be useful to publish my experiences. Even if no-one else benefits from them, I could revisit these moments I would otherwise possibly forget. I also incidentally stumbled upon Dustin's blog entry encouraging just this behaviour, and I completely agree with it, so I finally budged. I don't know how often I'll post, but I will try to get myself to write about anything that I find worthy to remember, at least for myself.
PS. Wow, my English is rusty.