
I have been tortured by floating-point comparison issues in MATLAB over the past couple of weeks.

My code involves tons of comparisons like a<b or c<=d. What drives me crazy is that

dw = 0.001;
W1 = [0:dw:1];
W2 = [0:dw:1];

Then we have

>> W1(418) = 0.417000000000000
>> W2(418) = 0.417000000000000
>> W1(418) >= W2(418)

ans =

  logical

   0

>> W2(418) >= W1(418)

ans =

  logical

   1

My current way of dealing with this issue is to define an error term eps0 = 10^(-15). Then, whenever I expect a<=b to hold, I test a<=b+eps0 instead.
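For reference, the workaround described above amounts to a small helper like this (the name leq_tol is my own, not from the original code):

```matlab
% Tolerant "less than or equal": treat a as <= b if it exceeds b
% by no more than a fixed absolute margin eps0.
eps0 = 1e-15;
leq_tol = @(a, b) a <= b + eps0;

leq_tol(0.1 + 0.2, 0.3)   % true, even though 0.1+0.2 > 0.3 in doubles
```

Note that a fixed absolute margin like 1e-15 only makes sense when the operands are of order one; for large values the rounding error itself exceeds the margin.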

But I was wondering if there is a more generic way of solving this problem? I really appreciate it!

4
  • W1(418) == W2(418) is true. In fact, all(W1==W2) is also true. I don't see how it could be otherwise with your example, as both arrays are constructed identically. Commented Oct 28, 2021 at 16:24
  • @CrisLuengo In my MATLAB R2018a, W1(418)==W2(418) returns false. Maybe MATLAB fixed this problem in a later version? Commented Oct 29, 2021 at 0:29
  • @CrisLuengo or is this caused by different computer hardware? Commented Oct 29, 2021 at 1:26
  • I tried this earlier today in MATLAB Online (which is R2021b running on Linux). I also have R2017a and R2018b here running on macOS, and I see all(W1==W2) returning true in both. The only explanation for ~all(W1==W2) is that W1 and W2 were computed in different ways. A computer is expected to do exactly the same thing every time you run the same code with the same inputs. Hardware could not change this result unless it were defective, and a defect like that you would have noticed before. Commented Oct 29, 2021 at 5:19

2 Answers


No, there is no generic way to solve floating-point comparison; each situation is different, and the expected rounding error changes with the situation.

Instead of using eps0 = 10^(-15), use eps(b): a<=b+eps(b). Here b+eps(b) is the next larger representable value (assuming b is positive), so this takes care of one form of rounding error. If rounding errors accumulate, you need a larger margin, for example a<=b+10*eps(b).
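To illustrate why eps(b) is preferable to a fixed constant (the values shown are properties of IEEE-754 doubles, which MATLAB's eps reports):

```matlab
% eps(b) is the spacing between b and the next larger double,
% so the margin automatically scales with the magnitude of b.
eps(1)       % 2^-52, roughly 2.22e-16
eps(1000)    % roughly 1.14e-13, far larger than a fixed 1e-15

% A comparison with a margin that scales with b:
a = 0.1 + 0.2;
b = 0.3;
a <= b + 10*eps(b)   % true: a exceeds b only by accumulated rounding
```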


2 Comments

Be careful about thinking that b+eps(b) is the next representable number. For negative powers of two, it will skip over a representable number. For example, with the IEEE-754 double precision binary format, the ULP of −1 is 2^−52, but −1+2^−53 is representable.
@EricPostpischil That's a really good point, thanks!

This is what you told us you did, and here things work out as expected in my version of R2018a:

>> version
ans =
'9.4.0.813654 (R2018a)'
>> dw = 0.001;
>> W1 = [0:dw:1];
>> W2 = [0:dw:1];
>> W1(418)>=W2(418)
ans =
  logical
   1
>> W1(418) = 0.417000000000000;
>> W2(418) = 0.417000000000000;
>> W1(418)>=W2(418)
ans =
  logical
   1
>> W2(418)>=W1(418)
ans =
  logical
   1

So I strongly suspect there is something else you haven't told us about. One way I can see getting your result is if the underlying vectors are not of the same type, so that the assignment of 0.417 gets converted differently. E.g., suppose W1 were in fact of type single instead of double. Then you would get this result:

>> dw = 0.001;
>> W1 = single([0:dw:1]);
>> W2 = [0:dw:1];
>> W1(418) = 0.417000000000000; % converted to closest SINGLE PRECISION bit pattern
>> W2(418) = 0.417000000000000; % converted to closest DOUBLE PRECISION bit pattern
>> W1(418)>=W2(418)
ans =
  logical
   0
>> W2(418)>=W1(418)
ans =
  logical
   1
>> sprintf('%.60f\n',W1(418))
ans =
    '0.416999995708465576171875000000000000000000000000000000000000'
>> sprintf('%.60f\n',W2(418))
ans =
    '0.416999999999999981792342396147432737052440643310546875000000'

So here the two numbers W1(418) and W2(418) are in fact different, because one is stored as single precision and the other as double precision, and the conversion of 0.417 into an IEEE binary floating-point bit pattern differs between the two. The single-precision value turns out to be slightly less than the double-precision value, so the result is perfectly explainable.

So ... are your W1 and W2 vectors actually of different types? I suspect they are, which calls into question all of your comparison logic. And as others have pointed out already, floating-point arithmetic results in general have to be taken with a grain of salt, with comparisons made very carefully (using tolerances, etc.).
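As a quick sanity check along these lines (the mismatched setup mirrors the hypothesis above; the tolerance choice is illustrative, not canonical):

```matlab
% First verify both vectors are the same floating-point class.
dw = 0.001;
W1 = single([0:dw:1]);   % deliberately mismatched, as hypothesized above
W2 = [0:dw:1];
class(W1)                % 'single'
class(W2)                % 'double'

% If classes differ, compare at single precision's resolution,
% since that is the coarser of the two representations:
x = W1(418); y = W2(418);
abs(double(x) - y) <= eps(single(y))   % true: equal within one single ULP
```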

