8 votes
1 answer
169 views

As far as I can tell, the MSVC implementation of signaling_NaN does not comply with IEEE 754-2019, the latest version of the IEEE floating-point standard. Unfortunately, I do not have a copy of the ...
tbxfreeware's user avatar
  • 2,481
0 votes
0 answers
43 views

Not sure if this is possible at all. It is a typical problem: the value in the DB is 4.725, but the UI shows 4.7250000000000005. And there are a lot of other example values that generate this kind of ...
dmitry_bond's user avatar
3 votes
2 answers
123 views

I've written a little test program that triggers FPU-exceptions through feraiseexcept(): #include <iostream> #include <cfenv> using namespace std; int main() { auto test = []( int exc,...
Edison von Myosotis's user avatar
1 vote
3 answers
207 views

In ECMAScript, given a non-negative, finite double x, is the following assertion always true? Math.sqrt(x) === Math.pow(x, 0.5) I know that both Math.sqrt() and Math.pow() are implementation-...
dolmok's user avatar
  • 178
7 votes
1 answer
131 views

Given two nonzero, finite, double-precision floating point numbers x and y, is it always true that the equality x * y == ((x * y) / y) * y holds under default IEEE 754 semantics? I've searched ...
Hans Brende's user avatar
  • 8,887
1 vote
2 answers
114 views

Numbers sometimes cannot be expressed exactly when they are represented in double precision or single precision. Of course working with bigdecimal is a solution, I know that. Let's come to my question:...
İlker Deveci's user avatar
2 votes
2 answers
113 views

I was reading about the floating point implementation from the comments of a ziglings.org exercise, and I came across this info about it. // Floating further: // // As an example, Zig's f16 is a IEEE ...
Raven King's user avatar
11 votes
1 answer
656 views

I want to convert a double to a string with a given number of decimal places in C++ as well as in C# and I want the results of those conversions to be the same in both languages. Especially C++ ...
Pace's user avatar
  • 113
3 votes
2 answers
147 views

I'm not sure if I'm using the right terminology, but occasionally I find myself needing to canonicalize a floating-point value to a range in a cyclic manner. (This can be useful, for instance, for ...
Boann's user avatar
  • 50.3k
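
One common way to do the cyclic canonicalization described in the excerpt above is a modulo-based wrap. The following is only a sketch under stated assumptions: `wrap_to_range` is a hypothetical helper name that does not come from the question, and it assumes finite inputs and a half-open target range [lo, hi).

```python
def wrap_to_range(x: float, lo: float, hi: float) -> float:
    """Map x cyclically into the half-open range [lo, hi) (hypothetical helper)."""
    width = hi - lo
    if not width > 0:
        raise ValueError("hi must be greater than lo")
    # Python's float % takes the sign of the divisor, so r lands in [0, width].
    r = (x - lo) % width
    # Rounding can make r equal to width for tiny negative offsets; fold it back.
    if r >= width:
        r = 0.0
    return lo + r


print(wrap_to_range(370.0, 0.0, 360.0))
print(wrap_to_range(-10.0, 0.0, 360.0))
print(wrap_to_range(190.0, -180.0, 180.0))
```

The explicit `r >= width` guard is there because the floating-point remainder can round up to the width itself, which would otherwise return `hi` and break the half-open contract.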
2 votes
2 answers
182 views

IEEE 754 defines 1 ^ n as 1, regardless of n. (I'm not paying $106 to confirm this for myself, but this paper cites page 44 from the 2008 standard for this claim.) Most programming languages seem to ...
Isaac King's user avatar
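
As a quick illustration of the "1 ^ n is 1" behavior mentioned in the excerpt above, the sketch below checks what CPython does for a few exponents, including infinities and NaN. It only shows one language's behavior and does not confirm what the standard itself mandates.

```python
import math

nan = float("nan")
inf = float("inf")

# Python documents that math.pow(1.0, x) returns 1.0 even when x is a NaN.
# The ** operator on floats behaves the same way in CPython.
for n in (0.0, -3.5, inf, -inf, nan):
    print(n, math.pow(1.0, n), 1.0 ** n)
```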
-1 votes
1 answer
106 views

I need to process a CSV. The users define whether a column holds floats or doubles. The thing is, sometimes they put doubles in a float column; the values then get rounded, and the users only find out ...
Artur Carvalho's user avatar
5 votes
3 answers
793 views

In my university lecture we just learnt about IEEE 754 arithmetic using the following table: Guard Round Sticky Result 0 x x Round down (do nothing to significand) 1 1 x Round up 1 0 1 Round up 1 0 0 ...
aeternum's user avatar
3 votes
1 answer
60 views

Question: Given two non-negative numbers a and b, where a is less than or equal to b, I care about whether y, as computed by the following algorithm, is less than or equal to b. Algorithm: x = b-a; y = x+a; Is y<=b in ...
kaisong's user avatar
  • 181
9 votes
2 answers
174 views

Given a 64-bit floating point type (ieee-754), and a closed range [R0,R1] assuming both R0 and R1 are within the representable range and are not NaN or +/-inf etc. How does one calculate the number of ...
Penny Dreudter's user avatar
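
One possible approach to the counting problem in the excerpt above is to map each double to an integer that preserves numeric order and then subtract. This is a minimal sketch, not the accepted answer: `ordered_key` and `count_doubles` are hypothetical names, the inputs are assumed finite and non-NaN with R0 <= R1, and +0.0 and -0.0 are deliberately counted as a single value.

```python
import struct


def ordered_key(x: float) -> int:
    """Map a finite double to an integer whose ordering matches numeric order."""
    # Reinterpret the 64 bits of the double as a signed integer.
    bits = struct.unpack("<q", struct.pack("<d", x))[0]
    # Non-negative doubles already sort correctly; for negative ones, negate
    # the magnitude bits so that larger magnitudes get smaller keys.
    return bits if bits >= 0 else -(bits & 0x7FFFFFFFFFFFFFFF)


def count_doubles(r0: float, r1: float) -> int:
    """Count representable doubles in the closed range [r0, r1] (assumes r0 <= r1)."""
    return ordered_key(r1) - ordered_key(r0) + 1


print(count_doubles(1.0, 2.0))
print(count_doubles(-5e-324, 5e-324))
```

Because -0.0 maps to key 0, the same as +0.0, a range that spans zero counts zero once. If both zeros should count separately, add 1 when the range includes zero.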
0 votes
3 answers
319 views

I am using this code to test trapping math on Windows 11, VS2022, amd64 and arm64 systems: Godbolt // compile with: cl /O2 /EHa /std:c++20 #include <cfenv> #include <eh.h> #include <...
Ibraim Ganiev's user avatar
2 votes
1 answer
114 views

Intro: With Java floats, I noticed that when you add 1.0 to a certain range of tiny negative numbers, it equals 1.0. I decided to investigate this and learned a lot about how floats work in my quest ...
PianoMastR64's user avatar
0 votes
1 answer
93 views

I was caught by surprise that the following code returns false for gcc 13 and clang 18. Why does this happen? Isn't the number 8.1 representable in both formats? #include <iostream> #include <...
Martin Fehrs's user avatar
  • 1,165
4 votes
1 answer
262 views

#include <stdio.h> int main() { double a =92233720368547758071; printf("value=%lf\n", a); int i; char *b = &a; for (i = 0; i < 8; i++) { printf("...
Nalan PandiKumar's user avatar
3 votes
1 answer
132 views

I am having trouble understanding a specific IEEE double computation. Take the following C99 program, which runs on a host with IEEE double (8 bytes, 11 bits biased exponent, 52 bits encoded mantissa):...
emacs drives me nuts's user avatar
1 vote
0 answers
74 views

I am writing a parser for a LIN Description File (LDF). In an LDF file there may be floats. Currently I have a lexer that produces the following question-relevant tokens: Number: any character sequence ...
patvax's user avatar
  • 659
4 votes
1 answer
85 views

My Arithmetic Expression Compiler, if run in a modern browser, can target both FlatAssembler and GNU Assembler. GNU Assembler doesn't support specifying float values in decimal notation, so my ...
FlatAssembler's user avatar
0 votes
0 answers
48 views

I have a .dat file that is a binary file with a regular structure. Length of one entry - 20 bytes. Each entry contains: date/time (8 bytes, IEEE double format 12/30/1899 12:00 am) humidity value (4 ...
От А До Я's user avatar
2 votes
2 answers
128 views

The canonical example used for explaining the binary vs decimal representation of floating points is the value 0.3: float asFloat = 0.3; // <-- 0.300000012 double asDouble = 0.3; // <-- 0....
Blabba's user avatar
  • 414
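
To make the 0.3 example from the excerpt above concrete, the sketch below prints the exact decimal value stored for 0.3 in binary32 and in binary64. It is an illustration only: `to_float32` is a hypothetical helper that uses the standard `struct` module to round a Python float (a binary64 value) to the nearest binary32 value.

```python
import struct
from decimal import Decimal


def to_float32(x: float) -> float:
    """Round a binary64 value to the nearest binary32 value, then widen it back."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


as_double = 0.3
as_float = to_float32(0.3)

# Decimal(float) converts exactly, so these are the stored values, digit for digit.
print(Decimal(as_float))
print(Decimal(as_double))
print(as_float == as_double)
```

Neither format stores 0.3 exactly; the two nearest representable values simply fall on opposite sides of it, which is why the comparison between them is false.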
4 votes
3 answers
339 views

I know this question might seem overly familiar to the community, but I swear I've never been able to reproduce the issue related to this question even once throughout my programming journey. I ...
Dmytro Kostenko's user avatar
0 votes
2 answers
78 views

How does addition work in floating-point for this case: In [6]: np.finfo(np.float64).max + 1 Out[6]: 1.7976931348623157e+308 Why is there no overflow raised?
zell's user avatar
  • 10.4k
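
The behavior asked about in the excerpt above can be reproduced without NumPy. The sketch below assumes Python 3.9 or later (for `math.ulp`) and the default round-to-nearest mode: adding 1 to the largest double changes nothing because 1 is far smaller than half the spacing between neighboring doubles at that magnitude, while adding a full spacing does overflow to infinity.

```python
import math
import sys

big = sys.float_info.max   # same value as np.finfo(np.float64).max
gap = math.ulp(big)        # spacing between big and the next lower double

print(gap)                 # 2**971, roughly 2e292
print(big + 1.0 == big)    # 1.0 is lost in rounding, so the sum is still big
print(big + gap)           # the exact sum is 2**1024, which overflows to inf
```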
