Newest 'ieee-754' Questions

8 votes

1 answer

169 views

Does the MSVC implementation of `signaling_NaN` comply with the latest IEEE floating-point standard?

As far as I can tell, the MSVC implementation of signaling_NaN does not comply with IEEE 754-2019, the latest version of the IEEE floating-point standard. Unfortunately, I do not have a copy of the ...

tbxfreeware

2,481

asked Jul 19 at 18:53

0 votes

0 answers

43 views

How to make TypeORM automatically fix all floating-point values according to their DB schema type?

Not sure if that is possible at all. It is a typical problem: the value in the DB is 4.725, but the UI shows 4.7250000000000005. And there are a lot of other example values which generate such kind of ...

dmitry_bond

459

asked Apr 30 at 11:58

3 votes

2 answers

123 views

How to trigger exactly one SSE exception

I've written a little test program that triggers FPU-exceptions through feraiseexcept(): #include <iostream> #include <cfenv> using namespace std; int main() { auto test = []( int exc,...

Edison von Myosotis

887

asked Mar 12 at 18:24

1 vote

3 answers

207 views

Are Math.sqrt(x) and Math.pow(x, 0.5) equivalent?

In ECMAScript, given a non-negative, finite double x, is the following assertion always true? Math.sqrt(x) === Math.pow(x, 0.5) I know that both Math.sqrt() and Math.pow() are implementation-...

dolmok

178

asked Feb 8 at 7:20

7 votes

1 answer

131 views

Is it always true that x * y = ((x * y) / y) * y under IEEE 754 semantics?

Given two nonzero, finite, double-precision floating point numbers x and y, is it always true that the equality x * y == ((x * y) / y) * y holds under default IEEE 754 semantics? I've searched ...

Hans Brende

8,887

asked Feb 7 at 1:36

1 vote

2 answers

114 views

Java Double Precision - Rounding - %f specifier

Numbers sometimes cannot be expressed exactly when they are represented in double precision or single precision. Of course, working with BigDecimal is a solution; I know that. Let's come to my question:...

İlker Deveci

105

asked Jan 14 at 13:34

2 votes

2 answers

113 views

Floating Point: Why does the implicit 1 change the value of the fractional part?

I was reading about the floating point implementation from the comments of a ziglings.org exercise, and I came across this info about it. // Floating further: // // As an example, Zig's f16 is a IEEE ...

Raven King

176

asked Jan 8 at 15:45

11 votes

1 answer

656 views

How to achieve same double to string conversion rounding results in C++ and C#?

I want to convert a double to a string with a given number of decimal places in C++ as well as in C# and I want the results of those conversions to be the same in both languages. Especially C++ ...

Pace

113

asked Jan 3 at 8:52

3 votes

2 answers

147 views

Convert floating-point value to cyclic range?

I'm not sure if I'm using the right terminology, but occasionally I find myself needing to canonicalize a floating-point value to a range in a cyclic manner. (This can be useful, for instance, for ...

Boann

50.3k

asked Dec 4, 2024 at 0:53
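One common way to do this kind of cyclic canonicalization is a floored modulo, sketched here in Python. The function name `wrap` and the half-open target range [lo, hi) are assumptions made for illustration; they are not taken from the question. The extra guard covers a rounding edge case: for a tiny negative offset, the modulo result can round up to the full width.

```python
def wrap(x: float, lo: float, hi: float) -> float:
    """Map x into the half-open range [lo, hi) cyclically (hypothetical helper)."""
    width = hi - lo
    # Python's % on floats takes the sign of the divisor, so r is never negative.
    r = (x - lo) % width
    # For tiny negative offsets, r can round up to exactly `width`; fold it back.
    if r >= width:
        r = 0.0
    return lo + r


print(wrap(370.0, 0.0, 360.0))     # angle in degrees wrapped into [0, 360)
print(wrap(190.0, -180.0, 180.0))  # angle wrapped into [-180, 180)
```

Languages whose remainder operator truncates toward zero (C's `fmod`, Java's `%`) need an additional correction step for negative inputs.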

2 votes

2 answers

182 views

Why does IEEE 754 define 1 ^ NaN as 1, and why do Java and JavaScript violate this?

IEEE 754 defines 1 ^ n as 1, regardless of n. (I'm not paying $106 to confirm this for myself, but this paper cites page 44 from the 2008 standard for this claim.) Most programming languages seem to ...

Isaac King

394

asked Nov 19, 2024 at 18:52
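For comparison, the behaviour the question attributes to IEEE 754 can be observed in Python, whose `math.pow` is documented to return 1.0 for `pow(1.0, x)` and `pow(x, 0.0)` even when `x` is a NaN. This is only a small sketch of that one runtime and says nothing about what Java or JavaScript do.

```python
import math

nan = float("nan")

one_pow_nan = math.pow(1.0, nan)   # base 1: the NaN exponent is ignored
nan_pow_zero = math.pow(nan, 0.0)  # exponent 0: the NaN base is ignored
two_pow_nan = math.pow(2.0, nan)   # any other base: NaN propagates

print(one_pow_nan, nan_pow_zero, two_pow_nan)
```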

-1 votes

1 answer

106 views

Throw exception when trying to put a number on a float that will be rounded and lose precision

I need to process a CSV. The users define whether a column holds floats or doubles. The thing is, sometimes they put doubles in a float column, the values then get rounded, and the users only find out ...

Artur Carvalho

7,207

asked Nov 15, 2024 at 16:37

5 votes

3 answers

793 views

Why do we need both a round bit and a sticky bit in IEEE 754 floating point implementations?

In my university lecture we just learnt about IEEE 754 arithmetic using the following table:

Guard  Round  Sticky  Result
0      x      x       Round down (do nothing to significand)
1      1      x       Round up
1      0      1       Round up
1      0      0       ...

aeternum

53

asked Oct 21, 2024 at 6:25
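The guard/round/sticky table in the excerpt can be sketched as a small Python routine operating on an integer significand. The function name `round_grs` and the convention that the `drop` lowest bits are discarded are assumptions for illustration. The last table row is cut off in the excerpt; the code treats it as a tie resolved to the even neighbour, which is the usual round-to-nearest-even rule and not something the excerpt confirms.

```python
def round_grs(value: int, drop: int) -> int:
    """Discard the low `drop` bits (drop >= 2) of value, rounding to nearest even."""
    kept = value >> drop                      # bits that survive
    guard = (value >> (drop - 1)) & 1         # first discarded bit
    rnd = (value >> (drop - 2)) & 1           # second discarded bit
    sticky = int(value & ((1 << (drop - 2)) - 1) != 0)  # OR of all remaining bits
    if guard == 0:
        return kept                           # below half: round down
    if rnd or sticky:
        return kept + 1                       # above half: round up
    return kept + (kept & 1)                  # exactly half: assumed tie to even


print(bin(round_grs(0b1011_110, 3)))  # guard=1, round=1
print(bin(round_grs(0b1010_100, 3)))  # guard=1, round=0, sticky=0 (tie)
```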

3 votes

1 answer

60 views

IEEE Floating-Point Number Bound for (b-a)+a, where 0<=a<=b

Question: Given two non-negative numbers a and b, where a is less than or equal to b, I want to know whether y, computed by the following algorithm, is less than or equal to b. Algorithm: x = b-a; y = x+a; Is y<=b in ...

kaisong

181

asked Oct 6, 2024 at 9:52

9 votes

2 answers

174 views

How many values can be represented in a range when using 64-bit floating point type in the most efficient manner

Given a 64-bit floating point type (IEEE 754) and a closed range [R0,R1], assuming both R0 and R1 are within the representable range and are not NaN or +/-inf etc. How does one calculate the number of ...

Penny Dreudter

789

asked Sep 7, 2024 at 22:27
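A well-known approach to this kind of count is to reinterpret each double's bit pattern as an integer that is monotonic in the floating-point value, then subtract. The sketch below assumes finite, non-NaN endpoints with r0 <= r1; the helper names `ordered` and `count_doubles` are hypothetical. Note one design choice: -0.0 and +0.0 map to the same integer, so they are counted as a single value.

```python
import struct


def ordered(x: float) -> int:
    """Map a finite double to an integer that increases with the double's value."""
    bits = struct.unpack("<q", struct.pack("<d", x))[0]  # signed 64-bit view
    # Negative doubles are sign-magnitude: flip them onto the negative integers.
    return bits if bits >= 0 else -(bits & 0x7FFFFFFFFFFFFFFF)


def count_doubles(r0: float, r1: float) -> int:
    """Number of distinct doubles in the closed range [r0, r1], assuming r0 <= r1."""
    return ordered(r1) - ordered(r0) + 1


print(count_doubles(1.0, 2.0))
```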

0 votes

3 answers

319 views

Windows on ARM: Math is not trapping

I am using this code to test trapping math on Windows 11, VS2022, amd64 and arm64 systems: Godbolt // compile with: cl /O2 /EHa /std:c++20 #include <cfenv> #include <eh.h> #include <...

Ibraim Ganiev

9,480

asked Aug 27, 2024 at 8:15

2 votes

1 answer

114 views

Why does this Java float addition example behave like the mantissa is 24 bits long?

Intro: With Java floats, I noticed that when you add 1.0 to a certain range of tiny negative numbers, it equals 1.0. I decided to investigate this and learned a lot about how floats work in my quest ...

PianoMastR64

427

asked Aug 12, 2024 at 18:25

0 votes

1 answer

93 views

Lossy conversion between long double and double

I was caught by surprise that the following code returns false for gcc 13 and clang 18. Why does this happen? Isn't the number 8.1 representable in both formats? #include <iostream> #include <...

Martin Fehrs

1,165

asked Jul 25, 2024 at 14:56

4 votes

1 answer

262 views

What happens when an integer value needing more than 52 bits of mantissa is stored in the double data type?

#include <stdio.h> int main() { double a =92233720368547758071; printf("value=%lf\n", a); int i; char *b = &a; for (i = 0; i < 8; i++) { printf("...

Nalan PandiKumar

358

asked Jun 28, 2024 at 2:00
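The effect can be sketched in Python, assuming a platform where `float` is an IEEE 754 double (true on mainstream platforms). An integer that needs more significant bits than a double holds is rounded to the nearest representable value. The literal is the one from the excerpt; the variable names are illustrative.

```python
n = 92233720368547758071  # integer literal from the excerpt

as_double = float(n)      # rounds to the nearest representable double
back = int(as_double)     # exact integer value of that double

print(back)               # differs from n in the low digits
print(back - n)           # rounding error introduced by the conversion

# The smallest positive integer that a double cannot hold exactly is 2**53 + 1.
print(float(2**53 + 1) == float(2**53))
```

Near this magnitude (just above 2**66), adjacent doubles are 2**14 apart, so the low decimal digits of the original integer cannot survive the conversion.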

3 votes

1 answer

132 views

IEEE floating-point rounding in C

I am having trouble understanding a specific IEEE double computation. Take the following C99 program that runs on a host with IEEE double (8 bytes, 11 bits biased exponent, 52 bits encoded mantissa):...

emacs drives me nuts

4,337

asked Jun 10, 2024 at 15:17

1 vote

0 answers

74 views

Parsing of floating point numbers with error on truncated precision

I am writing a parser for a LIN Description File (LDF). In an LDF file there may be floats. Currently I have a lexer that produces the following question-relevant tokens: Number: any character sequence ...

patvax

659

asked May 7, 2024 at 13:33

4 votes

1 answer

85 views

How can I bitwise-cast a 32-bit float into an integer without using typed arrays?

My Arithmetic Expression Compiler, if run in a modern browser, can target both FlatAssembler and GNU Assembler. GNU Assembler doesn't support specifying float values in decimal notation, so my ...

FlatAssembler

840

asked May 1, 2024 at 11:48

0 votes

0 answers

48 views

How to create a file with IEEE single/double format data values using Python?

I have a .dat file that is a binary file with a regular structure. Length of one entry - 20 bytes. Each entry contains: date/time (8 bytes, IEEE double format 12/30/1899 12:00 am) humidity value (4 ...

От А До Я

11

asked Apr 16, 2024 at 17:36

2 votes

2 answers

128 views

Losing precision when casting float to double, even for values that have precise binary representations

The canonical example used for explaining the binary vs decimal representation of floating points is the value 0.3: float asFloat = 0.3; // <-- 0.300000012 double asDouble = 0.3; // <-- 0....

Blabba

414

asked Apr 9, 2024 at 8:26
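The 0.3 example from the excerpt can be reproduced in Python by round-tripping a double through a 32-bit float with the `struct` module. The helper name `to_f32` is hypothetical. Widening a float to a double is exact; the difference comes from the earlier rounding to 24 significant bits, which is why a value such as 0.5, exactly representable in binary, survives unchanged.

```python
import struct


def to_f32(x: float) -> float:
    """Round a double to the nearest 32-bit float, then widen it back to a double."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


print(repr(to_f32(0.3)))    # the float nearest to 0.3, shown with double precision
print(to_f32(0.3) == 0.3)   # differs from the double nearest to 0.3
print(to_f32(0.5) == 0.5)   # a power of two is exact in both formats
```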

4 votes

3 answers

339 views

Example of Code with and without strictfp Modifier

I know this question might seem overly familiar to the community, but I swear I've never been able to reproduce the issue related to this question even once throughout my programming journey. I ...

Dmytro Kostenko

245

asked Mar 26, 2024 at 9:10

0 votes

2 answers

78 views

How does floating-point addition work in "np.finfo(np.float64).max + 1"?

How does addition work in floating-point for this case: In [6]: np.finfo(np.float64).max + 1 Out[6]: 1.7976931348623157e+308 Why is there no overflow raised?

zell

10.4k

asked Mar 24, 2024 at 5:55
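The same behaviour can be checked without NumPy, assuming Python 3.9+ for `math.ulp`. The sketch shows that the spacing between adjacent doubles near the maximum is far larger than 1, so adding 1 rounds straight back to the maximum; only a much larger addend pushes the result past the finite range.

```python
import math
import sys

big = sys.float_info.max  # same value as np.finfo(np.float64).max
gap = math.ulp(big)       # distance from big to the next smaller double

unchanged = big + 1.0     # 1.0 is far below half the gap, so the sum rounds to big
overflowed = big + big    # this sum is out of range and becomes infinity

print(gap == 2.0**971)
print(unchanged == big)
print(overflowed)
```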
