
I know this question might seem overly familiar to the community, but I swear I have never once been able to reproduce the issue it describes throughout my entire programming journey.

I understand what the strictfp modifier does and how it ensures full compliance with the IEEE 754 standard. However, I've never encountered a situation in practice where an extended-exponent value set is actually used, as described in the official specification.

I've tried using options like -XX:+UseFPUForSpilling to provoke the use of the x87 FPU for calculations on my relatively modern processor, but it had no effect.

I even went as far as installing Windows 98 SE on a virtual machine and emulating, via Bochs, an Intel Pentium II processor, which does not support the SSE instruction set, hoping that the x87 FPU would then be virtually the only option for floating-point math. However, even this experiment yielded no results.

The essence of the experiment was to take the maximum possible value of the double type and multiply it by 2, pushing the intermediate result beyond the permissible range of double. I then divided the obtained value by 4, and the final result was saved back into a double variable. In theory, with an extended-exponent value set I should have gotten a meaningful result, but in every situation I ended up with Infinity. In fact, I haven't found a single reproducible example on the entire internet (even as of 2024!) that shows different results with and without strictfp. Is it really possible that in almost 30 years of the language's history there isn't a single example that clearly demonstrates the difference?
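
For clarity, here is a sketch of the experiment described above (variable names are illustrative):

public class OverflowProbe {
    public static void main(String[] args) {
        double two = 2;   // local variables, so nothing is folded at compile time
        double four = 4;

        // I expected the intermediate result to survive in an extended-exponent
        // register without strictfp, but the output was Infinity everywhere.
        System.out.println(Double.MAX_VALUE * two / four);
    }
}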

P.S. I'm well aware of Java 17+. All experiments were conducted on earlier versions, where the difference should, in theory, be observable. I installed Java SE 1.3 on the virtual machine.

  • See stackoverflow.com/a/71181138/5133585 Quote: "The one (and only time) I needed this was reconciliation with an IBM ZSeries" Commented Mar 26, 2024 at 9:20
  • In what context could this have actually been necessary for a mainframe? The point is that the extended-exponent value sets have nothing to do with increased precision: the mantissa always occupies either 24 or 53 bits (including the implicit normalization bit), depending on the type (float or double). What was it needed for there? Was the work involving extremely large and/or small numerical values? Was this needed to obtain some meaningful result that would differ from zero or Infinity? Commented Mar 26, 2024 at 9:28
  • The non-strictfp behaviour is only about allowing Java to use certain optimized x87 instructions that don't (fully) comply with IEEE 754. It doesn't magically increase the precision of double to allow intermediate results of 2x the maximum double value... It just means that certain operations might produce (slightly) different results or a different subnormal value than prescribed by IEEE 754. Commented Mar 26, 2024 at 10:07
  • With "different" I mean that some values might be one or a few epsilon off from the value that IEEE 754 would require for a calculation. Commented Mar 26, 2024 at 11:14
  • And it's entirely possible that you will only observe the difference if the code is optimized by HotSpot. Commented Mar 26, 2024 at 11:20

3 Answers


Understanding strictfp in Java: A Deep Dive Into JVM Behavior

If you’ve ever worked with floating-point arithmetic in Java, you may have come across the strictfp keyword. It guarantees platform-independent results by strictly adhering to the IEEE 754 floating-point standard. But how does it actually work under the hood? In this post, I’ll walk you through my detailed exploration of strictfp, including examples, assembly code, and insights into the JVM’s behavior on different architectures.

This is not just theoretical – I spent a significant amount of time analyzing the output of a 32-bit JVM on x86 processors, including disassembled JIT-compiled code. This might be one of the few hands-on explanations you’ll find, showcasing real examples of how strictfp affects floating-point calculations.


What Is strictfp?

Floating-point types (float and double) in Java are governed by the IEEE 754 standard. The Java Language Specification (JLS §4.2.3) defines two standard value sets for floating-point numbers:

  • float value set (binary32)
  • double value set (binary64)

In addition to these, the JVM may support extended-exponent value sets:

  • float-extended-exponent
  • double-extended-exponent

Key Differences Between strictfp and Default Behavior:

  • Without strictfp: The JVM can use extended precision for intermediate calculations. For example, on x86 processors, it may use 80-bit floating-point registers. This can lead to platform-specific results due to differences in rounding and precision.
  • With strictfp: All intermediate calculations are confined to the binary32 (float) or binary64 (double) value sets, ensuring consistency across platforms.
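
For reference, the modifier itself can be applied to a class, an interface, or an individual method; here is a minimal sketch (names are illustrative):

// Under pre-Java 17 semantics, every float/double expression in this class
// is evaluated strictly within the binary32/binary64 value sets.
public strictfp class StrictOps {
    static double halveThenQuadruple(double v) {
        return v / 2 * 4; // intermediates here may not use extended exponents
    }
}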

The Experiment: How Does strictfp Affect Results?

To explore the effects of strictfp, I tested two examples illustrating overflow and underflow behavior on an x86 processor using a 32-bit JVM. These examples demonstrate how intermediate results behave differently with and without strictfp.


Why Local Variables Were Used Instead of Compile-Time Constants

It’s important to highlight that local variables were deliberately used instead of compile-time constants. This decision was crucial for ensuring that calculations were performed at runtime rather than being optimized away by the compiler.

If compile-time constants (e.g., System.out.println(Double.MIN_VALUE / 2 * 4);) were used directly, the Java compiler would fold the entire expression at compile time. During constant folding, the compiler adheres strictly to the IEEE 754 standard, enforcing binary32 or binary64 precision for intermediate results. This means the calculations would effectively mimic the behavior of strictfp, regardless of whether the modifier is present or not.

By introducing local variables, we force the JVM to defer the computation to runtime. This runtime calculation allows us to observe the effects of extended precision (80-bit x87 registers) or strict IEEE 754 conformance in real-time, as influenced by the presence or absence of the strictfp modifier. Without this approach, the experimental results would not reflect the differences we’re trying to illustrate.
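
For comparison, here is the constant-folded variant as a sketch. Because Double.MIN_VALUE, 2, and 4 form a compile-time constant expression, javac evaluates the whole thing under strict binary64 rules, and the output is 0.0 whether or not strictfp is present:

public class ConstantFoldedTest {
    public static void main(String[] args) {
        // javac folds this entire expression at compile time:
        // Double.MIN_VALUE / 2 underflows to 0.0, and 0.0 * 4 stays 0.0.
        System.out.println(Double.MIN_VALUE / 2 * 4); // always prints 0.0
    }
}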


Example 1: Underflow Behavior

public class StrictTest {
    public static void main(String[] args) {
        double secondOperand = 2;
        double thirdOperand = 4;

        System.out.println(Double.MIN_VALUE / secondOperand * thirdOperand);
    }
}

Results:

  • Without strictfp:

    Extended precision (80-bit x87 registers) avoids underflow, preserving the intermediate result:

    1.0E-323
    
  • With strictfp:

    Intermediate calculations adhere to binary64 precision, causing underflow:

    0.0
    

Example 2: Overflow Behavior

public class StrictTest {
    public static void main(String[] args) {
        double secondOperand = 2;
        double thirdOperand = 4;

        System.out.println(Double.MAX_VALUE * secondOperand / thirdOperand);
    }
}

Results:

  • Without strictfp:

    Extended precision allows the intermediate result to fit within the 80-bit range, avoiding immediate overflow:

    8.988465674311579E307
    
  • With strictfp:

    Calculations confined to binary64 precision result in an overflow to positive infinity:

    Infinity
    

Key Insight:

The use of local variables ensured that these calculations occurred at runtime, allowing us to capture the runtime differences between strictfp and non-strictfp behavior. If compile-time constants had been used, the compiler would have optimized the calculations based on strict IEEE 754 conformance, negating the ability to observe the effects of extended precision on intermediate results. This distinction is critical for reproducibility and understanding the nuances of strictfp.


What Happens Under the Hood?

Using a disassembler (hsdis), I examined the assembly code generated by the JVM to understand how calculations are performed. The goal was to observe how the strictfp modifier impacts floating-point operations at the machine code level.

JVM Options

To replicate the results, the following JVM options were used:

-server -Xcomp -XX:UseSSE=0 -XX:+UnlockDiagnosticVMOptions -XX:+PrintAssembly -XX:CompileCommand=compileonly,StrictTest.main

For the minimal setup required to observe differences, use:

-Xcomp -XX:UseSSE=0
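
For example, a complete invocation for the test class above might look like this (assuming a 32-bit JVM and StrictTest.class in the current directory):

java -Xcomp -XX:UseSSE=0 StrictTest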

Why These Options Are Necessary

  1. -Xcomp: This option forces the JVM to compile all methods using the Just-In-Time (JIT) compiler immediately. It is mandatory in this experiment because:
    • Without -Xcomp, or when using -Xint (interpreted mode), the methods might not be compiled, and the JVM will execute them in interpreted mode. This results in no JIT-compiled assembly output, which is essential for the disassembler (hsdis) to provide meaningful results.
    • In interpreted mode, floating-point operations would rely entirely on the bytecode interpreter, making it impossible to observe the low-level differences caused by strictfp.
  2. -XX:UseSSE=0: This disables the use of Streaming SIMD Extensions (SSE) instructions for floating-point operations. Instead, the JVM falls back to the x87 FPU instructions, which utilize 80-bit extended precision registers. This option was critical because:
    • By default, modern JVMs on x86 use SSE instructions for floating-point operations, which comply with IEEE 754 by default and do not use extended precision. As a result, there would be no observable difference in behavior with or without strictfp.
    • Disabling SSE ensures that the JVM uses x87 FPU instructions, where intermediate results can utilize 80-bit extended precision unless constrained by strictfp. This allows us to demonstrate the impact of strictfp effectively.
  3. -XX:+PrintAssembly: This option outputs the generated assembly code for the compiled methods. Combined with hsdis, it allows for precise observation of how floating-point calculations are executed at the machine level.
  4. -XX:CompileCommand=compileonly,StrictTest.main: This restricts compilation to the specific method under investigation (StrictTest.main), reducing noise in the assembly output.

By combining these options, the experiment isolates the floating-point operations affected by strictfp and ensures that the results are observable at the assembly level. Without this configuration, the differences introduced by strictfp would remain hidden, or the disassembly would lack the necessary precision.


Assembly Analysis: Without strictfp

Here’s the disassembly output when running the underflow example without the strictfp modifier:

0x02f52326: fldl    0x2f522c0   ; Load Double.MIN_VALUE
0x02f5232c: fdivl   0x2f522c8   ; Divide by secondOperand (2.0)
0x02f52332: fmull   0x2f522d0   ; Multiply by thirdOperand (4.0)
0x02f52338: fstpl   (%esp)      ; Store the result for printing

Explanation:

  • The JVM uses 80-bit extended precision for intermediate calculations, preserving the value beyond the IEEE 754 binary64 precision. As a result, underflow is avoided, and the intermediate result is preserved:

    Result: 1.0E-323
    

Assembly Analysis: With strictfp

When the strictfp modifier is applied, the disassembly for the underflow example includes additional type conversion steps to enforce strict adherence to binary64 precision:

0x02fe2306: fldl    0x2fe22a0   ; Load Double.MIN_VALUE
0x02fe230c: fldt    0x6f4c40a4  ; Extended load
0x02fe2312: fmulp   %st(1)      ; Multiply and store in st(1)
0x02fe2314: fdivl   0x2fe22a8   ; Divide by secondOperand (2.0)
0x02fe231a: fldt    0x6f4c40b0  ; Extended load
0x02fe2320: fmulp   %st(1)      ; Multiply and store in st(1)
0x02fe2322: fstpl   0x18(%esp)  ; Store intermediate result
0x02fe2326: fldl    0x18(%esp)  ; Reload and enforce binary64 rounding
0x02fe232a: fldt    0x6f4c40a4  ; Extended load
0x02fe2330: fmulp   %st(1)      ; Multiply again
0x02fe2332: fmull   0x2fe22b0   ; Multiply by thirdOperand (4.0)
0x02fe2338: fldt    0x6f4c40b0  ; Extended load
0x02fe233e: fmulp   %st(1)      ; Multiply and store in st(1)
0x02fe2340: fstpl   0x20(%esp)  ; Final result stored

Explanation:

  • The key difference lies in the intermediate rounding and spill steps (e.g., fstpl followed by fldl), which force each intermediate result into the binary64 value set. The fldt/fmulp pairs load extended-precision scaling constants that HotSpot uses to bias the exponent before and after an operation, so that underflow lands on exactly the binary64 subnormal value required by IEEE 754. The net effect here is underflow:

    Result: 0.0
    

Behavior on Modern 64-Bit JVMs

On modern 64-bit JVMs, the behavior is fundamentally different from 32-bit JVMs due to architectural and implementation changes. Extended precision (80-bit x87 floating-point registers) is not utilized, even when SIMD (SSE or AVX) is explicitly disabled via JVM options. Instead:

  1. Relying on Native Implementations: Calculations appear to rely on native libraries or other internal JVM mechanisms for processing floating-point arithmetic. This can be inferred from the runtime call observed in the disassembled assembly code:

    0x00000230aeae7e13: callq        0x230aea25820  ; OopMap{off=24}
                                              ;*getstatic out
                                              ; - StrictTest::main@8 (line 6)
                                              ;   {runtime_call}
    

    This instruction indicates that instead of performing the floating-point calculation directly via hardware registers, the JVM delegates it to a runtime component. This component likely ensures that intermediate results conform to the binary64 (double) precision standard.

  2. Disabling SSE and AVX Has No Effect: When using the -XX:UseSSE=0 and -XX:UseAVX=0 flags, one might expect the JVM to fall back to utilizing x87 80-bit FPU registers for floating-point operations. However, the runtime behavior remains unchanged, and x87 registers are not employed; a plausible explanation is that SSE2 belongs to the x86-64 base instruction set, so the 64-bit JVM never generates x87 code and effectively clamps or ignores these flags. Even the additional flag -XX:+UseFPUForSpilling, which should theoretically allow spilling intermediate results to x87 FPU registers, has no noticeable effect on the 64-bit JVM.

  3. Intermediate Results Conform to Binary64 Rules: Regardless of the absence of strictfp, intermediate floating-point calculations adhere to IEEE 754 binary64 standards. This behavior ensures consistent results, simplifying cross-platform development. However, it also means that the potential benefits of extended precision for intermediate calculations (e.g., reducing rounding errors) are not available.

  4. Internal Handling of Floating-Point Arithmetic: The reliance on a runtime component, as indicated by the disassembled code, suggests that floating-point calculations in a 64-bit JVM are heavily abstracted. This aligns with the broader trend of modern JVMs to use platform-independent mechanisms for floating-point arithmetic, reducing reliance on specific hardware features.


Implications

While the strictfp modifier remains important for ensuring cross-platform consistency, its significance is diminished on 64-bit JVMs due to the inherent adherence of intermediate calculations to binary64 standards. This behavior is consistent even when hardware optimizations (like SSE or AVX) are disabled, and no fallback to x87 FPU registers occurs.

This architectural design underscores the JVM's emphasis on platform independence, even at the cost of foregoing hardware-specific optimizations for extended precision.


Diving Into the Java Language Specification

The JLS §4.2.3 provides detailed insights into floating-point value sets. Here are the key points:

  • Value Sets:
    • float and double value sets (binary32, binary64).
    • Extended-exponent value sets (broader range of exponents, same precision).
  • Compliance:
    • All JVM implementations must support float and double value sets.
    • Extended-exponent value sets are optional but may be used for intermediate results unless restricted by strictfp.

Quote From the JLS:

"The float, float-extended-exponent, double, and double-extended-exponent value sets are not types. It is always correct for an implementation of the Java programming language to use an element of the float value set to represent a value of type float; however, it may be permissible in certain regions of code for an implementation to use an element of the float-extended-exponent value set instead."


System Configuration

Here’s my setup for these experiments:

  • Processor: Intel Core i7-2960XM Extreme Edition
  • OS: Windows 10 Enterprise 22H2
  • JVM: Oracle OpenJDK 1.8.0_431 (32-bit) with hsdis installed.

Notes on Potential Variability

These experiments were conducted exclusively on an x86-64 processor architecture. Results may differ on other architectures (e.g., ARM64), operating systems, or JVM versions/vendors. This variability arises from the differences in how specific architectures and JVM implementations handle floating-point arithmetic and their internal optimizations.

Several factors that could influence results include:

  1. Bytecode Compiler Optimizations: The Java compiler may optimize code differently depending on the runtime context or specific constructs used.

  2. JVM Implementation Details: The behavior may vary based on the JVM vendor or version due to differences in policies around extended-exponent value set support and floating-point arithmetic handling.

  3. OS and Hardware Optimizations: Operating systems and processor microarchitectures may influence how low-level instructions are executed, potentially affecting intermediate results.

  4. JVM Flags: The specific flags used to launch the JVM can have a substantial impact on how calculations are handled. For instance, options like -XX:UseSSE or -XX:+UseFPUForSpilling directly alter the floating-point arithmetic behavior.

Understanding these dependencies is crucial for accurately interpreting experimental results and for reproducing the behavior across different environments.


Compatibility with Older JVM Versions

This analysis extends beyond the JVM versions explicitly mentioned in the earlier sections. I successfully reproduced the observed behavior on 32-bit JVMs starting from J2SE 1.4. Notably, these results were achieved on the Java HotSpot™ Client VM (version 1.4.2_18), which predates the widespread adoption of the SSE instruction set for floating-point calculations.

Key Findings on J2SE 1.4:

  1. Critical Role of the -Xcomp Flag:

    • The -Xcomp flag is essential for achieving the desired results on J2SE 1.4. Without this flag, the JVM operates in interpreted mode or mixed mode, which prevents the Just-In-Time (JIT) compiler from generating the assembly-level output necessary for observing the behavior of floating-point operations.
    • Enabling -Xcomp ensures that all methods, including those under test, are compiled immediately, exposing the differences in intermediate precision with and without strictfp.
  2. No Need for -XX:UseSSE=0:

    • Unlike modern JVMs, the -XX:UseSSE=0 flag is not recognized in J2SE 1.4. This is likely because, during that era, the SSE instruction set was either not fully utilized or had minimal integration into JVM implementations.
    • Despite the absence of this flag, the behavior is consistent with what was observed on more recent 32-bit JVMs using x87 FPU instructions, further confirming the reliance on 80-bit extended precision for intermediate floating-point calculations.
  3. Reproducibility on HotSpot-Based JVMs:

    • The experiments were conducted on a system running the following configuration:
      Processor: Intel Core i7-2960XM Extreme Edition
      JVM: Java(TM) 2 Runtime Environment, Standard Edition (build 1.4.2_18-b06)
      
    • Results were reproducible, confirming that HotSpot-based JVMs consistently exhibit this behavior when strictfp is absent, provided that the computation is deferred to runtime (e.g., using local variables instead of compile-time constants).

Broader Implications:

These findings reinforce the idea that the behavior described in this post is not exclusive to modern JVM versions. Instead, it aligns with a long-standing design choice in the HotSpot VM to leverage x87 FPU instructions for floating-point arithmetic on 32-bit architectures. This historical consistency ensures that users can reproduce these experiments across various JVM versions, provided that they use the correct configuration and flags (notably, -Xcomp).

This compatibility further emphasizes the importance of understanding both the historical evolution of JVM implementations and the subtle ways in which flags and internal mechanisms influence runtime behavior.


Final Thoughts

This exploration demonstrates the nuanced behavior of strictfp and its impact on floating-point calculations in Java. The examples provided offer a rare glimpse into how intermediate precision is handled by the JVM, supported by real assembly output. By understanding these details, you can make informed decisions about when to use strictfp in your code.


P.S.

Starting from Java SE 17 (JEP 306, "Restore Always-Strict Floating-Point Semantics"), the strictfp modifier is redundant: strict IEEE 754 adherence became the default and only mode of operation in the JVM.


Update (November 23, 2024): Revisiting How Extended-Exponent Value Sets Are Activated

After a series of additional experiments and thorough analysis, I have reached an important new conclusion about the conditions under which extended-exponent value sets can be utilized. Previously, I claimed that using the -Xcomp flag was mandatory for achieving this behavior on 32-bit JVMs. However, further testing revealed that my earlier understanding was incomplete. Below, I present the refined insights, supported by new experimental evidence and practical examples.


JVM Execution Modes: A Crucial Context

The JVM can operate in three primary execution modes, and understanding these is key to replicating the behavior:

  1. Interpretation Mode (-Xint): All code is executed by the bytecode interpreter. No JIT compilation occurs. In this mode, extended-exponent value sets cannot be used, as the interpreter enforces strict rounding of all intermediate results to either binary32 or binary64, depending on the expected result type.
  2. Compilation Mode (-Xcomp): All code is eagerly compiled by the JIT compiler, bypassing the interpreter entirely. This mode reliably activates extended-exponent value sets for floating-point calculations, as JIT-compiled machine code utilizes the x87 FPU instructions (for 32-bit JVMs).
  3. Mixed Mode (default): Combines interpretation and JIT compilation. Code is initially interpreted, but frequently executed or "hot" code is compiled by the JIT compiler as needed. In this mode, results vary depending on whether a specific block of code is interpreted or compiled.
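
For reference, the three modes can be selected explicitly on the command line (class name taken from the examples below):

java -Xint StrictTest    (pure interpretation)
java -Xcomp StrictTest   (eager compilation)
java StrictTest          (mixed mode, the default)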

Key Discovery: JIT Compilation Is the Real Enabler

The earlier assumption that -Xcomp was mandatory stemmed from the fact that it guarantees JIT compilation of all methods. However, my latest findings suggest that it is not the flag itself, but the use of JIT compilation that enables extended-exponent value sets. In mixed mode, it is possible to achieve the same results by ensuring that the relevant code is compiled. Here’s how:

  • By introducing a high number of iterations for the code block in question, the JVM's built-in heuristics classify it as "hot," triggering JIT compilation.
  • Once compiled, the JIT-generated machine code leverages the x87 FPU instructions, enabling the use of extended-exponent value sets.

Example: Forcing JIT Compilation Without -Xcomp

The following code demonstrates this principle:

public class StrictTest {
    public static void main(String[] args) {
        double result = 0.0;

        for (int i = 0; i < 1000000; i++) { 
            double secondOperand = 2.0;
            double thirdOperand = 4.0;

            result = Double.MIN_VALUE / secondOperand * thirdOperand;
        }

        System.out.println(result);
    }
}

Here, the repeated execution (1,000,000 iterations) ensures that the loop is compiled by the JIT compiler in mixed mode. As a result, the intermediate calculation avoids underflow, yielding the following output:

1.0E-323

This behavior is identical to what was observed with -Xcomp. It confirms that JIT compilation, not the mode flag, is the crucial factor for enabling extended-exponent calculations.
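
One way to confirm that the loop was actually JIT-compiled is the standard -XX:+PrintCompilation flag:

java -XX:+PrintCompilation StrictTest

A line for StrictTest::main (typically an on-stack-replacement compilation, marked with %) should appear in the output before the result is printed.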


Historical Compatibility: Testing on Earlier JVM Versions

Extended-exponent value sets have been permitted since J2SE 1.2, the same release that introduced the strictfp modifier and relaxed Java's originally mandatory strict floating-point semantics. Testing across various 32-bit JVM versions revealed the following:

  1. Classic VM (J2SE 1.2–1.3):
    • Classic VM (e.g., java version "1.2.2") already supports extended-exponent calculations when JIT compilation is enabled via the symcjit compiler.
    • Results are consistent with later HotSpot versions when the same conditions are met.
  2. HotSpot VM (J2SE 1.4 and beyond):
    • The HotSpot VM, introduced as an add-on for J2SE 1.2 and made the default VM in J2SE 1.3, solidified this behavior.
    • On J2SE 1.4 and later versions, results were identical across all 32-bit JVMs, confirming that the reliance on x87 FPU instructions remained unchanged.
  3. 32-bit JVMs (up to Java SE 9):
    • This behavior persisted through Java SE 9, the last release for which Oracle shipped 32-bit JVMs; after that, 32-bit support was dropped.
  4. 64-bit JVMs:
    • Extended-exponent value sets are not available on 64-bit JVMs. Testing on J2SE 5.0 and later confirmed that these JVMs adhere strictly to binary64 precision for all intermediate calculations, regardless of flags.

Important Observations on JVM Flags and Versions

Early JVMs (J2SE 1.2–1.5):

  • The -XX:UseSSE=0 flag is unnecessary and unrecognized in 32-bit JVMs during this period, as SSE instructions were either not utilized or minimally integrated.
  • Notably, in J2SE 5.0, the -XX:UseSSE=N flag is available exclusively in 64-bit JVMs. In the corresponding 32-bit version, this flag is not supported, as 32-bit JVMs in this era relied solely on x87 FPU instructions for floating-point calculations.
  • Results for 32-bit JVMs align with x87 FPU usage by default.

JVMs Starting From Java SE 6:

  • The -XX:UseSSE=0 flag becomes mandatory in 32-bit JVMs to explicitly disable SSE instructions and enable x87 FPU behavior. Without this flag, calculations default to SSE-based precision, resulting in strict binary64 adherence.

64-bit JVMs:

  • Disabling SSE via -XX:UseSSE=0 has no effect in 64-bit JVMs across all versions. Intermediate results remain confined to binary64, as x87 FPU registers are not utilized.

Broader Implications

This refined understanding clarifies several points about JVM behavior:

  • Extended-exponent value sets rely on the x87 FPU, which is only available in 32-bit JVMs.
  • JIT compilation is the critical enabler for accessing this behavior. Without it, the bytecode interpreter enforces strict rounding to binary32 or binary64.
  • The -Xcomp flag is helpful but not mandatory, provided the relevant code is compiled by the JIT in mixed mode.

Updated Testing Results

I successfully reproduced the behavior across all tested 32-bit JVM versions, from J2SE 1.2 to Java SE 9, provided that JIT compilation was enabled. The table below summarizes the results:

JVM Version          Architecture   Behavior   Notes
-------------------- -------------- ---------- -----------------------------------------
J2SE 1.2.2 (Classic) 32-bit         Success    Enabled by symcjit; no SSE support.
J2SE 1.4 (HotSpot)   32-bit         Success    Default behavior with JIT compilation.
Java SE 6 (HotSpot)  32-bit         Success    Requires -XX:UseSSE=0 to disable SSE.
Java SE 9 (HotSpot)  32-bit         Success    Last version supporting 32-bit architecture.
J2SE 5.0–Java SE 16  64-bit         Failure    x87 FPU not utilized; no extended precision.

Final Thoughts

This update reinforces the nuanced relationship between JVM internals and extended-exponent value sets. By ensuring JIT compilation, it is possible to activate this behavior on 32-bit JVMs across a wide range of versions. This finding highlights the importance of understanding how different execution modes and JVM implementations interact with floating-point arithmetic.

For anyone exploring this area, I recommend replicating the tests with and without -Xcomp and experimenting with "hot code" to better understand the role of JIT compilation in this process.


Comments

If I'm reading this correctly, you asked your question 8 months ago, but received no answers. You then spent a bunch of time and effort researching some actual answers, then spent even more time and effort writing up your results here, for all to see. Very nice job.
@SteveSummit Yes, my dear friend. I returned to this question multiple times. And believe me, I couldn’t find a single working example on Stack Overflow or anywhere else on the internet that could be practically reproduced. I had to sit down myself, experiment with different configurations, and work with various architectures and vendors until I arrived at this conclusion.
@SteveSummit You see, I was very upset by the vague and incorrect answers provided. They claimed that without strictfp, greater precision could be achieved, but still within the same value set. How is that even possible? When I started digging deeper, I realized that this was nonsense. Later, I figured out that only the extended exponent is used, not the mantissa, but I couldn’t reproduce it in practice. I’m afraid to say it, but I believe this might be the first detailed guide that allows this experiment to be repeated!
@SteveSummit In the end, this could be interesting when working with very small denormalized values, for example. This isn’t about finance but rather purely scientific calculations, like trying to measure the size of the entire Universe using something like the Planck length. Even then, denormalized numbers within the double range are so small that they don’t have physical counterparts in the real world. They are interesting only from the perspective of computational capabilities, in my opinion.
@SteveSummit The entire point is that in complex expressions, we can temporarily step out of bounds in the context of an intermediate result and "observe" overflow or underflow within the range of the type we ultimately need for the result. If, as a result of a sequence of calculations, we return to the valid range, everything works fine. However, with strictfp, the result of each intermediate operation is strictly converted to float or double, and extended exponent precision is not used in this case.

Extending the Experiment: Utilizing Native Libraries to Demonstrate Extended-Exponent Value Sets

This answer builds upon the insights provided in my previous answer here. For a deeper understanding of the foundational concepts and the limitations of extended-exponent value sets in Java, I recommend reviewing that post first.

One aspect of the extended-exponent value sets in Java often overlooked is the potential to demonstrate their behavior through unconventional methods in modern 64-bit environments. While such techniques may seem like a workaround rather than a native JVM feature, they offer a way to visualize the underlying hardware's capabilities. Here, I present an experiment using JNI (Java Native Interface) to achieve what standard JVM configurations on 64-bit systems cannot—explicitly engaging the x87 FPU registers for floating-point arithmetic.


Why Use Native Code?

On 32-bit systems, it is relatively straightforward to leverage extended precision through configurations like disabling SSE and AVX via JVM options. However, these techniques often fail in 64-bit environments, where the JVM strictly adheres to binary64 precision and delegates calculations to software routines, bypassing the hardware's extended precision capabilities.

Using JNI to call a native library bridges this gap by directly invoking hardware instructions through assembly. While this approach deviates from the core principles of Java's platform independence, it demonstrates the practical implementation of extended-exponent value sets for scientific exploration.

Additionally, it is worth noting that the strictfp modifier does not interact with native libraries. This is because strictfp operates exclusively within the JVM, affecting how bytecode is interpreted or compiled by the JIT; native libraries execute precompiled machine code directly on the CPU, bypassing the JVM's control mechanisms.


Implementation Overview

To demonstrate this, I wrote a native library in C with inline assembly. This library exposes a single function that performs a simple calculation involving multiplication and division using the x87 FPU instructions.

Here’s the Java code:

public class StrictTest2 {
    static {
        System.loadLibrary("NativeFPU");
    }

    private native double performCalculation(double a, double b, double c);

    public static void main(String[] args) {
        StrictTest2 test = new StrictTest2();
        double secondOperand = 2.0;
        double thirdOperand = 4.0;

        double result = test.performCalculation(Double.MAX_VALUE, secondOperand, thirdOperand);
        System.out.println("Result: " + result);
    }
}

Output:

Result: 8.988465674311579E307

And the corresponding native implementation in C:

#include <jni.h>

JNIEXPORT jdouble JNICALL Java_StrictTest2_performCalculation
(JNIEnv *env, jobject obj, jdouble a, jdouble b, jdouble c) {
    double result;
    __asm__ __volatile__(
        "fldl %1\n"               // Load a onto the FPU stack
        "fldl %2\n"               // Load b
        "fmulp %%st, %%st(1)\n"   // Multiply a and b
        "fldl %3\n"               // Load c
        "fdivrp %%st, %%st(1)\n"  // Divide (a * b) by c
        "fstpl %0\n"              // Store the result in result
        : "=m"(result)
        : "m"(a), "m"(b), "m"(c)
    );
    return result;
}
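
For completeness, a typical way to build and run this on Linux with GCC might look like the following (file names and paths are assumptions, not part of the original experiment):

javac StrictTest2.java
gcc -shared -fPIC -I"$JAVA_HOME/include" -I"$JAVA_HOME/include/linux" -o libNativeFPU.so NativeFPU.c
java -Djava.library.path=. StrictTest2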

Key Observations

  • Extended Precision: Unlike the JVM's runtime handling, which confines calculations to binary64, the above assembly explicitly utilizes the 80-bit x87 registers. This means intermediate results leverage the extended precision afforded by the FPU.
  • Lack of strictfp Influence: Since native libraries execute outside the JVM's direct control, the strictfp modifier has no effect. This is both a limitation and an advantage: while it circumvents Java’s intended behavior, it provides insight into the hardware’s true capabilities.
  • Cross-Platform Considerations: The experiment's reliance on assembly and native code makes it inherently platform-specific. Writing a similar library for non-x86 architectures, such as ARM, would require adjustments or alternative approaches.

Why This Approach Matters

Using JNI to bypass the JVM’s abstractions serves as a proof of concept. It highlights the theoretical potential of extended-exponent value sets when the JVM itself does not natively expose them in 64-bit environments. While this technique may not align with typical Java programming practices, it opens up possibilities for niche applications, particularly in fields requiring extreme numerical precision.


Limitations and Ethical Considerations

It is crucial to highlight the boundaries of the presented approach:

  • Platform Dependency: By relying on native libraries and assembly, this solution ties the Java program to specific hardware architectures and operating systems, which contradicts Java's core principle of platform independence.
  • Strictfp Irrelevance: As explained above, strictfp operates exclusively within the JVM and therefore has no influence on precompiled native code.
  • Security Concerns: Native libraries are susceptible to issues like buffer overflows or memory mismanagement, risks that are inherently absent in Java's managed environment.
  • Practicality: In most real-world scenarios, the precision offered by binary64 is sufficient, making the use of extended precision largely redundant.

Conclusion

This exploration adds a layer of depth to our understanding of extended-exponent value sets. By creatively employing JNI, we can transcend the limitations of 64-bit JVM environments, shedding light on the often-overlooked capabilities of floating-point hardware. For those venturing into numerical analysis or computational science, such techniques provide valuable insights into how hardware can complement high-level abstractions.



This post continues the discussion from Part 1 and Part 2, expanding on the historical rationale behind the strictfp modifier and addressing the specific limitations of the float type when it comes to extended-exponent value sets in Java. We will examine an experiment that highlights these limitations, delve into the architectural decisions of the HotSpot JVM, and explore the broader implications for IEEE 754 compliance.


The Origins and Evolution of strictfp in Java

The story of Java's strictfp keyword is deeply tied to the challenges of achieving platform-independent floating-point calculations. When Java first emerged in 1995, it strictly adhered to the IEEE 754 standard for floating-point arithmetic, setting itself apart by offering a consistent computational model across diverse hardware architectures. However, as hardware capabilities evolved and computational demands grew more complex, the introduction of the strictfp keyword in Java 2 Standard Edition (J2SE) 1.2 marked a pivotal moment. It offered developers a way to guarantee consistent, predictable results across varying systems.

This exploration revisits the historical and technical context of strictfp, focusing on its significance for ensuring consistent floating-point behavior, and addresses how the development of hardware influenced its necessity.


Platform Independence and IEEE 754 in Early Java

Java's core design principle, "write once, run anywhere" (WORA), demanded a runtime environment capable of producing consistent results across a diverse array of hardware and operating systems. Floating-point arithmetic, governed by the IEEE 754 standard, was no exception. From its initial release, Java strictly adhered to the binary32 (float) and binary64 (double) formats specified by IEEE 754, deliberately excluding support for extended-precision formats, such as the 80-bit floating-point format available on Intel x87 FPUs. This design choice was critical to ensuring consistent and predictable behavior across platforms.

The rationale for this decision becomes clear when considering the hardware landscape of the mid-1990s, which was marked by significant diversity and inconsistencies in support for floating-point operations. Key factors included the following:

  • x86 Processors: The x86 family, widely used in personal computers of the time, showcased a range of capabilities. High-end models, such as the Intel 80486DX and Pentium, integrated hardware Floating-Point Units (FPUs) for efficient floating-point calculations. However, earlier models, such as the Intel 80386 and certain budget-oriented versions of the 80486 (e.g., 80486SX), lacked built-in FPUs. On these systems, floating-point operations had to be emulated entirely in software, requiring the JVM to rely on integer-based Arithmetic Logic Units (ALUs) to implement IEEE 754 compliance.
  • Other Architectures: Beyond x86, platforms like SPARC, MIPS, PowerPC, and Alpha processors varied widely in their support for FPUs. While many high-performance models included hardware support for floating-point arithmetic, entry-level and cost-sensitive designs often omitted FPUs entirely. This variability presented challenges for maintaining uniform floating-point behavior across platforms.
  • Legacy Constraints: Although Java was designed as a platform-independent language intended to operate on 32-bit systems or higher, some early 32-bit architectures lacked native hardware support for floating-point calculations. These systems relied heavily on the JVM's ability to emulate IEEE 754 behavior in software. Additionally, while there were experimental efforts to bring Java to 16-bit systems (such as certain MS-DOS environments), such implementations were not part of Java's official specification and fell outside its primary design objectives.

By confining floating-point calculations to the binary32 and binary64 formats, Java ensured consistent results regardless of whether these calculations were executed in hardware or emulated in software. This decision was instrumental in reinforcing Java's promise of platform independence, particularly during an era when hardware diversity created substantial challenges for cross-platform consistency.


The Role of strictfp and the Shift to Extended Precision

As Java adoption grew, certain challenges associated with its strict adherence to IEEE 754 specifications became apparent. On platforms utilizing x87 FPUs, intermediate floating-point calculations were often performed in the 80-bit extended precision format. While this extended format provided flexibility by supporting a broader range of exponents, it also introduced inconsistencies across platforms, directly conflicting with Java’s core principle of predictability and platform independence.

With the release of J2SE 1.2, Java’s default behavior permitted the use of extended-exponent value sets for intermediate calculations, leveraging hardware capabilities when available. This meant that intermediate results could temporarily operate within a broader exponent range, as allowed by the extended precision format, while still adhering to the prescribed mantissa lengths for binary32 (float) and binary64 (double). However, to ensure strict conformance to IEEE 754 and to eliminate platform-specific discrepancies, the strictfp keyword was introduced. It provided developers with the ability to enforce a consistent, uniform computational model across all platforms by explicitly restricting calculations to the binary32 and binary64 formats throughout the entire computation process, including intermediates.

The decision to permit extended-exponent value sets reflected a pragmatic acknowledgment of the diverse computational needs of Java developers. Applications in scientific and engineering domains, where numerical edge cases often push standard precision to its limits, benefitted from the ability to handle a wider range of values in intermediate computations. Importantly, this feature was designed to respect the IEEE 754 standard and never involved modifications to the rounding behavior or mantissa precision of the final results. For developers prioritizing consistency and portability over flexibility, the strictfp modifier ensured that Java remained faithful to its promise of "write once, run anywhere," delivering reproducible results across diverse hardware.

By reconciling the potential advantages of extended-exponent value sets with the strict precision requirements of the IEEE 754 standard, Java effectively balanced the competing demands of flexibility and predictability. The introduction of strictfp cemented Java’s commitment to platform independence, enabling developers to navigate the trade-offs of numerical precision and portability with confidence.


The Decline of Software Emulation and the Rise of SIMD

During the late 1990s and early 2000s, hardware trends began to shift dramatically. By this time, most processors supported built-in FPUs, reducing the reliance on software emulation for floating-point calculations. However, x86 processors took this transition a step further with the introduction of Streaming SIMD Extensions (SSE) in Intel's Pentium III processors.

SSE replaced the x87 FPU as the preferred mechanism for floating-point arithmetic on x86 platforms. Unlike x87, which performed intermediate computations in 80-bit extended precision, SSE (for single precision) and its successor SSE2 (for double precision) adhere strictly to the IEEE 754 binary32 and binary64 formats. This architectural shift simplified floating-point operations, eliminated inconsistencies introduced by extended precision, and aligned perfectly with Java's emphasis on platform independence.

The adoption of SIMD instructions across other architectures (e.g., AVX for x86-64 and NEON for ARM) further cemented this trend. By the time Java SE 17 reintroduced strict IEEE 754 compliance as the default behavior, processors without SIMD instruction sets were effectively obsolete, rendering strictfp redundant.


Revisiting the Role of Legacy Processors

Despite its eventual obsolescence, strictfp played a critical role in bridging the gap between Java's strict platform-independent goals and the realities of early hardware. While some references to legacy systems might suggest broader compatibility than Java officially supported, it’s important to clarify that Java was always a 32-bit (or higher) platform from its inception. Even during the earliest JVM implementations:

  • Software Emulation on x86: Processors like the Intel 80386 and 80486SX relied entirely on software emulation for floating-point operations. The JVM ensured IEEE 754 compliance programmatically, often at a significant performance cost.
  • Other Architectures: While SPARC, MIPS, and PowerPC architectures included models with FPUs, the JVM had to accommodate systems that lacked hardware support for floating-point arithmetic, reinforcing the need for consistent behavior.

Legacy 16-bit systems, such as those running MS-DOS, fall outside the scope of Java’s official support. While experimental or custom implementations of JVMs may have existed for these platforms, they were not representative of the language's intended use case or its design goals. This distinction is critical to avoid conflating Java's official capabilities with niche adaptations.


Conclusion: From Necessity to Legacy

The introduction of strictfp was a response to the diverse and evolving hardware landscape of Java's early years. By providing a way to reconcile extended precision with the need for platform-independent behavior, strictfp exemplified Java's commitment to balancing innovation with consistency.

As hardware matured, the relevance of strictfp diminished. With the advent of SIMD instructions and the decline of x87 FPUs, modern processors inherently adhered to IEEE 754 semantics, making strictfp redundant. The decision in Java SE 17 to make the modifier obsolete (it is now a no-op, and compilers warn on its use) reflects this evolution, underscoring the JVM's shift toward hardware-agnostic behavior.

By understanding the historical context and technical nuances of strictfp, developers can appreciate the intricate balance between platform independence and computational precision—a balance that has defined Java's journey from its inception to the modern era.


Investigating the Handling of float with Extended-Exponent Value Sets in Java

The first part of this analysis examined the role of the strictfp modifier and its implications for extended-exponent value sets when working with the double type. However, an equally intriguing question arises when considering the float type. Why does the JVM, particularly HotSpot-based implementations, appear to exclude extended-exponent value sets for float in intermediate calculations, even when strictfp is not applied? This behavior warrants closer investigation, focusing on the interplay between the x87 FPU's internal configuration and the JVM's adherence to Java’s floating-point specifications.


Experiment: Testing float with Extended-Exponent Value Sets

To observe how the JVM handles float calculations, the following code was executed:

public class StrictTest {
    public static void main(String[] args) {
        float secondOperand = 2;
        float thirdOperand = 4;

        System.out.println(Float.MAX_VALUE * secondOperand / thirdOperand);
    }
}

Result:

The program outputs Infinity, consistent with the behavior expected when an overflow occurs for the binary32 format (float).

Assembly Code:

The disassembled output of the JVM for the above code reveals the following sequence of operations:

0x023061a6: flds        0x2306140  ;   {section_word}
0x023061ac: fmuls       0x2306144  ;   {section_word}
0x023061b2: fstps       0x10(%esp)
0x023061b6: flds        0x10(%esp)
0x023061ba: fdivs       0x2306148  ;   {section_word}
0x023061c0: fstps       0x14(%esp)
0x023061c4: flds        0x14(%esp)
0x023061c8: fstps       (%esp)  ;*invokevirtual println
                                        ; - StrictTest::main@14 (line 6)

Here, the intermediate results are stored in memory using fstps (store floating-point single-precision) after each operation and then reloaded into the x87 FPU registers using flds (load floating-point single-precision). This behavior is similar to how calculations would be performed if strictfp were explicitly applied, as intermediate results remain confined to the binary32 value set. However, strictfp is not used in this code, raising the question: why does the JVM impose this behavior for float?


The Role of the Control Word (CW) in the x87 FPU

To understand this behavior, we must delve into the x87 FPU's internal workings, particularly its Control Word (CW). This special configuration register governs the precision and rounding modes applied to floating-point operations.

Precision Modes in the x87 FPU

The x87 FPU supports three configurable precision modes that determine how many bits of the mantissa are preserved during calculations:

  • Single Precision (binary32): 23 explicit bits of mantissa (24 bits with the implicit leading bit).
  • Double Precision (binary64): 52 explicit bits of mantissa (53 bits with the implicit leading bit).
  • Extended Precision (80-bit): 64 explicit bits of the mantissa, including the leading bit, which is stored explicitly. Unlike binary32 and binary64 formats, the leading bit is not implicit and is stored as part of the significand to support operations with denormalized numbers and ensure flexibility in intermediate calculations.

The CW specifies which of these precision modes should be applied when rounding intermediate results. Most systems configure the FPU to default to double precision (binary64), aligning with the prevalent use of this format in scientific and general-purpose computations.

Rounding Modes in the x87 FPU

The CW also defines how results are rounded during calculations, with the following options available:

  1. Round to Nearest (default): Rounds to the nearest representable value, resolving ties by rounding to the nearest even value.
  2. Round Down (toward -∞): Always rounds toward negative infinity.
  3. Round Up (toward +∞): Always rounds toward positive infinity.
  4. Round Toward Zero: Truncates the fractional part of the result.

These global settings apply to all operations performed by the x87 FPU and ensure consistent rounding behavior.


The Issue with float and Extended-Exponent Value Sets

The x87 FPU defaults to using extended precision for intermediate calculations, which allows for the temporary use of the broader exponent range associated with the 80-bit format. This approach reduces rounding errors and ensures stability for computations involving the double type. However, this creates a conflict for the float type, which adheres strictly to the narrower exponent range and precision of binary32. If intermediate results were allowed to use the extended-exponent value set without restriction, they could exceed the representable range of binary32, leading to inconsistencies with the Java specification.


The JVM’s Solution: Immediate Memory Spills

To resolve this conflict, the JVM adopts a straightforward strategy: immediate memory spills. Each intermediate result in a float calculation is stored in memory immediately after the operation using the fstps instruction. This step enforces rounding to the binary32 value set, ensuring compliance with the IEEE 754 standard. The result is then reloaded into the x87 FPU for the next operation. While this approach sacrifices the potential benefits of extended-exponent value sets, it guarantees that all intermediate results adhere to the Java specification for float.


Why Not Dynamically Adjust the Control Word?

An alternative to memory spills would involve dynamically adjusting the CW to enforce binary32 precision and the corresponding exponent range during operations with float. However, this approach introduces several challenges:

  1. Performance Overhead: Changing the CW is a global operation, affecting all floating-point computations. Frequent adjustments would incur significant overhead, particularly in applications mixing float and double calculations.
  2. Increased Complexity: The JVM would need to track the precision requirements for each operation and modify the CW accordingly, complicating the execution pipeline.
  3. Potential Inconsistencies: Dynamic CW changes could lead to subtle errors, particularly in multithreaded environments where operations are executed concurrently.

The immediate memory spill approach avoids these pitfalls, offering a simple and reliable solution at the cost of computational efficiency.


Key Observations

  1. Intermediate Precision Behavior:
    • For double, intermediate results benefit from extended-exponent value sets and are rounded to binary64 only when stored in memory.
    • For float, immediate memory spills ensure that all intermediate results remain confined to the binary32 value set.
  2. Compliance vs. Optimization:
    • The JVM’s behavior reflects a deliberate trade-off, prioritizing strict adherence to the IEEE 754 standard for cross-platform consistency over potential precision improvements.
  3. Legacy Considerations:
    • This behavior has historical significance for systems and applications relying heavily on float. On modern systems, where double is the preferred type for most computations, these limitations are less impactful.

Conclusion: The Legacy of float and strictfp

The handling of float in the JVM underscores the careful balance between hardware capabilities and language requirements. While extended-exponent value sets offer theoretical benefits, their application to float would violate Java’s commitment to platform-independent consistency. By adopting immediate memory spills, the JVM enforces the precision and exponent range of binary32, ensuring that float operations produce predictable results across diverse environments.

The exploration of strictfp and floating-point arithmetic in Java provides valuable insights into the challenges of designing a platform-independent language. The deliberate choices made by the JVM emphasize the importance of consistency, even at the cost of optimization, and serve as a reminder of the complexity underlying seemingly straightforward language features.


This concludes our exploration of strictfp and its implications for floating-point arithmetic in Java. By diving deep into the handling of float, double, and the architectural intricacies of the JVM, we’ve uncovered the challenges and decisions that shape Java’s approach to platform independence and predictability. Future discussions may expand on how these principles influence modern JVM optimizations and emerging architectures.

