This is going to be a long post in order to properly explain the issue, so please bear with me. It may also require some knowledge of the internals of the JNA library (v 4.1.0), or the ability to examine its source code.
In short, we have issues when obtaining pointers to native functions from a 3rd-party component that is written in C. The problematic pointers seem to break JNA because the same pointer value is returned for different functions. We observe the issue consistently when the JNA bindings run in a child JVM process started by another JVM process.
Background
We are integrating with a 3rd party tool for Windows written in C. The tool manufacturer has provided us with the C header files and a dll that we must interoperate with through our Java code; I will refer to it as the interop.dll. The dll contains structures that expose function pointers, which we are mapping to Java interfaces via JNAerator.
The interop.dll communicates with the 3rd party tool (that is pre-installed on the system), so it is a kind of communication SDK. For testing purposes, we have recently been provided with a stub.dll (again from that manufacturer), which does not require the 3rd party tool to be running, or installed at all. The interop.dll is responsible for deciding whether to use the stub or the real 3rd party tool, and automatically chooses the stub if it is present in the bin directory.
So, in any case, we have to map a fixed number of functions exposed by the interop.dll.
To assist in that, the interop.dll would contain the following function:
void* (__cdecl *ObtainInterface)( const char* interfaceName );
and we would map it in Java like this:
public interface ObtainInterface_callback extends Callback {
Pointer apply(String interfaceName);
};
public ObtainInterface_callback ObtainInterface;
This function is used to "extract" another function from either the 3rd party tool or the stub.dll and then expose it through a Java interface by using its pointer value. In other words, we use it to dig through the target dll's API and map the other C functions that we need to Java interfaces. The functions we are extracting are declared within their respective C structures in the following manner
void (__cdecl *SomeName)(Params.....)
to later be automatically mapped by JNAerator in a fashion similar to ObtainInterface above.
So, here is how we obtain the interfaces in our Java code:
Pointer interface1Pointer = ObtainInterface_callback.apply("Interface1");
Interface1 interface1 = new Interface1(interface1Pointer);
Pointer interface2Pointer = ObtainInterface_callback.apply("Interface2");
Interface2 interface2 = new Interface2(interface2Pointer);
Pointer interface3Pointer = ObtainInterface_callback.apply("Interface3");
Interface3 interface3 = new Interface3(interface3Pointer);
where the constructor of Interface1 would look like this (same for Interface2 and Interface3):
public Interface1(Pointer peer) {
super(peer);
read();
}
Note (in response to technomage's answer): the above code for Interface1, 2 and 3 was automatically generated by JNAerator, in an attempt to map the C struct with functions to a Java object with callbacks.
We have managed to successfully integrate with the interop.dll and the 3rd party tool.
The Problem
When we switch to using the stub dll, we get an IllegalStateException from the JNA code (CallbackReference.java @ line 122). The problem occurs when we attempt to obtain the third interface: Interface3 interface3 = new Interface3(interface3Pointer);
We downloaded the JNA sources and started debugging through the code to see what exactly is causing the issue.
The read() method (see the constructor of Interface1 above) internally calls a readField() method for all members of the mapped structure. Because all structure members are function pointers, readField produces a Callback instance (as in Pointer.java @ line 419), which later results in a call to the native method long _getPointer(long addr). For those interested, the native method looks like this (I am not really sure this is relevant enough):
dispatch.c, @line 2359
/*
* Class: Native
* Method: _getPointer
* Signature: (J)Lcom/sun/jna/Pointer;
*/
JNIEXPORT jlong JNICALL Java_com_sun_jna_Native__1getPointer
(JNIEnv *env, jclass UNUSED(cls), jlong addr)
{
void *ptr = NULL;
MEMCPY(env, &ptr, L2A(addr), sizeof(ptr));
return A2L(ptr);
}
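To make the read concrete, here is a minimal pure-Java sketch of what that native call amounts to: copying a pointer-sized value out of memory at a given address. It is only a model, not JNA code. The "memory" is a ByteBuffer, the class name PointerReadModel and the struct location are made up for illustration, and the 4-byte little-endian pointer size is an assumption based on the field offsets 0 and 4 we see in the debugger.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Simplified model of a pointer read: copy a pointer-sized value out of
// "memory" at a given address. NOT JNA code; the memory is just a buffer.
class PointerReadModel {
    // Assumption: 4-byte pointers (32-bit process), little-endian as on x86.
    static final int POINTER_SIZE = 4;

    // Reads the pointer value stored at structBase + fieldOffset.
    static long getPointer(ByteBuffer memory, int structBase, int fieldOffset) {
        int raw = memory.order(ByteOrder.LITTLE_ENDIAN).getInt(structBase + fieldOffset);
        return raw & 0xFFFFFFFFL; // treat the 32-bit value as unsigned
    }

    public static void main(String[] args) {
        ByteBuffer memory = ByteBuffer.allocate(64).order(ByteOrder.LITTLE_ENDIAN);
        int structBase = 16; // hypothetical location of a two-field struct
        memory.putInt(structBase, 0x17F75DF0);                // field at offset 0
        memory.putInt(structBase + POINTER_SIZE, 0x17F33320); // field at offset 4

        System.out.printf("offset 0 -> 0x%X%n", getPointer(memory, structBase, 0));
        System.out.printf("offset 4 -> 0x%X%n", getPointer(memory, structBase, POINTER_SIZE));
    }
}
```

The point of the model is that the value returned is whatever the native side stored in that slot; nothing on the Java side can make two different slots return the same value unless the struct really contains it twice.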
What we identified there was an issue with the address returned by the above _getPointer call, while running with the stub.dll. Here are the details we captured when debugging:
- interface2Pointer has value 402394304 (0x17FC0CC0) (the pointer of the C struct).
- The readField method discovers 10 function pointers within that struct, the last residing at offset 36: function10 -> interface2Pointer + offset = 402394304 + 36 = 402394340 (0x17FC0CE4).
- Finally, there is a call to _getPointer(interface2Pointer.function10) = _getPointer(402394340), which would return the address of the callback within the struct, currently 401814304 (0x17F33320).
The same is repeated for interface3Pointer:
- interface3Pointer -> 402397356 (0x17FC18AC)
- There are two inner functions with offsets, respectively 0 and 4, which are retrieved by the readField method:
  - function1 -> 402397356 + 0 = 402397356 (0x17FC18AC)
    - _getPointer(interface3Pointer.function1) = _getPointer(402397356) then returns 402087408 (0x17F75DF0)
  - function2 -> 402397356 + 4 = 402397360 (0x17FC18B0)
    - _getPointer(interface3Pointer.function2) = _getPointer(402397360) then returns 401814304 (0x17F33320) (!)
As you can see, the interface3Pointer.function2 is being assigned the same pointer as interface2Pointer.function10.
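The slot addresses above are plain base-plus-offset arithmetic, which can be double-checked with a few lines of Java. This is only a sanity check of the numbers captured in the debugger; the class and method names are made up for illustration.

```java
// Recomputes the field (slot) addresses from the struct base addresses
// captured in the debugger and prints them in decimal and hex.
class SlotAddressCheck {
    // Address of a struct field: base address of the struct plus the field offset.
    static long slotAddress(long structBase, int offset) {
        return structBase + offset;
    }

    // Formats an address the same way it is written in this post.
    static String describe(long address) {
        return address + " (0x" + Long.toHexString(address).toUpperCase() + ")";
    }

    public static void main(String[] args) {
        long interface2Pointer = 402394304L; // stub.dll run, from the debugger
        long interface3Pointer = 402397356L;

        System.out.println("interface2.function10 slot: " + describe(slotAddress(interface2Pointer, 36)));
        System.out.println("interface3.function1 slot:  " + describe(slotAddress(interface3Pointer, 0)));
        System.out.println("interface3.function2 slot:  " + describe(slotAddress(interface3Pointer, 4)));
    }
}
```

The three slot addresses are distinct, so the duplicate is not caused by reading the same slot twice: two different slots hold the same function address.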
Now, CallbackReference.java internally uses a weak hash map to keep track of callback pointers that have already been assigned a Java representation. The IllegalStateException is thrown because that map still holds a reference to the already matched pointer (interface2Pointer.function10 @ 401814304), so JNA is unable to insert it again and map it to another interface.
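Here is a simplified model of that bookkeeping, as we understand it from stepping through the code. It is NOT JNA's implementation: the class CallbackRegistryModel, the marker interfaces and the exact type check are assumptions made for illustration. It only shows why a second registration of the same address under a different callback type fails while the first callback is still alive, and why it would succeed once the weak reference has been cleared.

```java
import java.lang.ref.WeakReference;
import java.util.HashMap;
import java.util.Map;

// Simplified model of a pointer-to-callback registry. NOT JNA's implementation.
class CallbackRegistryModel {
    interface Function10 {} // hypothetical callback types
    interface Function2 {}

    private final Map<Long, WeakReference<Object>> byAddress =
            new HashMap<Long, WeakReference<Object>>();

    // Returns the callback already registered for the address,
    // or registers the candidate if the address is free (or was collected).
    Object lookupOrRegister(long address, Class<?> type, Object candidate) {
        WeakReference<Object> ref = byAddress.get(address);
        Object existing = (ref == null) ? null : ref.get();
        if (existing != null) {
            if (!type.isInstance(existing)) {
                throw new IllegalStateException("0x" + Long.toHexString(address)
                        + " is already mapped to another callback type");
            }
            return existing;
        }
        byAddress.put(address, new WeakReference<Object>(candidate));
        return candidate;
    }

    public static void main(String[] args) {
        CallbackRegistryModel registry = new CallbackRegistryModel();
        Object f10 = new Function10() {}; // strong reference keeps the entry alive
        registry.lookupOrRegister(0x17F33320L, Function10.class, f10);
        try {
            registry.lookupOrRegister(0x17F33320L, Function2.class, new Function2() {});
        } catch (IllegalStateException e) {
            System.out.println("Rejected: " + e.getMessage());
        }
        System.out.println("First callback still referenced: " + (f10 != null));
    }
}
```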
I can observe three problems from this point:
- Is it normal for different functions to result in the same pointer? Maybe the stub.dll uses the same callback for both operations? This is rather surprising, as interface2Pointer.function10 has a different signature than interface3Pointer.function2.
- The weak hash map usage brings a great amount of uncertainty into the above code. If we halt the debugger long enough for a GC call to occur, we can bypass the exception, so the behavior may not always be reproducible.
- I am unable to determine whether we would get the desired behavior if the GC did occur. What if that same pointer is wrong in the first place? In case of successful assignments I fear we might end up invoking the wrong callback.
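To take the GC timing out of the picture, the duplicate can be detected directly from the raw addresses before any callback mapping is attempted. The following is a minimal sketch of such a check; the class and method names are made up, and the addresses in main are the ones captured in the stub.dll run above.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Reports function-pointer addresses that are shared by more than one function.
class DuplicatePointerCheck {
    // Groups function names by resolved address and keeps only shared addresses.
    static Map<Long, List<String>> findShared(Map<String, Long> resolved) {
        Map<Long, List<String>> byAddress = new LinkedHashMap<Long, List<String>>();
        for (Map.Entry<String, Long> e : resolved.entrySet()) {
            List<String> names = byAddress.get(e.getValue());
            if (names == null) {
                names = new ArrayList<String>();
                byAddress.put(e.getValue(), names);
            }
            names.add(e.getKey());
        }
        Map<Long, List<String>> shared = new LinkedHashMap<Long, List<String>>();
        for (Map.Entry<Long, List<String>> e : byAddress.entrySet()) {
            if (e.getValue().size() > 1) {
                shared.put(e.getKey(), e.getValue());
            }
        }
        return shared;
    }

    public static void main(String[] args) {
        Map<String, Long> stub = new LinkedHashMap<String, Long>();
        stub.put("interface2.function10", 0x17F33320L);
        stub.put("interface3.function1", 0x17F75DF0L);
        stub.put("interface3.function2", 0x17F33320L);
        for (Map.Entry<Long, List<String>> e : findShared(stub).entrySet()) {
            System.out.println("0x" + Long.toHexString(e.getKey()) + " shared by " + e.getValue());
        }
    }
}
```

A check like this does not tell us whether the shared address is legitimate, only that it exists independently of the weak map's state.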
The above observations are consistent across subsequent retries, even after restarting both the process and the host OS. We even get the same pointer addresses as the ones mentioned here on subsequent executions.
To make things worse, the 3rd party tool manufacturer claims there are no issues with either the interop.dll or the stub.dll that could cause the above behavior.
Update In response to comments, I am adding the signatures of the native functions here:
interface2.function10:
void (__cdecl *function10)( CallbackWithFunction10EventInfo cb, void* userData );
interface3.function1:
void (__cdecl *function1)(CallbackWithNoData cb, void* userData, int value );
interface3.function2:
void (__cdecl *function2)(CallbackWithNoData cb, void* userData);
Signature Note
While the two methods obviously have different types for their first parameter cb, it is not impossible for CallbackWithFunction10EventInfo to be "hierarchically" related to CallbackWithNoData (like some sort of faked inheritance, which is possible in certain circumstances in C). Could something like this impact the returned pointer values?
Some Assertions
We also debugged the pointer values that are returned in case we remove the stub dll and use the working integration, with the interop.dll and the real tool. Our Java code is still the same.
- interface2Pointer -> 401508620 (0x17EE890C)
  - function10 -> interface2Pointer + offset = 401508620 + 36 = 401508656 (0x17EE8930).
  - _getPointer(interface2Pointer.function10) = _getPointer(401508656) = 400857536 (0x17E499C0).
- interface3Pointer -> 401508920 (0x17EE8A38)
  - function1 -> interface3Pointer + offset1 = 401508920 + 0 = 401508920 (0x17EE8A38).
  - _getPointer(interface3Pointer.function1) = _getPointer(401508920) = 401018032 (0x17E70CB0).
  - function2 -> interface3Pointer + offset2 = 401508920 + 4 = 401508924 (0x17EE8A3C).
  - _getPointer(interface3Pointer.function2) = _getPointer(401508924) = 401017424 (0x17E70A50)
Obviously, the non-stub addresses are unique, and we get the inter-operation working.
Our Setup
The code is being executed on a virtual machine with Microsoft Windows XP, and resides in a shaded jar. We use JDK/JRE 1.6 and JNA version 4.1.0.
Our test and execution scenarios provide 3 means of executing the Java process that does the interop binding:
1. Standalone process - works well with the real tool, silently fails with the stub.dll.
2. Child process of another JVM process - works well with the real tool, throws the discussed IllegalStateException with the stub.dll.
3. Child process of another JVM process, but with the interface2 and interface3 bindings commented out - works correctly.
The command line we use to start the child Java process in scenarios 2 and 3 is:
java -cp our-shaded.jar main.class.package.Application
and when debugging, we add -Xdebug -Xrunjdwp:transport=dt_socket,address=8998,server=y
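For completeness, here is a hedged sketch of how such a child JVM could be started from a parent JVM and what could be logged to compare it with the standalone run. It assumes the parent uses ProcessBuilder, which may not match how our parent process is actually implemented; the class ChildJvmLauncher and its methods are made up for illustration. The sketch only prepares and prints the launch configuration and does not start the process.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

// Prepares the child JVM command line shown above and prints the settings
// that may differ between a standalone run and a child-process run.
class ChildJvmLauncher {
    static List<String> buildCommand(boolean debug) {
        List<String> cmd = new ArrayList<String>();
        cmd.add("java");
        if (debug) {
            cmd.add("-Xdebug");
            cmd.add("-Xrunjdwp:transport=dt_socket,address=8998,server=y");
        }
        cmd.add("-cp");
        cmd.add("our-shaded.jar");
        cmd.add("main.class.package.Application");
        return cmd;
    }

    static ProcessBuilder prepare(boolean debug, File workingDir) {
        ProcessBuilder pb = new ProcessBuilder(buildCommand(debug));
        pb.directory(workingDir);     // the child's working directory
        pb.redirectErrorStream(true); // merge stderr into stdout so errors are not lost
        return pb;
    }

    public static void main(String[] args) {
        ProcessBuilder pb = prepare(false, new File("."));
        System.out.println("Command: " + pb.command());
        System.out.println("Working directory: " + pb.directory().getAbsolutePath());
        System.out.println("PATH: " + pb.environment().get("PATH"));
    }
}
```

Logging the working directory and PATH in both the standalone and the child case is one way to check whether the two runs really see the same bin directory and the same dlls.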
Update
While performing some additional checks, we thought it worthwhile to examine the pointers returned by the stub.dll when running as a standalone process (scenario 1 above). The result was confusing, but it also gave us some direction: the standalone process obtained unique pointers, just as if it were working with the real tool. Thus, the cause might lie with the child process and some shared memory, or with limits on the memory exposed between the native code and the child Java process...
The Question
I would appreciate any clarity on whether the issue is caused by our usage or by the stub dll itself (I would blame the latter). We may need to convince the third party manufacturer that there is indeed a problem with their code; otherwise we might not get a new version of the stub, meaning we would have to look for a workaround. So any help in that direction, or any workaround tips, are welcome.