Why are function pointers and data pointers incompatible in C/C++?

Question

I have read that converting a function pointer to a data pointer and vice versa works on most platforms but is not guaranteed to work. Why is this the case? Shouldn't both be simply addresses into main memory and therefore be compatible?

Undefined in standard C, defined in POSIX. Mind the difference. — ephemient
– ephemient, Commented Feb 7, 2011 at 18:38
I'm a little new at this, but aren't you supposed to do the cast on the right side of the "="? Looks to me like the problem is that you're assigning to a void pointer. But I see that the man page does this, so hopefully someone can educate me. I see examples on the 'net of people casting the return value from dlsym, eg here: daniweb.com/forums/thread62561.html — JasonWoof
– JasonWoof, Commented Feb 7, 2011 at 18:46
Note what POSIX says in the section on Data Types: §2.12.3 Pointer Types. All function pointer types shall have the same representation as the type pointer to void. Conversion of a function pointer to void * shall not alter the representation. A void * value resulting from such a conversion can be converted back to the original function pointer type, using an explicit cast, without loss of information. Note: The ISO C standard does not require this, but it is required for POSIX conformance. — Jonathan Leffler
– Jonathan Leffler, Commented Sep 11, 2012 at 18:19
this is the question in the ABOUT section of this website.. :) :) See you question here — ZooZ
– ZooZ, Commented Jan 21, 2014 at 10:15
@KeithThompson: the world changes — and POSIX does too. What I wrote in 2012 no longer applies in 2018. The POSIX standard changed the verbiage. It is now associated with dlsym() — note the end of the 'Application Usage' section where it says: Note that conversion from a void * pointer to a function pointer as in: fptr = (int (*)(int))dlsym(handle, "my_function"); is not defined by the ISO C standard. This standard requires this conversion to work correctly on conforming implementations. — Jonathan Leffler
– Jonathan Leffler, Commented Jul 31, 2018 at 20:57

Dirk Holsopple · Accepted Answer · 2012-09-10 20:26:03Z

178

An architecture doesn't have to store code and data in the same memory. With a Harvard architecture, code and data are stored in completely different memory. Most architectures are von Neumann architectures, with code and data in the same memory, but C avoids limiting itself to particular types of architecture wherever possible.

answered Sep 10, 2012 at 20:26

Dirk Holsopple

8 Comments

Michael Graczyk Over a year ago

Also, even if code and data are stored in the same place in physical hardware, software and memory access often prevent running data as code without operating system "approval". DEP and the like.

Michael Burr Over a year ago

At least as important as having different address spaces (maybe more important) is that function pointers may have a different representation than data pointers.

caf Over a year ago

You don't even have to have a Harvard architecture to have code and data pointers using different address spaces - the old DOS "Small" memory model did this (near pointers with CS != DS).

PypeBros Over a year ago

even modern processors would struggle with such mixture as the instruction and data cache are typically handled separately, even when the operating system allows you to write code somewhere.

Dietrich Epp Over a year ago

@EricJ. Until you call VirtualProtect, which allows you to mark regions of data as executable.

Bo Persson · Accepted Answer · 2012-09-11 06:35:40Z

37

Some computers have (had) separate address spaces for code and data. On such hardware it just doesn't work.

The language is designed not only for current desktop applications, but to allow it to be implemented on a large set of hardware.

It seems the C language committee never intended void* to be a pointer to function; they just wanted a generic pointer to objects.

The C99 Rationale says:

6.3.2.3 Pointers
C has now been implemented on a wide range of architectures. While some of these architectures feature uniform pointers which are the size of some integer type, maximally portable code cannot assume any necessary correspondence between different pointer types and the integer types. On some implementations, pointers can even be wider than any integer type.

The use of void* (“pointer to void”) as a generic object pointer type is an invention of the C89 Committee. Adoption of this type was stimulated by the desire to specify function prototype arguments that either quietly convert arbitrary pointers (as in fread) or complain if the argument type does not exactly match (as in strcmp). Nothing is said about pointers to functions, which may be incommensurate with object pointers and/or integers.

Note the sentence "Nothing is said about pointers to functions" in the last paragraph. Function pointers might be different from other pointers, and the committee is aware of that.
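To make the distinction concrete, here is a minimal sketch of the two round trips that ISO C does guarantee: any object pointer through void *, and any function pointer through another function pointer type. The names generic_fn, add_one, call_via_generic and object_round_trip are made up for this illustration; the only assumption is a conforming C compiler.

```c
#include <stddef.h>

/* Hypothetical "generic" function pointer type. ISO C lets a function
   pointer be converted to another function pointer type and back. */
typedef void (*generic_fn)(void);

static int add_one(int x) { return x + 1; }

/* Store a function pointer as generic_fn, then convert it back to its
   real type before calling it. Calling through the wrong type would be
   undefined; only the round trip is guaranteed. */
int call_via_generic(int value)
{
    generic_fn stored = (generic_fn)add_one;
    int (*restored)(int) = (int (*)(int))stored;
    return restored(value);
}

/* The object-pointer counterpart: int * to void * and back. */
int object_round_trip(void)
{
    int n = 42;
    void *p = &n;   /* implicit conversion, always allowed for objects */
    int *q = p;
    return *q;
}
```

So when a "generic function pointer" is all that is needed, a function pointer type such as void (*)(void) can play the role that void * plays for objects, without relying on anything outside the standard.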

edited Sep 11, 2012 at 6:35

answered Sep 10, 2012 at 20:24

Bo Persson

12 Comments

Edward Strange Over a year ago

The standard could make them compatible without messing with this by simply making the data types the same size and guaranteeing that assigning to one and then back will result in the same value. They do this with void*, which is the only pointer type compatible with everything.

ouah Over a year ago

@CrazyEddie You cannot assign a function pointer to a void *.

Edward Strange Over a year ago

I could be wrong on void* accepting function pointers, but the point remains. Bits are bits. The standard could require that the size of the different types be able to accommodate the data from each other and the assignment would be guaranteed to work even if they are used in different memory segments. The reason this incompatibility exists is that this is NOT guaranteed by the standard and so data can be lost in the assignment.

Robᵩ Over a year ago

But requiring sizeof(void*) == sizeof( void(*)() ) would waste space in the case where function pointers and data pointers are different sizes. This was a common case in the 80's, when the first C standard was written.

John Bode Over a year ago

@RichardChambers: The different address spaces may also have different address widths, such as an Atmel AVR that uses 16 bits for instructions and 8 bits for data; in that case, it would be hard converting from data (8 bit) to function (16 bit) pointers and back again. C's supposed to be easy to implement; part of that ease comes from leaving data and instruction pointers incompatible with each other.

caf · Accepted Answer · 2012-09-11 00:54:23Z

33

For those who remember MS-DOS, Windows 3.1 and older, the answer is quite easy. All of these used to support several different memory models, with varying combinations of characteristics for code and data pointers.

So for instance for the Compact model (small code, large data):

sizeof(void *) > sizeof(void(*)())

and conversely in the Medium model (large code, small data):

sizeof(void *) < sizeof(void(*)())

In this case you didn't have separate storage for code and data but still couldn't convert between the two pointers (short of using non-standard __near and __far modifiers).

Additionally there's no guarantee that even if the pointers are the same size, that they point to the same thing - in the DOS Small memory model, both code and data used near pointers, but they pointed to different segments. So converting a function pointer to a data pointer wouldn't give you a pointer that had any relationship to the function at all, and hence there was no use for such a conversion.
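The size relationships above can be checked on whatever platform is at hand. The following is only a sketch; compare_pointer_sizes is a hypothetical helper name, and the result depends entirely on the compiler and memory model in use.

```c
#include <stddef.h>

/* Returns  1 if object pointers are wider than function pointers
            (as in the Compact model described above),
           -1 if function pointers are wider (as in the Medium model),
            0 if both have the same size. */
int compare_pointer_sizes(void)
{
    size_t data_size = sizeof(void *);
    size_t code_size = sizeof(void (*)(void));

    if (data_size > code_size)
        return 1;
    if (data_size < code_size)
        return -1;
    return 0;
}
```

A result of 0 says only that the sizes match. As the Small model shows, equal size does not mean the two kinds of pointer refer to the same address space.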

edited Sep 11, 2012 at 0:54

caf

answered Sep 10, 2012 at 21:04

Tomek

5 Comments

ruakh Over a year ago

Re: "converting a function pointer to a data pointer wouldn't give you a pointer that had any relationship to the function at all, and hence there was no use for such a conversion": This doesn't entirely follow. Converting an int* to a void* gives you a pointer that you can't really do anything with, but it's still useful to be able to perform the conversion. (This is because void* can store any object pointer, so can be used for generic algorithms that don't need to know what type they hold. The same thing could be useful for function pointers as well, if it were allowed.)

caf Over a year ago

@ruakh: In the case of converting the int * to void *, the void * is guaranteed to at least point to the same object as the original int * did - so this is useful for generic algorithms that access the pointed-to object, like int n; memcpy(&n, src, sizeof n);. In the case where converting a function pointer to a void * doesn't yield a pointer pointing at the function, it isn't useful for such algorithms - the only thing you could do is convert the void * back to a function pointer again, so you might as well just use a union containing a void * and function pointer.

ruakh Over a year ago

@caf: Fair enough. Thanks for pointing that out. And for that matter, even if the void* did point to the function, I suppose it would be a bad idea for people to pass it to memcpy. :-P

Jonathan Leffler Over a year ago

Copied from above: Note what POSIX says in Data Types: §2.12.3 Pointer Types. All function pointer types shall have the same representation as the type pointer to void. Conversion of a function pointer to void * shall not alter the representation. A void * value resulting from such a conversion can be converted back to the original function pointer type, using an explicit cast, without loss of information. Note: The ISO C standard does not require this, but it is required for POSIX conformance.

Deduplicator Over a year ago

@caf If it just should be passed through to some callback which knows the proper type, I'm only interested in round-trip safety, not any other relationship those converted values might possibly have.

Jerry Coffin · Accepted Answer · 2011-02-07 18:00:56Z

27

Pointers to void are supposed to be able to accommodate a pointer to any kind of data -- but not necessarily a pointer to a function. Some systems have different requirements for pointers to functions than pointers to data (e.g., there are DSPs with different addressing for data vs. code, and medium model on MS-DOS used 32-bit pointers for code but only 16-bit pointers for data).

answered Feb 7, 2011 at 18:00

Jerry Coffin

8 Comments

Manav Over a year ago

but then shouldn't the dlsym() function be returning something other than a void *? I mean, if the void * is not big enough for the function pointer, aren't we already fubared?

Jerry Coffin Over a year ago

@Knickerkicker: Yes, probably. If memory serves, the return type from dlsym was discussed at length, probably 9 or 10 years ago, on the OpenGroup's email list. Offhand, I don't remember what (if anything) came of it though.

Manav Over a year ago

you're right. This seems a fairly nice (although outdated) summary of your point.

user764357 Over a year ago

+1 for answering the question before it was asked

Jerry Coffin Over a year ago

@LegoStormtroopr: Interesting how 21 people agree with the idea of up-voting, but only about 3 have actually done so. :-)

Maxim Egorushkin · Accepted Answer · 2023-12-09 23:37:34Z

14

In addition to what is already said here, POSIX requires pointers to functions with external linkage to be convertible to void* in the API of dlsym:

The ISO C standard does not require that pointers to functions can be cast back and forth to pointers to data. Indeed, the ISO C standard does not require that an object of type void* can hold a pointer to a function. Implementations supporting the XSI extension, however, do require that an object of type void* can hold a pointer to a function. The result of converting a pointer to a function into a pointer to another data type (except void*) is still undefined, however. Note that compilers conforming to the ISO C standard are required to generate a warning if a conversion from a void* pointer to a function pointer is attempted as in:

fptr = (int (*)(int))dlsym(handle, "my_function");

Due to the problem noted here, a future version may either add a new function to return function pointers, or the current interface may be deprecated in favor of two new functions: one that returns data pointers and the other that returns function pointers.

In other words:

The ISO C standard does not require that pointers to functions can be cast back and forth to pointers to data. Such casts are well-formed C code, but their effects are non-portable and undefined, hence the requirement for a compiler warning.
POSIX requires that such conversions work for dlsym.
System V platform-specific ABIs (Mac OS, Linux; e.g. the System V ABI for AMD64) require, in their definition of the pointer type, that function and data pointers have the same size and alignment.

edited Dec 9, 2023 at 23:37

answered Sep 10, 2012 at 20:38

Maxim Egorushkin

4 Comments

gexicide Over a year ago

does that mean that using dlsym to get the address of a function is currently unsafe? Is there currently a safe way to do it?

Maxim Egorushkin Over a year ago

It means that currently POSIX requires from a platform ABI that both function and data pointers can be safely cast to void* and back .

David Hammen Over a year ago

@gexicide It means that implementations that are POSIX compliant have made an extension to the language, giving an implementation-defined meaning to what is undefined behavior per the standard itself. It's even listed as one of the common extensions to the C99 standard, section J.5.7 Function pointer casts.

Maxim Egorushkin Over a year ago

@DavidHammen It is not an extension to the language, rather a new extra requirement. C doesn't require void* to be compatible with a function pointer, whereas POSIX does.

Remy Lebeau · Accepted Answer · 2023-10-13 20:09:46Z

10

C++11 has a solution to the long-standing mismatch between C/C++ and POSIX with regard to dlsym(). One can use reinterpret_cast to convert a function pointer to/from a data pointer so long as the implementation supports this feature.

From the standard, 5.2.10 para. 8:

converting a function pointer to an object pointer type or vice versa is conditionally-supported.

1.3.5 defines "conditionally-supported" as a:

program construct that an implementation is not required to support.

edited Oct 13, 2023 at 20:09

Remy Lebeau

answered Sep 10, 2012 at 23:01

David Hammen

5 Comments

Konrad Rudolph Over a year ago

One can, but one shouldn’t. A conforming compiler must generate a warning for that (which in turn should trigger an error, cf. -Werror). A better (and non-UB) solution is to retrieve a pointer to the object returned by dlsym (i.e. void**) and convert that to a pointer to function pointer. Still implementation-defined but no longer cause for a warning/error.

MSalters Over a year ago

@KonradRudolph: Disagree. The "conditionally-supported" wording was specifically written to allow dlsym and GetProcAddress to compile without warning.

Konrad Rudolph Over a year ago

@MSalters What do you mean, “disagree”? Either I’m right or wrong. The dlsym documentation explicitly says that “compilers conforming to the ISO C standard are required to generate a warning if a conversion from a void * pointer to a function pointer is attempted”. This doesn’t leave much room for speculation. And GCC (with -pedantic) does warn. Again, no speculation possible.

Konrad Rudolph Over a year ago

Follow-up: I think now I understand. It’s not UB. It’s implementation-defined. I’m still unsure whether the warning must be generated or not – probably not. Oh well.

MSalters Over a year ago

@KonradRudolph: I disagreed with your "shouldn't", which is an opinion. The answer specifically mentioned C++11, and I was a member of the C++ CWG at the time the issue was addressed. C99 indeed has different wording, conditionally-supported is a C++ invention.

Graham Borland · Accepted Answer · 2011-02-07 18:00:21Z

7

Depending on the target architecture, code and data may be stored in fundamentally incompatible, physically distinct areas of memory.

answered Feb 7, 2011 at 18:00

Graham Borland

3 Comments

Manav Over a year ago

'physically distinct' I understand, but can you elaborate more on the 'fundamentally incompatible' distinction? As I said in the question, isn't a void pointer supposed to be as large as any pointer type - or is that a wrong presumption on my part?

ephemient Over a year ago

@KnickerKicker: void * is large enough to hold any data pointer, but not necessarily any function pointer.

SSpoke Over a year ago

back to the future :P

Martin Beckett · Accepted Answer · 2011-02-07 18:00:58Z

5

Undefined doesn't necessarily mean not allowed; it can mean that the compiler implementor has more freedom to do it how they want.

For instance it may not be possible on some architectures - undefined allows them to still have a conforming 'C' library even if you can't do this.

answered Feb 7, 2011 at 18:00

Martin Beckett

Comments

R.. GitHub STOP HELPING ICE · Accepted Answer · 2011-02-07 20:16:56Z

Another solution:

Assuming POSIX guarantees function and data pointers to have the same size and representation (I can't find the text for this, but the example OP cited suggests they at least intended to make this requirement), the following should work:

double (*cosine)(double);
void *tmp;
void *handle = dlopen("libm.so", RTLD_LAZY);  /* needs <dlfcn.h> */
tmp = dlsym(handle, "cos");
/* Copy the representation instead of casting; needs <string.h>. */
memcpy(&cosine, &tmp, sizeof cosine);

This avoids violating the aliasing rules by going through the char [] representation, which is allowed to alias all types.

Yet another approach:

union {
    double (*fptr)(double);
    void *dptr;
} u;
u.dptr = dlsym(handle, "cos");
cosine = u.fptr;

But I would recommend the memcpy approach if you want absolutely 100% correct C.
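For readers who want to try the memcpy technique without a shared library, here is a self-contained sketch. It assumes, as this answer does, that function and data pointers have the same size and representation; the size part is checked at compile time with C11 _Static_assert. The names unary_fn, half, fake_lookup and call_looked_up are hypothetical, and fake_lookup is only a stand-in for dlsym.

```c
#include <string.h>

typedef double (*unary_fn)(double);   /* hypothetical name */

/* Assumption made loud: this sketch only makes sense where both kinds
   of pointer have the same size (POSIX systems, for example). */
_Static_assert(sizeof(void *) == sizeof(unary_fn),
               "sketch assumes equal-sized function and data pointers");

static double half(double x) { return x / 2.0; }

/* Stand-in for dlsym: hands back a void * that carries the bytes of a
   function pointer. No cast between pointer kinds is involved. */
static void *fake_lookup(void)
{
    unary_fn f = half;
    void *p;
    memcpy(&p, &f, sizeof p);
    return p;
}

/* The technique itself: copy the bytes of the void * into a function
   pointer object, then call through it. */
double call_looked_up(double x)
{
    void *tmp = fake_lookup();
    unary_fn f;
    memcpy(&f, &tmp, sizeof f);
    return f(x);
}
```

The compile-time check covers size only. That the representations match is still an assumption about the platform, one that the language does not guarantee.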

Edward Strange · Accepted Answer · 2012-09-10 20:35:22Z

5

They can be different types with different space requirements. Assigning to one can irreversibly slice the value of the pointer so that assigning back results in something different.

I believe they can be different types because the standard doesn't want to limit possible implementations that save space when it's not needed or when the size could cause the CPU to have to do extra crap to use it, etc...

edited Sep 10, 2012 at 20:35

answered Sep 10, 2012 at 20:24

Edward Strange

Comments

R.. GitHub STOP HELPING ICE · Accepted Answer · 2011-02-07 19:11:30Z

4

The only truly portable solution is not to use dlsym for functions, and instead use dlsym to obtain a pointer to data that contains function pointers. For example, in your library:

struct module foo_module = {
    .create = create_func,
    .destroy = destroy_func,
    .write = write_func,
    /* ... */
};

and then in your application:

struct module *foo = dlsym(handle, "foo_module");
foo->create(/*...*/);
/* ... */

Incidentally, this is good design practice anyway, and makes it easy to support both dynamic loading via dlopen and static linking all modules on systems that don't support dynamic linking, or where the user/system integrator does not want to use dynamic linking.

answered Feb 7, 2011 at 19:11

R.. GitHub STOP HELPING ICE

3 Comments

Manav Over a year ago

Nice! While I agree this does seem more maintainable, it is still not obvious (to me) how I hammer on static linking on top of this. Can you elaborate?

R.. GitHub STOP HELPING ICE Over a year ago

If each module has its own foo_module structure (with unique names), you can simply create an extra file with an array of struct { const char *module_name; const struct module *module_funcs; } and a simple function to search this table for the module you want to "load" and return the right pointer, then use this in place of dlopen and dlsym.
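The table-based replacement for dlopen and dlsym described in this comment might look roughly like the sketch below. The members of struct module, the foo_* functions and find_module are all invented for illustration; the answer does not say what the real interface contains.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical module interface; the real one is not given above. */
struct module {
    int  (*create)(void);
    void (*destroy)(void);
};

static int live_instances = 0;
static int  foo_create(void)  { return ++live_instances; }
static void foo_destroy(void) { --live_instances; }

static const struct module foo_module = {
    .create  = foo_create,
    .destroy = foo_destroy,
};

/* Static registry used in place of dlopen/dlsym. Only object pointers
   are stored and returned, so no function/data conversion occurs. */
static const struct {
    const char *module_name;
    const struct module *module_funcs;
} module_table[] = {
    { "foo", &foo_module },
};

/* Returns the module registered under name, or NULL if there is none. */
const struct module *find_module(const char *name)
{
    size_t count = sizeof module_table / sizeof module_table[0];
    for (size_t i = 0; i < count; ++i) {
        if (strcmp(module_table[i].module_name, name) == 0)
            return module_table[i].module_funcs;
    }
    return NULL;
}
```

A caller would write find_module("foo")->create() after checking for NULL, which mirrors the dlsym-based usage shown in the answer.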

user877329 Over a year ago

@R.. True, but it adds maintenance cost by having to maintain the module structure.

Andrew Sun · Accepted Answer · 2016-11-27 05:46:22Z

A modern example of where function pointers can differ in size from data pointers: C++ class member function pointers

Directly quoted from https://blogs.msdn.microsoft.com/oldnewthing/20040209-00/?p=40713/

class Base1 { int b1; void Base1Method(); };
class Base2 { int b2; void Base2Method(); };
class Derived : public Base1, Base2 { int d; void DerivedMethod(); };
There are now two possible this pointers.

A pointer to a member function of Base1 can be used as a pointer to a member function of Derived, since they both use the same this pointer. But a pointer to a member function of Base2 cannot be used as-is as a pointer to a member function of Derived, since the this pointer needs to be adjusted.

There are many ways of solving this. Here's how the Visual Studio compiler decides to handle it:

A pointer to a member function of a multiply-inherited class is really a structure.
[Address of function]
[Adjustor]
The size of a pointer-to-member-function of a class that uses multiple inheritance is the size of a pointer plus the size of a size_t.

tl;dr: When using multiple inheritance, a pointer to a member function may (depending on compiler, version, architecture, etc) actually be stored as

struct {
    void *func;      /* address of the function */
    size_t offset;   /* adjustment applied to the this pointer */
};

which is obviously larger than a void *.

Barmar · Accepted Answer · 2012-09-11 21:18:54Z

2

On most architectures, pointers to all normal data types have the same representation, so casting between data pointer types is a no-op.

However, it's conceivable that function pointers might require a different representation; perhaps they're larger than other pointers. If void* could hold function pointers, this would mean that void*'s representation would have to be the larger size. And all casts of data pointers to/from void* would have to convert between the two representations.

As someone mentioned, if you need this you can achieve it using a union. But most uses of void* are just for data, so it would be onerous to increase all their memory use just in case a function pointer needs to be stored.
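One way to read the union suggestion is as a tagged slot that holds either kind of pointer and is always read back through the member that was written, so nothing beyond ISO C is assumed. This is a sketch only; struct slot, generic_fn and the helper functions are hypothetical names.

```c
#include <stddef.h>

typedef void (*generic_fn)(void);   /* hypothetical name */

enum slot_kind { SLOT_DATA, SLOT_FUNC };

/* Holds an object pointer or a function pointer, never both. The tag
   records which member is valid; the union is as large as the larger
   of the two, and only code that needs both pays for it. */
struct slot {
    enum slot_kind kind;
    union {
        void *data;
        generic_fn func;
    } as;
};

struct slot make_data_slot(void *p)
{
    struct slot s;
    s.kind = SLOT_DATA;
    s.as.data = p;
    return s;
}

struct slot make_func_slot(generic_fn f)
{
    struct slot s;
    s.kind = SLOT_FUNC;
    s.as.func = f;
    return s;
}

/* Calls the stored function if the slot holds one; returns 1 if a call
   was made and 0 otherwise. */
int run_slot(const struct slot *s)
{
    if (s->kind == SLOT_FUNC && s->as.func != NULL) {
        s->as.func();
        return 1;
    }
    return 0;
}

/* Small demo callback so the effect of run_slot can be observed. */
static int bump_count = 0;
void bump(void) { ++bump_count; }
int get_bump_count(void) { return bump_count; }
```

Plain void * stays the size of a data pointer, which is the trade-off this answer argues for.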

answered Sep 11, 2012 at 21:18

Barmar

Comments

phorgan1 · Accepted Answer · 2015-12-11 18:27:39Z

-1

I know that this hasn't been commented on since 2012, but I thought it would be useful to add that I do know an architecture that has very incompatible pointers for data and functions since a call on that architecture checks privilege and carries extra information. No amount of casting will help. It's The Mill.

answered Dec 11, 2015 at 18:27

phorgan1

1 Comment

Manuel Jacob Over a year ago

This answer is wrong. You can for example convert a function pointer to a data pointer and read from it (if you have permissions to read from that address, as usual). The result makes as much sense as it does e.g. on x86.
