How to implement Object Oriented Dynamic Dispatch in LLVM?

Question

I am trying to make a toy compiler for an object oriented language which supports dynamic dispatch and simple inheritance. Meaning any class Child that extends a parent can be used whenever a Parent class is declared. It inherits all its fields and its methods and it may override methods.

Up to now I have been considering implementing virtual function tables and making sure the memory layouts of Parent and Child classes are as similar as possible. Here is a small example in C to illustrate what I mean:

#include <stdlib.h>

typedef struct {
    struct ParentVTable *_vtable;
    int inheritedField;
} Parent;

struct ParentVTable {
    void (*inheritedMethod)(Parent *, int);
};

void Parent_inheritedMethod(Parent *self, int b) {
    b = 0;
}

struct ParentVTable ParentVTable_inst = {
    .inheritedMethod = &Parent_inheritedMethod
};

void Parent_init(Parent *self) {
    self->_vtable = &ParentVTable_inst;
    self->inheritedField = 42;
}

Parent *Parent_new(void) {
    Parent *self = (Parent*)malloc(sizeof(Parent));
    Parent_init(self);
    return self;
}

typedef struct {
    struct ChildVTable *_vtable;
    int inheritedField;
    int newField;
} Child;

struct ChildVTable {
    void (*inheritedMethod)(Child *, int);
    int (*newMethod)(Child *, int);
};

int Child_newMethod(Child *self, int i) {
    return i + self->inheritedField;
}

struct ChildVTable ChildVTable_inst = {
    .inheritedMethod = (void (*)(Child *, int)) Parent_inheritedMethod,
    .newMethod = &Child_newMethod
};

void Child_init(Child *self) {
    Parent_init((Parent *) self);
    self->_vtable = &ChildVTable_inst;
    self->newField = 0;
}

Child *Child_new(void) {
    Child *self = (Child*)malloc(sizeof(Child));
    Child_init(self);
    return self;
}

int main() {
    Parent *p = (Parent*) Child_new();
    return 0;
}

As you can see in the main method, a Child class can be used as a Parent class provided that it is cast as a Parent. Using clang this cast translates in LLVM into a bitcast operation without which the llvm is not valid (the llvm code does not compile). Here is the corresponding piece of code:

define i32 @main() #0 {
  %1 = alloca i32, align 4
  %2 = alloca %struct.Parent*, align 8
  store i32 0, i32* %1, align 4
  %3 = call %struct.Child* @Child_new()
  %4 = bitcast %struct.Child* %3 to %struct.Parent*
  store %struct.Parent* %4, %struct.Parent** %2, align 8
  ret i32 0
}

Up to there everything is fine. The thing is that in the language I am trying to make, as well as in most object oriented languages that implement inheritance, this cast that is done in C is implicit. I have no way to see, at compile time, that a cast is required. My question, thus, is how do I know when I have to bitcast an llvm variable for example before assigning it or before passing it as an argument to call etc. Assuming this is only known at runtime. Am I missing something, should I simply bitcast every time to make sure the correct 'objects' are passed around in the code? Any help would be appreciated!

arnt · Accepted Answer · 2021-05-18 09:48:57Z

2

What you're seeing is a side effect of the Liskov substutution principle. Object-oriented languages have those implicit casts according to Liskov's principle, LLVM IR is an assembly language and doesn't, and you're writing a tool that translates code from one kind of language to the other so you have to add explicit casts.

Write functions like isAssignableFrom() and cast() and call them when necessary, and you'll find that it works quite naturally.

The same thing happens for primitive types, BTW. a=b requires you to check that a's type is assignable from b's and to add a cast, and the kind of cast depends on your language's type system. (In some languages, casting 42 to boolean produces true, in others false, and in some languages it leads to an error.) Your implementation of cast() may be quite complex by the time you've implemented all of your type rules.

answered May 18, 2021 at 9:48

arnt

9,8285 gold badges27 silver badges37 bronze badges

Sign up to request clarification or add additional context in comments.

4 Comments

Desperados Over a year ago

So for example for languages that have no type checking whatsoever, like JS or Python every single assignment is cast, every single argument passed to a function is cast? Do thy do some kind of type inference or do they simply treat everything as bytes for example. In my case, I do static type checking and I fixed the rules so that what you said makes sense. But I am just wondering ...

sepp2k Over a year ago

@Desperados In dynamically typed languages, all values usually have the same type (usually a pointer to some object struct that includes dynamic type information). At least in ahead-of-time compilers (which are very rare for dynamically typed languages) and interpreters. In JIT compilers, you usually know the concrete types of things, so you can generate code for the specific types.

mystery Over a year ago

It wouldn't be accurate to say JS or Python do not have type checking. They do have type checking, but it happens at runtime rather than at compile-time (except where there is a JIT). In practice this means that an ahead-of-time compiler to native code for these languages must insert type checks everywhere it is unable to prove they are not needed.

arnt Over a year ago

JS and python do have some type checking, and the compile-time cast() for such a language tends to emit complicated runtime code quite often. Although not always — even a simple a python compiler using LLVM would probably handle some casts at compile time, such as a = "1234" + 567, where both operands to + are known at compile time, so + can folded to a constant. This is different from a language such as java, where cast() emits complicated code less often, but also similar, because a java compiler too will emit a complicated runtime check sometimes.

Collectives™ on Stack Overflow

How to implement Object Oriented Dynamic Dispatch in LLVM?

1 Answer 1

4 Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

4 Comments

Your Answer

Sign up or log in

Post as a guest

Related