0

I am trying to make a toy compiler for an object oriented language which supports dynamic dispatch and simple inheritance. Meaning any class Child that extends a parent can be used whenever a Parent class is declared. It inherits all its fields and its methods and it may override methods.

Up to now I have been considering implementing virtual function tables and making sure the memory layouts of Parent and Child classes are as similar as possible. Here is a small example in C to illustrate what I mean:

#include <stdlib.h>

typedef struct {
    struct ParentVTable *_vtable;
    int inheritedField;
} Parent;

struct ParentVTable {
    void (*inheritedMethod)(Parent *, int);
};

void Parent_inheritedMethod(Parent *self, int b) {
    b = 0;
}

struct ParentVTable ParentVTable_inst = {
    .inheritedMethod = &Parent_inheritedMethod
};

void Parent_init(Parent *self) {
    self->_vtable = &ParentVTable_inst;
    self->inheritedField = 42;
}

Parent *Parent_new(void) {
    Parent *self = (Parent*)malloc(sizeof(Parent));
    Parent_init(self);
    return self;
}

typedef struct {
    struct ChildVTable *_vtable;
    int inheritedField;
    int newField;
} Child;

struct ChildVTable {
    void (*inheritedMethod)(Child *, int);
    int (*newMethod)(Child *, int);
};

int Child_newMethod(Child *self, int i) {
    return i + self->inheritedField;
}

struct ChildVTable ChildVTable_inst = {
    .inheritedMethod = (void (*)(Child *, int)) Parent_inheritedMethod,
    .newMethod = &Child_newMethod
};

void Child_init(Child *self) {
    Parent_init((Parent *) self);
    self->_vtable = &ChildVTable_inst;
    self->newField = 0;
}

Child *Child_new(void) {
    Child *self = (Child*)malloc(sizeof(Child));
    Child_init(self);
    return self;
}

int main() {
    Parent *p = (Parent*) Child_new();
    return 0;
}

As you can see in the main method, a Child class can be used as a Parent class provided that it is cast as a Parent. Using clang this cast translates in LLVM into a bitcast operation without which the llvm is not valid (the llvm code does not compile). Here is the corresponding piece of code:

define i32 @main() #0 {
  %1 = alloca i32, align 4
  %2 = alloca %struct.Parent*, align 8
  store i32 0, i32* %1, align 4
  %3 = call %struct.Child* @Child_new()
  %4 = bitcast %struct.Child* %3 to %struct.Parent*
  store %struct.Parent* %4, %struct.Parent** %2, align 8
  ret i32 0
}

Up to there everything is fine. The thing is that in the language I am trying to make, as well as in most object oriented languages that implement inheritance, this cast that is done in C is implicit. I have no way to see, at compile time, that a cast is required. My question, thus, is how do I know when I have to bitcast an llvm variable for example before assigning it or before passing it as an argument to call etc. Assuming this is only known at runtime. Am I missing something, should I simply bitcast every time to make sure the correct 'objects' are passed around in the code? Any help would be appreciated!

1 Answer 1

2

What you're seeing is a side effect of the Liskov substutution principle. Object-oriented languages have those implicit casts according to Liskov's principle, LLVM IR is an assembly language and doesn't, and you're writing a tool that translates code from one kind of language to the other so you have to add explicit casts.

Write functions like isAssignableFrom() and cast() and call them when necessary, and you'll find that it works quite naturally.

The same thing happens for primitive types, BTW. a=b requires you to check that a's type is assignable from b's and to add a cast, and the kind of cast depends on your language's type system. (In some languages, casting 42 to boolean produces true, in others false, and in some languages it leads to an error.) Your implementation of cast() may be quite complex by the time you've implemented all of your type rules.

Sign up to request clarification or add additional context in comments.

4 Comments

So for example for languages that have no type checking whatsoever, like JS or Python every single assignment is cast, every single argument passed to a function is cast? Do thy do some kind of type inference or do they simply treat everything as bytes for example. In my case, I do static type checking and I fixed the rules so that what you said makes sense. But I am just wondering ...
@Desperados In dynamically typed languages, all values usually have the same type (usually a pointer to some object struct that includes dynamic type information). At least in ahead-of-time compilers (which are very rare for dynamically typed languages) and interpreters. In JIT compilers, you usually know the concrete types of things, so you can generate code for the specific types.
It wouldn't be accurate to say JS or Python do not have type checking. They do have type checking, but it happens at runtime rather than at compile-time (except where there is a JIT). In practice this means that an ahead-of-time compiler to native code for these languages must insert type checks everywhere it is unable to prove they are not needed.
JS and python do have some type checking, and the compile-time cast() for such a language tends to emit complicated runtime code quite often. Although not always — even a simple a python compiler using LLVM would probably handle some casts at compile time, such as a = "1234" + 567, where both operands to + are known at compile time, so + can folded to a constant. This is different from a language such as java, where cast() emits complicated code less often, but also similar, because a java compiler too will emit a complicated runtime check sometimes.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.