4

I am just at the beginning of my graduation project, which is supposed to last 6 months. The goal of the project is to implement a .NET compiler for a scripting language. Compiler Construction was a subject in my curriculum, so I know the basic steps of implementing a compiler in general, but we used Bison and a simple compiler with GCC as the back end, so I don't know much about implementing compilers on the .NET platform.

After some research on this topic, I found the following alternatives for code generation (I am not asking about other essential parts of a compiler, such as the parser; that is out of scope here):

  1. Direct code generation using Reflection.Emit.
  2. Using the Common Compiler Infrastructure (CCI) as a higher-level abstraction that automates some of the code generation.
  3. Using CodeDOM for C# and VB compilation at runtime.
  4. Using Roslyn, the new C# "compiler as a service", which is currently available as a CTP.
  5. Using the DLR, which supports dynamic code generation and offers interfaces for runtime code generation via expression trees.
  6. Using the Mono.Cecil library that ships with Mono, which seems to have some functionality for code generation as well.

The primary goal of my project is to delve deeper into the guts of .NET, to learn compiler construction, and to get a good grade for my work. The secondary goal is to come up with a compiler implementation that can later be opened to the community under a permissive open-source license.

So, what would be the most interesting, educational, entertaining and promising approach here? I would definitely try all of them if I had more time, but I need to submit my work in exactly 6 months to get a passing grade...

Thank you in advance, Alexander.

7
  • Note that Roslyn is just a thick wrapper around your 1, 2, and 3. Commented Nov 9, 2011 at 22:53
  • @SLaks, I think Roslyn doesn't actually use CodeDOM (#3). Commented Nov 9, 2011 at 23:15
  • I wasn't sure about that one. I suspect you're right. Commented Nov 9, 2011 at 23:16
  • What kind of language would you prefer? Dynamic, static? Commented Nov 9, 2011 at 23:18
  • My personal insight is that only one thing matters: write your own parser. Once you do, everything else is simple. The lexer isn't that complicated to begin with, and the code generator is trivial once the parser works. The worst thing you can do is leave the parser up to a tool. Commented Nov 10, 2011 at 1:01

3 Answers

5

If you want the easier way and your language can be reasonably translated into C#, I would recommend generating C# code (or similar) and compiling that. Roslyn would probably be the best fit for that. Apparently, CCI can do that too using CCI Code, but I've never used it. I wouldn't recommend CodeDOM, because it doesn't support features like static classes or extension methods.
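To make the "translate to C# and compile that" idea concrete, here is a minimal sketch of the translation step only. It is written in Python purely for illustration; the node classes `Num`, `Var`, `BinOp` and the helpers `to_csharp` and `emit_function` are hypothetical names, not part of Roslyn, CCI, or any other library. It assumes a toy language with integer expressions only, and it stops at producing C# source text, which would then be handed to a C# compiler.

```python
from dataclasses import dataclass


# Hypothetical AST for a toy expression language (assumption: integers only).
@dataclass
class Num:
    value: int


@dataclass
class Var:
    name: str


@dataclass
class BinOp:
    op: str          # one of "+", "-", "*", "/"
    left: object
    right: object


def to_csharp(expr):
    """Translate an expression tree into a C# expression string."""
    if isinstance(expr, Num):
        return str(expr.value)
    if isinstance(expr, Var):
        return expr.name
    if isinstance(expr, BinOp):
        # Always parenthesize so the source language's precedence rules
        # never have to match C#'s.
        return f"({to_csharp(expr.left)} {expr.op} {to_csharp(expr.right)})"
    raise TypeError(f"unknown node: {expr!r}")


def emit_function(name, params, body):
    """Wrap an expression in a static C# method returning int."""
    args = ", ".join(f"int {p}" for p in params)
    return (
        f"public static int {name}({args})\n"
        "{\n"
        f"    return {to_csharp(body)};\n"
        "}\n"
    )


if __name__ == "__main__":
    tree = BinOp("+", Var("x"), BinOp("*", Num(2), Var("y")))
    print(emit_function("f", ["x", "y"], tree))
```

The appeal of this route is that the generator is a plain tree walk that returns strings; type checking, optimization, and IL emission are all left to the C# compiler.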

If you want more control or if you want to go low-level you can generate CIL directly using Reflection.Emit. But it will be (much) more work, especially if you're not familiar with CIL. I think Cecil can be used the same way, but it's intended for something else, and I don't think it offers any advantages over Reflection.Emit.

DLR is meant, as its full name suggests, for dynamic languages. The Expressions it uses can be used for code generation, but I think they are best at generating relatively simple methods at runtime. Of course, DLR itself can be very useful if your language is dynamic.

2

Boo is a language/compiler that targets the CLI. It appears to be open source so you could study how they accomplish it.

1 Comment

Very interesting suggestion! I read the book "DSL with Boo" and even used Boo (as a scripting engine) in one of my work projects -- but I never looked at it from the compiler construction side. Thank you!
2

Back when I was writing compilers, I would emit assembly language (i.e. assembly language source code) that I then ran through the system's assembler. That way I could easily see what I was generating. It's a whole lot easier to read mov ax, bx (x86 assembly) than it is to decode hex opcodes.

If I wasn't allowed to use the assembler in the final product, I developed the compiler using the assembly output and then once I got everything working I made a binary output path. The beauty was, all I had to change was the actual bytes output (opcodes and binary values rather than text).

I would suggest doing something similar for your project. Develop it initially to output MSIL that you can assemble with ILASM. That way, you can easily verify your code generator's output by reading the generated code. Once you're confident that your code generator is working, add an output option that will use Reflection.Emit or the Common Compiler Infrastructure.
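As a rough illustration of the text-first approach, here is a small sketch that walks an expression tree and emits textual IL of the kind ILASM accepts. It is in Python only to keep it short; the tuple-based tree and the names `emit_expr`, `emit_program`, and `OPCODES` are hypothetical. The fixed `.maxstack 8` and the `mscorlib` reference are simplifying assumptions, and the output has not been run through any particular ILASM version here.

```python
# Maps source operators to CIL arithmetic instructions.
OPCODES = {"+": "add", "-": "sub", "*": "mul", "/": "div"}


def emit_expr(expr, out):
    """Post-order walk: push both operands, then apply the operator.

    An expression is either an int literal or a tuple (op, left, right).
    """
    if isinstance(expr, int):
        out.append(f"ldc.i4 {expr}")      # push a 32-bit integer constant
    else:
        op, left, right = expr
        emit_expr(left, out)
        emit_expr(right, out)
        out.append(OPCODES[op])           # pops two values, pushes the result


def emit_program(expr, assembly_name="Demo"):
    """Return textual IL for a program that prints the value of expr."""
    body = []
    emit_expr(expr, body)
    body.append("call void [mscorlib]System.Console::WriteLine(int32)")
    body.append("ret")
    lines = [
        ".assembly extern mscorlib {}",
        f".assembly {assembly_name} {{}}",
        ".method static void Main() cil managed",
        "{",
        "    .entrypoint",
        "    .maxstack 8",                # assumption: expressions stay shallow
    ]
    lines += ["    " + instruction for instruction in body]
    lines.append("}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(emit_program(("+", 1, ("*", 2, 3))))
```

Because `emit_expr` only appends instructions to a list, a later Reflection.Emit or CCI back end could reuse the same tree walk and replace each appended string with the corresponding emit call.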

1 Comment

Interesting suggestion, thank you! MSIL output is nice for debugging purposes and for compiler optimization. Still, I am thinking of first writing a translator to C# and only then implementing my own compiler, because compiler optimization is neither easy nor transparent.
