In-memory C# compilation (and .dll generation) using Roslyn

Posted on Wednesday, 28 Dec 2016

Recently I've been hard at work on my first Visual Studio Code extension, and one of the requirements is to extract IL from a .dll binary. This raises a question though: do I build the solution (blocking the user whilst their project is building), read the .dll from disk and then extract the IL, or do I compile the project in memory behind the scenes using Roslyn and read the resulting assembly from a stream? Ultimately I went with the latter approach and was pleasantly surprised at how easy Roslyn makes this - surprised enough that I thought it deserved its own blog post.

Before we continue, let me take a moment to explain what Roslyn is for those who may not be familiar with it.

What is Roslyn?

Roslyn is an open-source C# and VB compiler as a service platform.

The key words to take away with you here are "compiler as a service"; let me explain.

Traditionally compilers have been a black box of secrets that are hard to extend or harness, especially for any tooling or code analysis purposes. Take ReSharper for instance; ReSharper has a lot of code analysis running under the bonnet that allows it to offer refactoring advice. In order for the ReSharper team to provide this they had to build their own analysis tools that would manually parse your solution's C# in line with the .NET runtime - the .NET platform provided no assistance with this, essentially meaning they had to duplicate a lot of the work the compiler was doing.

This has since changed with the introduction of Roslyn. For the past couple of years Microsoft have been rewriting the C# compiler in C# (I know, it's like a compiler Inception right?) and opening it up via a whole host of APIs that are easy to prod, poke and interrogate. This opening up of the C# compiler has resulted in a whole array of code analysis tooling such as better StyleCop integration and debugging tools like OzCode and the like. What's more, you can also harness Roslyn for other purposes such as tests that fail as soon as common code smells are introduced into a project.

Let's start

So now we all know what Roslyn is, let's take a look at how we can use it to compile a project in memory. In this post we will be taking some C# code written in plain text, turning it into a syntax tree that the compiler can understand, then using Roslyn to compile it, resulting in an in-memory assembly held in a stream.

Create our project

In this instance I'm using .NET Core on a Mac but this will also work on Windows, so let's begin by creating a new console application using the .NET Core CLI.

dotnet new -t console

Now, add the following dependencies to your project.json file:

"dependencies": {
    "Microsoft.CodeAnalysis.CSharp.Workspaces": "1.3.2",
    "Mono.Cecil": "0.10.0-beta1-v2",
    "System.ValueTuple": "4.3.0-preview1-24530-04"
},

For those interested, here is a copy of the project.json file in its entirety:

{
  "version": "1.0.0-*",
  "buildOptions": {
    "debugType": "portable",
    "emitEntryPoint": true
  },
  "dependencies": {
    "Microsoft.CodeAnalysis.CSharp.Workspaces": "1.3.2",
    "Mono.Cecil": "0.10.0-beta1-v2",
    "System.ValueTuple": "4.3.0-preview1-24530-04"
  },
  "frameworks": {
    "netcoreapp1.1": {
      "dependencies": {
        "Microsoft.NETCore.App": {
          "type": "platform",
          "version": "1.0.1"
        }
      },
      "imports": "portable-net45+win8+wp8+wpa81"
    }
  }
}

Once we've restored our project using the dotnet restore command, the next step is to write a simple class to represent our source code. This code could be read from a web form, a database or a file on disk. In this instance I'm hard-coding it into the application itself for simplicity.

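using System.IO;              // MemoryStream (used once we emit the assembly)
using Microsoft.CodeAnalysis; // SyntaxTree, SourceCodeKind, Compilation
using Mono.Cecil;             // AssemblyDefinition (used once we emit the assembly)

public class Program {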

    public static void Main(string[] args)
    {

        var code = @"
        using System;
        public class ExampleClass {
            
            private readonly string _message;

            public ExampleClass()
            {
                _message = ""Hello World"";
            }

            public string getMessage()
            {
                return _message;
            }

        }";

        CreateAssemblyDefinition(code);
    }

    public static void CreateAssemblyDefinition(string code)
    {
        var sourceLanguage = new CSharpLanguage();
        SyntaxTree syntaxTree = sourceLanguage.ParseText(code, SourceCodeKind.Regular);

        ...
    }

}

Getting stuck into Roslyn

Now we've got the base of our project sorted, let's dive into some of the Roslyn API.

First we're going to want to create an interface we'll use to define the language we want to use. In this instance it'll be C#, but Roslyn also supports VB.

using Microsoft.CodeAnalysis;

public interface ILanguageService
{
    SyntaxTree ParseText(string code, SourceCodeKind kind);

    Compilation CreateLibraryCompilation(string assemblyName, bool enableOptimisations);
}

Next we're going to need to parse our plain text C#, so we'll begin by working on the implementation of the ParseText method.

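using System;
using System.Linq;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;

public class CSharpLanguage : ILanguageService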
{
    private static readonly LanguageVersion MaxLanguageVersion = Enum
        .GetValues(typeof(LanguageVersion))
        .Cast<LanguageVersion>()
        .Max();

    public SyntaxTree ParseText(string sourceCode, SourceCodeKind kind) {
        var options = new CSharpParseOptions(kind: kind, languageVersion: MaxLanguageVersion);

        // Return a syntax tree of our source code
        return CSharpSyntaxTree.ParseText(sourceCode, options);
    }

    public Compilation CreateLibraryCompilation(string assemblyName, bool enableOptimisations) {
        throw new NotImplementedException();
    }
}

As you can see, the implementation is rather straightforward and simply involves setting a few parse options: the language features we expect to see being parsed (specified via the languageVersion parameter) and the SourceCodeKind enum.
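Parsing never throws on malformed input; Roslyn records any problems as diagnostics on the tree instead. As a minimal sketch of how you might check for them before going any further (the ParseChecks class and GetSyntaxErrors method are names I've made up for illustration, they aren't part of the project above):

```csharp
using System.Collections.Generic;
using System.Linq;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;

public static class ParseChecks
{
    // Hypothetical helper: returns the syntax errors Roslyn recorded
    // while parsing the given source text (an empty list means it parsed cleanly).
    public static IReadOnlyList<Diagnostic> GetSyntaxErrors(string code)
    {
        SyntaxTree tree = CSharpSyntaxTree.ParseText(code);

        // GetDiagnostics also returns warnings, so keep only the errors
        return tree.GetDiagnostics()
            .Where(d => d.Severity == DiagnosticSeverity.Error)
            .ToList();
    }
}
```

Note that this only catches syntax errors; semantic errors such as unknown types won't surface until the code is compiled.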

Looking further into Roslyn's SyntaxTree

At this point I feel it's worth mentioning that if you're interested in learning more about Roslyn then I would recommend spending a bit of time looking into Roslyn's Syntax Tree API. Josh Varty's posts on this subject are a great resource I would recommend.

I would also recommend taking a look at LINQPad which, amongst other great features, has the ability to show you the syntax tree Roslyn generates for your code. For instance, here is the generated syntax tree of the ExampleClass code we're using in this post:

http://assets.josephwoodward.co.uk/blog/linqpad_tree2.png
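If you'd rather poke at the tree in code than in a visualiser, the sketch below walks the parsed tree and pulls out the names of any methods it declares. The SyntaxTreeWalker class and GetMethodNames method are hypothetical names used purely for illustration:

```csharp
using System.Collections.Generic;
using System.Linq;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;

public static class SyntaxTreeWalker
{
    // Hypothetical helper: lists the names of the methods declared in the source.
    // Constructors are a different node type (ConstructorDeclarationSyntax),
    // so they are not included.
    public static IReadOnlyList<string> GetMethodNames(string code)
    {
        SyntaxNode root = CSharpSyntaxTree.ParseText(code).GetRoot();

        return root.DescendantNodes()
            .OfType<MethodDeclarationSyntax>()
            .Select(method => method.Identifier.Text)
            .ToList();
    }
}
```

Run against the ExampleClass source from earlier, I'd expect this to return a single entry, getMessage.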

Now our C# has been parsed and turned into a data structure the C# compiler can understand, let's look at using Roslyn to compile it.

Compiling our Syntax Tree

Continuing with the CreateAssemblyDefinition method, let's compile our syntax tree:

public static void CreateAssemblyDefinition(string code)
{
    var sourceLanguage = new CSharpLanguage();
    SyntaxTree syntaxTree = sourceLanguage.ParseText(code, SourceCodeKind.Regular);

    // The metadata references are supplied inside CreateLibraryCompilation
    // (they live in a private field of CSharpLanguage), so we only need to
    // add our syntax tree here.
    Compilation compilation = sourceLanguage
        .CreateLibraryCompilation(assemblyName: "InMemoryAssembly", enableOptimisations: false)
        .AddSyntaxTrees(syntaxTree);

    ...
}

At this point we're going to want to fill in the implementation of our CreateLibraryCompilation method within our CSharpLanguage class. We'll start this by passing the appropriate arguments into an instance of CSharpCompilationOptions. This includes:

  • outputKind - We're outputting a Dynamically Linked Library (DLL)
  • optimizationLevel - Whether we want our C# output to be optimised
  • allowUnsafe - Whether we want our C# code to allow the use of unsafe code or not

public class CSharpLanguage : ILanguageService
{
    private readonly IReadOnlyCollection<MetadataReference> _references = new[] {
        MetadataReference.CreateFromFile(typeof(Binder).GetTypeInfo().Assembly.Location),
        MetadataReference.CreateFromFile(typeof(ValueTuple<>).GetTypeInfo().Assembly.Location)
    };

    ...

    public Compilation CreateLibraryCompilation(string assemblyName, bool enableOptimisations) {
        var options = new CSharpCompilationOptions(
            OutputKind.DynamicallyLinkedLibrary,
            optimizationLevel: enableOptimisations ? OptimizationLevel.Release : OptimizationLevel.Debug,
            allowUnsafe: true);

        return CSharpCompilation.Create(assemblyName, options: options, references: _references);
    }
}

Now we've specified our compiler options, we invoke the Create factory method, where we also specify the name of our in-memory assembly (InMemoryAssembly in our case, passed in when calling CreateLibraryCompilation) along with the additional references required to compile our source code. In this instance, as we're targeting C# 7, we need to supply the compilation with the assembly containing the ValueTuple structs. If we were targeting an older version of C# then this would not be required.

All that's left to do now is to call Roslyn's Emit method, passing in the Stream we want the assembly written to, and we're sorted!

public static void CreateAssemblyDefinition(string code)
{
    ...

    // References are already supplied inside CreateLibraryCompilation
    Compilation compilation = sourceLanguage
        .CreateLibraryCompilation(assemblyName: "InMemoryAssembly", enableOptimisations: false)
        .AddSyntaxTrees(syntaxTree);

    // Compile straight into memory rather than out to a file on disk
    var stream = new MemoryStream();
    var emitResult = compilation.Emit(stream);

    if (emitResult.Success)
    {
        // Rewind the stream so Mono.Cecil reads the assembly from the start
        stream.Seek(0, SeekOrigin.Begin);
        AssemblyDefinition assembly = AssemblyDefinition.ReadAssembly(stream);
    }
}
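The snippet above silently does nothing when compilation fails, which can make a missing reference or a typo in the source hard to track down. As a rough sketch of how the failure case could be surfaced (EmitDiagnostics and DescribeErrors are hypothetical names of my own, not part of the project above), the EmitResult exposes the compiler's diagnostics:

```csharp
using System.Collections.Generic;
using System.Linq;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.Emit;

public static class EmitDiagnostics
{
    // Hypothetical helper: turns the errors of an emit attempt into
    // readable "ID: message" strings, e.g. for logging.
    public static IReadOnlyList<string> DescribeErrors(EmitResult emitResult)
    {
        return emitResult.Diagnostics
            .Where(d => d.Severity == DiagnosticSeverity.Error)
            .Select(d => d.Id + ": " + d.GetMessage())
            .ToList();
    }
}
```

You could call this from an else branch after checking emitResult.Success and write each line to the console or a log.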

From here I'm then able to pass my AssemblyDefinition to a method that extracts the IL and I'm good to go!
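I haven't shown my IL extraction code in this post, but to give a flavour of what's possible, here is a minimal sketch of reading IL instructions with Mono.Cecil. The IlExtractor class and ExtractIl method are hypothetical names for illustration only, and this isn't necessarily how the extension itself does it:

```csharp
using System.Collections.Generic;
using Mono.Cecil;
using Mono.Cecil.Cil;

public static class IlExtractor
{
    // Hypothetical helper: maps each method's full name to its IL instructions,
    // rendered as text. Only top-level types are visited; nested types are skipped.
    public static IDictionary<string, IList<string>> ExtractIl(AssemblyDefinition assembly)
    {
        var result = new Dictionary<string, IList<string>>();

        foreach (TypeDefinition type in assembly.MainModule.Types)
        {
            foreach (MethodDefinition method in type.Methods)
            {
                // Abstract and extern methods have no body to read
                if (!method.HasBody) continue;

                var lines = new List<string>();
                foreach (Instruction instruction in method.Body.Instructions)
                {
                    lines.Add(instruction.ToString());
                }

                result[method.FullName] = lines;
            }
        }

        return result;
    }
}
```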

Conclusion

Whilst this post is quite narrow in its focus (I can't imagine everyone is looking to compile C# in memory!), hopefully it's served as a primer in piquing your interest in Roslyn and what it's capable of doing. Roslyn is a truly powerful platform that I wish more languages offered. As mentioned before, there are some great resources available that go into much more depth. I would especially recommend Josh Varty's posts on the subject.
