In-memory C# compilation (and .dll generation) using Roslyn
Posted on Wednesday, 28th December 2016
Recently I’ve been hard at work on my first Visual Studio Code extension, and one of its requirements is to extract IL from a .dll binary. This raises a question, though: do I build the solution (blocking the user whilst their project builds), read the .dll from disk and then extract the IL? Or do I use Roslyn to compile the project in memory behind the scenes and then read the resulting assembly from a stream? Ultimately I went with the latter approach and was pleasantly surprised at how easy Roslyn makes this. I was surprised enough that I thought it deserved its own blog post.
Before we continue, let me take a moment to explain what Roslyn is for those who may not be familiar with it.
What is Roslyn?
Roslyn is an open-source compiler platform for C# and VB that exposes the compiler as a service.
The key words to take away with you here are “compiler as a service”; let me explain.
Traditionally, compilers have been black boxes that are hard to extend or harness, especially for tooling or code analysis purposes. Take ReSharper, for instance. ReSharper runs a lot of code analysis under the bonnet, which allows it to offer refactoring advice. To provide this, the ReSharper team had to build their own analysis tools that parsed your solution’s C# in a way that matched the real compiler. The .NET platform provided no assistance with this, so they essentially had to duplicate a lot of the work the compiler was already doing.
This has since changed with the introduction of Roslyn. For the past couple of years Microsoft have been rewriting the C# compiler in C# (I know, it’s like a compiler Inception, right?) and opening it up via a whole host of APIs that are easy to prod, poke and interrogate. Opening up the C# compiler has resulted in a whole array of code analysis tooling, such as better StyleCop integration, and debugging tools such as OzCode. What’s more, you can harness Roslyn for other purposes too, such as tests that fail as soon as common code smells are introduced into a project.
Let’s start
Now that we all know what Roslyn is, let’s look at how we can use it to compile a project in memory. In this post we will take some C# code written as plain text and turn it into a syntax tree that the compiler can understand. We will then use Roslyn to compile that tree and emit the resulting assembly into an in-memory stream.
Create our project
In this instance I’m using .NET Core on a Mac, but this will also work on Windows. Let’s begin by creating a new console application using the .NET Core CLI:
dotnet new -t console
Now add the following dependencies to your project.json file:
"dependencies": {
  "Microsoft.CodeAnalysis.CSharp.Workspaces": "1.3.2",
  "Mono.Cecil": "0.10.0-beta1-v2",
  "System.ValueTuple": "4.3.0-preview1-24530-04"
},
For those interested, here is a copy of the project.json file in its entirety:
{
  "version": "1.0.0-*",
  "buildOptions": {
    "debugType": "portable",
    "emitEntryPoint": true
  },
  "dependencies": {
    "Microsoft.CodeAnalysis.CSharp.Workspaces": "1.3.2",
    "Mono.Cecil": "0.10.0-beta1-v2",
    "System.ValueTuple": "4.3.0-preview1-24530-04"
  },
  "frameworks": {
    "netcoreapp1.1": {
      "dependencies": {
        "Microsoft.NETCore.App": {
          "type": "platform",
          "version": "1.0.1"
        }
      },
      "imports": "portable-net45+win8+wp8+wpa81"
    }
  }
}
Once we’ve restored our project using the dotnet restore command, the next step is to write a simple class to represent our source code. This code could be read from a web form, a database or a file on disk. In this instance I’m hard-coding it into the application itself for simplicity.
using Microsoft.CodeAnalysis; // SyntaxTree, SourceCodeKind

public class Program
{
    public static void Main(string[] args)
    {
        // The C# source we want to compile, held as a plain string
        var code = @"
using System;
public class ExampleClass {
    private readonly string _message;
    public ExampleClass()
    {
        _message = ""Hello World"";
    }
    public string getMessage()
    {
        return _message;
    }
}";
        CreateAssemblyDefinition(code);
    }

    public static void CreateAssemblyDefinition(string code)
    {
        var sourceLanguage = new CSharpLanguage();
        // Parse the plain text into a Roslyn syntax tree
        SyntaxTree syntaxTree = sourceLanguage.ParseText(code, SourceCodeKind.Regular);
        ...
    }
}
Getting stuck into Roslyn
Now we’ve got the base of our project sorted, let’s dive into some of the Roslyn API.
First, we’ll create an interface that defines what we need from a source language. In this instance the language will be C#, but Roslyn also supports VB.
using Microsoft.CodeAnalysis; // SyntaxTree, SourceCodeKind, Compilation

public interface ILanguageService
{
    SyntaxTree ParseText(string code, SourceCodeKind kind);
    Compilation CreateLibraryCompilation(string assemblyName, bool enableOptimisations);
}
Next we need to parse our plain-text C#, so we’ll begin with the implementation of the ParseText method.
using System;
using System.Linq;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;

public class CSharpLanguage : ILanguageService
{
    // The highest LanguageVersion value defined by the referenced Roslyn version
    private static readonly LanguageVersion MaxLanguageVersion = Enum
        .GetValues(typeof(LanguageVersion))
        .Cast<LanguageVersion>()
        .Max();

    public SyntaxTree ParseText(string sourceCode, SourceCodeKind kind)
    {
        var options = new CSharpParseOptions(kind: kind, languageVersion: MaxLanguageVersion);
        // Return a syntax tree of our source code
        return CSharpSyntaxTree.ParseText(sourceCode, options);
    }

    public Compilation CreateLibraryCompilation(string assemblyName, bool enableOptimisations)
    {
        // Filled in later in this post
        throw new NotImplementedException();
    }
}
As you’ll see, the implementation is rather straightforward. It simply sets a few parse options: the language version whose features we expect to be parsed (set via the languageVersion parameter) and the SourceCodeKind enum, which distinguishes a regular source file from a script.
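Roslyn’s parser is error-tolerant, so ParseText does not throw on malformed input. Problems show up as diagnostics on the returned tree instead. The sketch below shows one way to surface syntax errors before going any further. ParseDiagnostics and GetSyntaxErrors are hypothetical names of my own, not Roslyn APIs.

```csharp
using System.Linq;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;

public static class ParseDiagnostics
{
    // Parses the source and returns any syntax errors Roslyn reported.
    // ParseText never throws on malformed code; problems surface as diagnostics.
    public static string[] GetSyntaxErrors(string sourceCode)
    {
        SyntaxTree syntaxTree = CSharpSyntaxTree.ParseText(sourceCode);

        return syntaxTree.GetDiagnostics()
            .Where(diagnostic => diagnostic.Severity == DiagnosticSeverity.Error)
            .Select(diagnostic => diagnostic.ToString())
            .ToArray();
    }
}
```

For example, GetSyntaxErrors("public class Broken {") reports an error for the missing closing brace, while a well-formed class returns an empty array.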
Looking further into Roslyn’s SyntaxTree
At this point it’s worth mentioning that if you’re interested in learning more about Roslyn, I’d recommend spending a bit of time exploring Roslyn’s Syntax Tree API. Josh Varty’s posts on the subject are a great place to start.
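As a small taste of that API, here is a minimal sketch that walks a parsed tree and lists each method declared in a class. SyntaxTreeWalk and ListMethods are illustrative names of my own. DescendantNodes, ClassDeclarationSyntax and MethodDeclarationSyntax come from Roslyn.

```csharp
using System.Collections.Generic;
using System.Linq;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;

public static class SyntaxTreeWalk
{
    // Returns "ClassName.MethodName" for every method declared directly in a class.
    // Constructors are ConstructorDeclarationSyntax nodes, so they are not included.
    public static List<string> ListMethods(string sourceCode)
    {
        SyntaxNode root = CSharpSyntaxTree.ParseText(sourceCode).GetRoot();
        var methods = new List<string>();

        foreach (var classNode in root.DescendantNodes().OfType<ClassDeclarationSyntax>())
        {
            foreach (var methodNode in classNode.Members.OfType<MethodDeclarationSyntax>())
            {
                methods.Add(classNode.Identifier.Text + "." + methodNode.Identifier.Text);
            }
        }

        return methods;
    }
}
```

Run against the ExampleClass source used in this post, it should yield a single entry, ExampleClass.getMessage. The constructor is not listed because it is a separate node type.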
I would also recommend taking a look at LINQPad. Amongst other great features, it can show you the syntax tree Roslyn generates for your code. For instance, here is the generated syntax tree for the ExampleClass code we’re using in this post:
(Screenshot: LINQPad’s syntax tree view of ExampleClass)
Now that our C# has been parsed and turned into a data structure the compiler can understand, let’s look at using Roslyn to compile it.
Compiling our Syntax Tree
Continuing with the CreateAssemblyDefinition method, let’s compile our syntax tree:
public static void CreateAssemblyDefinition(string code)
{
    var sourceLanguage = new CSharpLanguage();
    SyntaxTree syntaxTree = sourceLanguage.ParseText(code, SourceCodeKind.Regular);

    // The metadata references are supplied inside CreateLibraryCompilation,
    // so here we only need to add the syntax tree we want to compile
    Compilation compilation = sourceLanguage
        .CreateLibraryCompilation(assemblyName: "InMemoryAssembly", enableOptimisations: false)
        .AddSyntaxTrees(syntaxTree);
    ...
}
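The fluent chain above works because Roslyn’s Compilation is immutable. Methods such as AddSyntaxTrees return a new compilation rather than modifying the existing one. The hypothetical ImmutabilityDemo class below is a small sketch of that behaviour. It also shows the easy mistake of discarding the return value.

```csharp
using System.Linq;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;

public static class ImmutabilityDemo
{
    // Returns the syntax tree counts of the original and the updated compilation.
    public static int[] CountSyntaxTrees()
    {
        Compilation original = CSharpCompilation.Create("ImmutabilityDemo");
        SyntaxTree tree = CSharpSyntaxTree.ParseText("public class A { }");

        // Pitfall: discarding the return value leaves 'original' untouched
        original.AddSyntaxTrees(tree);

        // Correct: keep the new compilation that AddSyntaxTrees returns
        Compilation updated = original.AddSyntaxTrees(tree);

        return new[] { original.SyntaxTrees.Count(), updated.SyntaxTrees.Count() };
    }
}
```

CountSyntaxTrees() returns { 0, 1 } because the original compilation still contains no syntax trees.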
At this point we need to fill in the implementation of the CreateLibraryCompilation method within our CSharpLanguage class. We’ll start by passing the appropriate arguments into an instance of CSharpCompilationOptions. These include:
outputKind - we’re outputting a DLL (OutputKind.DynamicallyLinkedLibrary)
optimizationLevel - whether we want the compiled output to be optimised (OptimizationLevel.Release) or not (OptimizationLevel.Debug)
allowUnsafe - whether our C# code is allowed to use unsafe code
using System;
using System.Collections.Generic;
using System.Reflection;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;

public class CSharpLanguage : ILanguageService
{
    // Assemblies the compiled code can reference: the core library that
    // defines System.Object, plus System.ValueTuple for C# 7 tuples
    private readonly IReadOnlyCollection<MetadataReference> _references = new[] {
        MetadataReference.CreateFromFile(typeof(object).GetTypeInfo().Assembly.Location),
        MetadataReference.CreateFromFile(typeof(ValueTuple<>).GetTypeInfo().Assembly.Location)
    };

    ...

    public Compilation CreateLibraryCompilation(string assemblyName, bool enableOptimisations)
    {
        var options = new CSharpCompilationOptions(
            OutputKind.DynamicallyLinkedLibrary,
            optimizationLevel: enableOptimisations ? OptimizationLevel.Release : OptimizationLevel.Debug,
            allowUnsafe: true);

        return CSharpCompilation.Create(assemblyName, options: options, references: _references);
    }
}
Now that we’ve specified our compiler options, we invoke the Create factory method. Here we also specify the name we want our in-memory assembly to have (InMemoryAssembly in our case, passed in when calling our CreateLibraryCompilation method). We also pass any additional references required to compile our source code. In this instance, as we’re targeting C# 7, we need to supply the compilation with the assembly containing the ValueTuple structs’ implementation. If we were targeting an older version of C#, this would not be required.
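To see why the references matter, the sketch below compiles the same source with and without the core library. ReferenceCheck and HasErrors are hypothetical helpers. Without any references, even System.Object cannot be resolved, so the compilation reports errors before anything is emitted. The sketch references only the core library. That is enough for this trivial class but not necessarily for real projects.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Reflection;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;

public static class ReferenceCheck
{
    // Builds a library compilation and reports whether it contains any errors.
    // When includeCoreLibrary is false the compilation has no references at all.
    public static bool HasErrors(string sourceCode, bool includeCoreLibrary)
    {
        var references = new List<MetadataReference>();
        if (includeCoreLibrary)
        {
            // The assembly that defines System.Object (mscorlib or System.Private.CoreLib)
            references.Add(MetadataReference.CreateFromFile(typeof(object).GetTypeInfo().Assembly.Location));
        }

        Compilation compilation = CSharpCompilation.Create(
            "ReferenceCheck",
            syntaxTrees: new[] { CSharpSyntaxTree.ParseText(sourceCode) },
            references: references,
            options: new CSharpCompilationOptions(OutputKind.DynamicallyLinkedLibrary));

        return compilation.GetDiagnostics().Any(d => d.Severity == DiagnosticSeverity.Error);
    }
}
```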
All that’s left to do now is to call the compilation’s Emit method, passing it a Stream for Roslyn to write the compiled assembly into, and we’re sorted!
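One thing worth knowing at this point is that Emit does not throw when compilation fails. Instead it returns an EmitResult whose Success property is false and whose Diagnostics describe what went wrong. The following is a minimal sketch of handling both outcomes. InMemoryEmitter and TryEmit are illustrative names rather than Roslyn APIs, and the sketch assumes the source needs nothing beyond the core library.

```csharp
using System.IO;
using System.Linq;
using System.Reflection;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.Emit;

public static class InMemoryEmitter
{
    // Compiles the source into a DLL held in memory.
    // On success returns the assembly bytes; on failure returns null and fills errors.
    public static byte[] TryEmit(string sourceCode, out string[] errors)
    {
        Compilation compilation = CSharpCompilation.Create(
            "InMemoryAssembly",
            syntaxTrees: new[] { CSharpSyntaxTree.ParseText(sourceCode) },
            references: new[] { MetadataReference.CreateFromFile(typeof(object).GetTypeInfo().Assembly.Location) },
            options: new CSharpCompilationOptions(OutputKind.DynamicallyLinkedLibrary));

        using (var stream = new MemoryStream())
        {
            EmitResult result = compilation.Emit(stream);

            if (!result.Success)
            {
                // Collect the error diagnostics instead of failing silently
                errors = result.Diagnostics
                    .Where(d => d.Severity == DiagnosticSeverity.Error)
                    .Select(d => d.ToString())
                    .ToArray();
                return null;
            }

            errors = new string[0];
            return stream.ToArray();
        }
    }
}
```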
// Needs: using System.IO; (MemoryStream) and using Mono.Cecil; (AssemblyDefinition)
public static void CreateAssemblyDefinition(string code)
{
    ...
    Compilation compilation = sourceLanguage
        .CreateLibraryCompilation(assemblyName: "InMemoryAssembly", enableOptimisations: false)
        .AddSyntaxTrees(syntaxTree);

    // Compile straight into memory rather than to a file on disk
    var stream = new MemoryStream();
    var emitResult = compilation.Emit(stream);

    if (emitResult.Success)
    {
        // Rewind the stream so Mono.Cecil reads the assembly from the start
        stream.Seek(0, SeekOrigin.Begin);
        AssemblyDefinition assembly = AssemblyDefinition.ReadAssembly(stream);
    }
}
From here I’m then able to pass my AssemblyDefinition to a method that extracts the IL and I’m good to go!
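The IL extraction method itself is outside the scope of this post. To give a rough idea of what Mono.Cecil offers, here is a sketch that takes an AssemblyDefinition like the one read above and lists every IL instruction it contains. IlLister and ListInstructions are hypothetical names, and the sketch only visits top-level types.

```csharp
using System.Collections.Generic;
using Mono.Cecil;
using Mono.Cecil.Cil;

public static class IlLister
{
    // Returns one line per IL instruction, prefixed with the full name
    // of the method the instruction belongs to.
    public static List<string> ListInstructions(AssemblyDefinition assembly)
    {
        var lines = new List<string>();

        // MainModule.Types holds top-level types only; nested types are skipped here
        foreach (TypeDefinition type in assembly.MainModule.Types)
        {
            foreach (MethodDefinition method in type.Methods)
            {
                // Abstract and extern methods have no IL body
                if (!method.HasBody)
                {
                    continue;
                }

                foreach (Instruction instruction in method.Body.Instructions)
                {
                    // Instruction.ToString() yields text such as "IL_0000: ldarg.0"
                    lines.Add(method.FullName + " " + instruction);
                }
            }
        }

        return lines;
    }
}
```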
Conclusion
Whilst this post is quite narrow in its focus (I can’t imagine everyone is looking to compile C# in memory!), hopefully it’s served as a primer and piqued your interest in Roslyn and what it’s capable of. Roslyn is a truly powerful platform, and I wish more languages offered something similar. As mentioned before, there are some great resources available that go into much more depth. I would especially recommend Josh Varty’s posts on the subject.
Enjoy this post? Don't be a stranger!
Follow me on Twitter at @_josephwoodward and say Hi! I love to learn in the open, meet others in the community and talk Go, software engineering and distributed systems related topics.