Parsing Markdown in .NET

October 11, 2015

Until recently, I used Jeff Atwood's MarkdownSharp to transform my Markdown blog posts into HTML. Because it's a single C# file without any dependencies, the component is trivial to integrate into almost any .NET application.

However, I wasn't entirely happy with MarkdownSharp. First of all, it's no longer being actively developed: it has seen only three (!) commits within the last year. More importantly, though, it doesn't support fenced code blocks, a feature I've come to like a lot.

Fenced Code Blocks

Here's what a fenced code block looks like:

```
<div>
  <!-- ... -->
</div>
```

As you can see, there's no need to indent the lines of the HTML code block by four spaces because they're clearly delimited by three backticks (```). A normal code block would've looked like this:

    <div>
      <!-- ... -->
    </div>

Not having to indent the code is nice, but that's not the most valuable aspect of fenced code blocks. Their biggest advantage is the ability to specify the code language right after the opening backticks:

```html
<div>
  <!-- ... -->
</div>
```

That way, the rendered <code> tag receives the language-html CSS class, which can then be used by a JavaScript syntax highlighter like Prism to properly highlight the given code block.

Note that fenced code blocks are not part of John Gruber's original Markdown specification. Instead, they've been formalized as part of CommonMark, an effort to standardize the dialects of Markdown used by GitHub, Stack Overflow, and others.

CommonMark.NET

For these reasons, I've replaced MarkdownSharp with CommonMark.NET, a .NET implementation of the CommonMark spec. I can now use fenced code blocks and all the other goodness that comes with the CommonMark dialect.
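To make this concrete, here's a minimal sketch of converting a fenced code block with CommonMark.NET. It assumes the CommonMark.NET NuGet package is referenced; `FencedBlockDemo` and `RenderSample` are names I made up for illustration, while `CommonMarkConverter.Convert` is the library's static entry point.

```csharp
using CommonMark;

public static class FencedBlockDemo
{
    // Converts a small Markdown document containing a fenced code block to HTML.
    // (Hypothetical helper, not part of CommonMark.NET itself.)
    public static string RenderSample()
    {
        // The language name "html" follows the three opening backticks.
        string markdown =
            "```html\n" +
            "<div>\n" +
            "  <!-- ... -->\n" +
            "</div>\n" +
            "```\n";

        // Convert parses the Markdown and returns the rendered HTML as a string.
        return CommonMarkConverter.Convert(markdown);
    }
}
```

The returned HTML should wrap the escaped markup in a `<code>` element that carries the `language-html` class described above, which is what a highlighter like Prism looks for.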

Additionally, the Markdown parsing is a lot faster. I've measured a 30x speedup in parsing. However, take these benchmarks with a grain of salt: in the realm of web development, where performance bottlenecks mainly stem from network latencies and database queries, shaving off a millisecond from the time it takes to parse a blog post usually doesn't result in big time savings.
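If you'd like to run a rough comparison yourself, a simple timing helper along these lines might do. This is only a sketch and not the benchmark behind the number above; `ParseTimer` and `AverageMilliseconds` are hypothetical names, and the converter is passed in as a delegate so the helper doesn't depend on either library.

```csharp
using System;
using System.Diagnostics;

public static class ParseTimer
{
    // Returns the average time per conversion in milliseconds.
    // (Hypothetical helper for illustration; a rough measurement, not a rigorous benchmark.)
    public static double AverageMilliseconds(
        Func<string, string> convert, string markdown, int iterations)
    {
        if (iterations <= 0)
        {
            throw new ArgumentOutOfRangeException(nameof(iterations));
        }

        // Warm-up call so that JIT compilation isn't part of the measurement.
        convert(markdown);

        var stopwatch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            convert(markdown);
        }
        stopwatch.Stop();

        return stopwatch.Elapsed.TotalMilliseconds / iterations;
    }
}
```

You could then pass `md => CommonMark.CommonMarkConverter.Convert(md)` and `md => new MarkdownSharp.Markdown().Transform(md)` as the `convert` argument, using the same input text and iteration count for both, and compare the two averages.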

Besides offering better performance, CommonMark.NET doesn't use recursion to parse Markdown files. This can be an important detail if you're parsing Markdown text submitted by users: with a recursive parser, maliciously crafted markup can cause a stack overflow due to deep recursion, which will bring down the entire process because a StackOverflowException generally cannot be caught.
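To illustrate the general idea (this is not CommonMark.NET's actual code), here's a sketch of walking a deeply nested tree with an explicit stack instead of recursive calls. The `Node` and `TreeDepth` types are made up for this example. Because the pending work lives on the heap rather than the call stack, the nesting depth of the input can't overflow the call stack.

```csharp
using System.Collections.Generic;

// Hypothetical tree node, standing in for a nested Markdown element.
public sealed class Node
{
    public List<Node> Children { get; } = new List<Node>();
}

public static class TreeDepth
{
    // Computes the maximum nesting depth without any recursive calls.
    public static int MaxDepth(Node root)
    {
        if (root == null)
        {
            return 0;
        }

        int maxDepth = 0;

        // Each entry pairs a node with its depth; the stack is heap-allocated.
        var pending = new Stack<KeyValuePair<Node, int>>();
        pending.Push(new KeyValuePair<Node, int>(root, 1));

        while (pending.Count > 0)
        {
            var current = pending.Pop();
            if (current.Value > maxDepth)
            {
                maxDepth = current.Value;
            }

            foreach (var child in current.Key.Children)
            {
                pending.Push(new KeyValuePair<Node, int>(child, current.Value + 1));
            }
        }

        return maxDepth;
    }
}
```

A naive recursive version of `MaxDepth` would need one call frame per nesting level, so a sufficiently deep input could exhaust the call stack and take the process down with it.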

Summary

If you're looking for a .NET Markdown parser, I can recommend CommonMark.NET. Also, if you're using Sublime Text to write Markdown, make sure to check out my post on how to set up Sublime Text for a vastly better Markdown writing experience!