A ghostly P shape

I really like MkDocs as a documentation tool. It’s based on Markdown, looks great (especially with the popular Material for MkDocs theme), has a great range of plug-ins, and can automatically rebuild and serve changes during local development.

Recently, I ran into a curious problem — here’s what happened, how I recreated it, and how you can fix it.

Ghostly goings-on

Background

Pages were mysteriously getting empty <p></p> HTML elements inserted, and I couldn’t figure out why.

The set-up

Let’s go step by step through how to reproduce the problem I was seeing by building an empty MkDocs project from scratch in a Docker container.

docker run -it --rm -p 8000:8000 python:latest bash

pip install mkdocs

...

cd home
mkdocs new spooky
cd spooky
mkdocs serve --dev-addr 0.0.0.0:8000 > /dev/null 2>&1 &

Next, browse to http://127.0.0.1:8000/ on the host machine. You should see a default page being served from inside the Docker container.

MkDocs default page

Reproducing the problem

Now, let’s replace the default content with a setup that demonstrates the issue with “ghostly” empty elements. We want a paragraph of text, followed by an HTML block whose content should be treated as Markdown. The original problem involved an HTML table; the examples below use nested div elements to keep things short. Although slightly contrived, this example is based on a real problem seen in the wild.

While Markdown’s own table syntax is simpler for basic tables, HTML tables offer greater flexibility, especially for complex layouts. That is the approach used here, with the md_in_html extension allowing Markdown content inside the HTML. Let’s enable it.

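The server was started in the background, so Ctrl+C will not stop it. Stop it with a job-control command such as kill %1 (assuming it is the only background job), update the MkDocs settings like this, and then start the server again with the same mkdocs serve command as before: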

echo -e "\nmarkdown_extensions:\n  - md_in_html" >> mkdocs.yml

Next, we want to install an editor to make it easy to modify the Markdown source file.

apt update

apt install vim
vim docs/index.md

Change the content as follows, then reload the web page and inspect the generated HTML. In each example below, the Markdown source is shown first, followed by the HTML it produces.

# Example

Paragraph before.

<div id="outer" markdown="1"><div id="inner" markdown="1">Some **bold** text.</div></div>

Paragraph after.

<h1 id="example">Example</h1>
<p>Paragraph before.</p>
<div id="outer">
<p><div id="inner" markdown="1">Some <strong>bold</strong> text.</p>  ⮄ paragraph AROUND div 😈
</div>
</div>
<p>Paragraph after.</p></div>

Note the unexpected <p> and </p> tags around the inner div, and that the inner div’s markdown="1" attribute has been left in the output.

However, notice what happens if we add some line breaks. The same Markdown content now produces different HTML output:

# Example

Paragraph before.

<div id="outer" markdown="1">
<div id="inner" markdown="1">Some **bold** text.</div>
</div>

Paragraph after.

<h1 id="example">Example</h1>
<p>Paragraph before.</p>
<div id="outer">
<div id="inner">
<p>Some <strong>bold</strong> text.</p>  ⮄ paragraph INSIDE div 😇
</div>
</div>
<p>Paragraph after.</p></div>

Now the content of the inner div contains a paragraph of text, but there are no <p></p> tags wrapping the div, which is reasonable behaviour.

So, to clarify, the original looked like <p><div>...</div></p> and, just by modifying the line breaks in the Markdown, the HTML output changed to <div><p>...</p></div>.
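
To compare the two variants without a browser, the same sources can be rendered by calling Python-Markdown directly. This is only a minimal sketch: it assumes the markdown package is importable (it is normally installed together with MkDocs), and the render helper and the constant names are invented for this illustration. MkDocs enables some extensions of its own by default (toc, for example), so details such as heading ids may differ from the output shown above, and the exact HTML may vary between Python-Markdown versions.

```python
try:
    import markdown  # Python-Markdown, normally installed together with MkDocs
except ImportError:  # keep the sketch importable where the package is missing
    markdown = None

# The single-line variant from the first example.
ONE_LINE = """\
# Example

Paragraph before.

<div id="outer" markdown="1"><div id="inner" markdown="1">Some **bold** text.</div></div>

Paragraph after.
"""

# The same content with each div tag starting on its own line.
MULTI_LINE = """\
# Example

Paragraph before.

<div id="outer" markdown="1">
<div id="inner" markdown="1">Some **bold** text.</div>
</div>

Paragraph after.
"""


def render(source):
    """Render Markdown to HTML with the md_in_html extension enabled."""
    if markdown is None:
        raise RuntimeError("Python-Markdown is not installed (pip install markdown)")
    return markdown.markdown(source, extensions=["md_in_html"])


if __name__ == "__main__":
    if markdown is None:
        print("Python-Markdown is not installed; nothing to render.")
    else:
        for label, source in (("one line", ONE_LINE), ("multiple lines", MULTI_LINE)):
            print(f"--- {label} ---")
            print(render(source))
```

Printing both results side by side makes it easy to see where the p tags end up for each layout of the source.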

You can take it a step further by setting markdown="span" on the inner div, so that its content is parsed as inline (span-level) Markdown rather than block-level Markdown, like this:

# Example

Paragraph before.

<div id="outer" markdown="1">
<div id="inner" markdown="span">  ⮄ note the span
Some **bold** text.
</div>
</div>

Paragraph after.

<h1 id="example">Example</h1>
<p>Paragraph before.</p>
<div id="outer">
<div id="inner">
Some <strong>bold</strong> text. ⮄ no paragraph tags at all 😇
</div>
</div>
<p>Paragraph after.</p></div>

Does it matter?

What prompted this post was a “bug” where empty <p></p> elements appeared before a table with markdown="1". These ghost paragraphs seemed to come out of nowhere. Even though my MkDocs theme suppressed the display of these empty paragraphs, they still affected the printed output. My headings had “keep with next” styling, and the empty paragraph became the next element, so a heading could still be stranded at the bottom of a printed page.

The examples in this post are a simplified version (I had difficulty reproducing the original issue fully). This issue took a lot of debugging and head-scratching to resolve, and I found little help online. Hopefully, this post will save someone else from the same frustration.
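
Because the theme hides empty paragraphs on screen, one way to catch them is to scan the generated HTML rather than the rendered page. The sketch below uses only Python’s standard html.parser module; the GhostParagraphFinder class and the find_ghost_paragraphs function are hypothetical names made up for this post, and the definition of “empty” (no text and no child elements) is an assumption you may want to adjust.

```python
from html.parser import HTMLParser


class GhostParagraphFinder(HTMLParser):
    """Record the position of every <p> element that closes without content."""

    def __init__(self):
        super().__init__()
        self.ghosts = []           # (line, column) of each empty <p> start tag
        self._open_p = None        # position of the <p> currently open, if any
        self._has_content = False  # whether that <p> has text or child elements

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._open_p = self.getpos()
            self._has_content = False
        elif self._open_p is not None:
            # Any child element (img, strong, ...) counts as content.
            self._has_content = True

    def handle_data(self, data):
        # Whitespace-only text does not count as content.
        if self._open_p is not None and data.strip():
            self._has_content = True

    def handle_endtag(self, tag):
        if tag == "p" and self._open_p is not None:
            if not self._has_content:
                self.ghosts.append(self._open_p)
            self._open_p = None


def find_ghost_paragraphs(html):
    """Return (line, column) positions of the empty paragraphs in an HTML string."""
    finder = GhostParagraphFinder()
    finder.feed(html)
    finder.close()
    return finder.ghosts
```

To use it, read each HTML file that mkdocs build writes to the output directory and pass its text to find_ghost_paragraphs; any position it returns is a candidate ghost paragraph worth tracing back to the Markdown source.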

So, what’s going on?

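The CommonMark spec attempts to unambiguously define the syntax of Markdown. The section about HTML blocks goes into a lot of detail about HTML in Markdown, including many examples. Bear in mind that Python-Markdown, which MkDocs uses, is not a CommonMark implementation, so the spec is a useful guide here rather than an exact description of its behaviour.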

There are several layers of things happening here:

  1. Processing of HTML blocks in Markdown content is part of standard Markdown processing. At this level, any Markdown syntax inside the HTML content will be treated literally as HTML content.
    • The HTML content will just be identified and then passed through to the rendered page unchanged. It is not parsed, and need not be valid.
    • Blank lines will be treated as the end of an HTML block, and cause the Markdown processor to switch back to rendering Markdown (with unexpected consequences if you have blank lines inside the HTML block).
  2. The second layer is the processing of Markdown content inside those HTML blocks. It is triggered by the markdown="..." attribute, which is supported by some Markdown processors. In the case of MkDocs, this is provided by Python-Markdown’s md_in_html extension (which we explicitly enabled in the MkDocs settings above).
    • Note that using this extension will change the HTML tag processing slightly. Since the extension needs to parse the content anyway to differentiate between HTML and Markdown content, it treats the HTML less literally and will do things like automatically closing dangling HTML tags.
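
The two layers described above can be seen by rendering the same HTML block with and without the extension. This is a hedged sketch rather than a description of what MkDocs does internally: it assumes the markdown package is importable, and the SOURCE constant and render helper are names invented for the example.

```python
try:
    import markdown  # Python-Markdown, normally installed together with MkDocs
except ImportError:  # keep the sketch importable where the package is missing
    markdown = None

SOURCE = """\
<div markdown="1">
Some **bold** text.
</div>
"""


def render(source, extensions=()):
    """Render Markdown to HTML with the given Python-Markdown extensions."""
    if markdown is None:
        raise RuntimeError("Python-Markdown is not installed (pip install markdown)")
    return markdown.markdown(source, extensions=list(extensions))


if __name__ == "__main__":
    if markdown is None:
        print("Python-Markdown is not installed; nothing to render.")
    else:
        # Layer 1 only: the HTML block is passed through, so **bold** stays literal.
        print(render(SOURCE))
        # Layer 2: md_in_html parses the content of the block as Markdown.
        print(render(SOURCE, ["md_in_html"]))
```

In the first result the asterisks survive untouched inside the div; in the second they are converted to a strong element, which is the behaviour the rest of this post relies on.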

So, the spooky paragraphs appear when content is interpreted as block-level, and is therefore wrapped in <p> tags, because of the intricacies of the Markdown parsing, which are sensitive to details such as line breaks. But, as demonstrated by these examples, you can control this with careful formatting.

Summary

When working with HTML blocks in MkDocs Markdown, follow these guidelines to avoid unexpected issues:

  • Always include a blank line before and after each HTML block.
  • Avoid blank lines inside the HTML blocks.
  • Start each block-level HTML element (start and end tag) on a new line.
  • Avoid indenting any of the lines of the HTML block at all.
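
The four guidelines above are mechanical enough to check with a small script. What follows is a rough, line-based heuristic rather than a real Markdown or HTML parser: the lint_html_blocks function and the BLOCK_TAGS list are hypothetical names chosen for this sketch, the tag list is only an illustrative subset, and fenced code blocks that contain HTML are not recognised and may trigger false positives.

```python
import re

# Illustrative subset of block-level tags; extend it to match your own pages.
BLOCK_TAGS = ("div", "table", "thead", "tbody", "tr", "td", "th", "ul", "ol", "li")
OPEN_RE = re.compile(r"<(?:%s)\b[^>]*>" % "|".join(BLOCK_TAGS), re.IGNORECASE)
CLOSE_RE = re.compile(r"</(?:%s)\s*>" % "|".join(BLOCK_TAGS), re.IGNORECASE)


def lint_html_blocks(source):
    """Return (line_number, message) pairs for lines that break the guidelines."""
    problems = []
    depth = 0            # number of block-level elements currently open
    previous = ""        # the previous line, for the blank-line-before check
    just_closed = False  # True when the previous line closed the outermost block
    for number, line in enumerate(source.splitlines(), start=1):
        opens = len(OPEN_RE.findall(line))
        closes = len(CLOSE_RE.findall(line))
        has_tag = bool(opens or closes)
        blank = not line.strip()

        if blank and depth > 0:
            problems.append((number, "blank line inside an HTML block"))
        if has_tag and line[:1].isspace():
            problems.append((number, "HTML line is indented"))
        if opens > 1:
            problems.append((number, "more than one block-level start tag on this line"))
        if opens and depth == 0 and previous.strip():
            problems.append((number, "no blank line before the HTML block"))
        if just_closed and not blank and not has_tag:
            problems.append((number, "no blank line after the HTML block"))

        depth = max(depth + opens - closes, 0)
        just_closed = closes > 0 and depth == 0
        previous = line
    return problems
```

Running it over the text of docs/index.md for the single-line example from earlier reports the two start tags sharing a line, while the multi-line version passes cleanly.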

Additionally, when you want to render content as Markdown inside those HTML blocks:

  • Mark the HTML element that wraps the Markdown content with markdown="1". Work outwards from there to each parent HTML element, adding the markdown attribute each time until you reach the outermost HTML element.
  • If unexpected block-level elements are being created, the likely cause is that something (usually the HTML element immediately wrapping the Markdown content) is being interpreted as a block-level element. To fix this, change its markdown="1" attribute to markdown="span" to ensure its content is treated as inline content.
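
The “work outwards” rule can also be checked automatically: every element that carries a markdown attribute should sit inside a parent that carries one too, all the way up to the outermost element. Here is a minimal sketch using Python’s standard html.parser module. The MarkdownAttrChecker class and the check_markdown_attributes function are hypothetical names invented for this post, and feeding Markdown source to an HTML parser is itself a simplification (autolinks written in angle brackets, for instance, would be mistaken for tags).

```python
from html.parser import HTMLParser

# Elements that never have an end tag, so they must not be pushed on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class MarkdownAttrChecker(HTMLParser):
    """Flag elements with a markdown attribute whose parent element lacks one."""

    def __init__(self):
        super().__init__()
        self.stack = []     # (tag, has_markdown_attribute) for each open element
        self.problems = []  # (line_number, message)

    def handle_starttag(self, tag, attrs):
        has_md = any(name == "markdown" for name, _ in attrs)
        if has_md and self.stack and not self.stack[-1][1]:
            parent = self.stack[-1][0]
            message = f"<{tag}> has a markdown attribute but its parent <{parent}> does not"
            self.problems.append((self.getpos()[0], message))
        if tag not in VOID_TAGS:
            self.stack.append((tag, has_md))

    def handle_endtag(self, tag):
        # Pop back to the matching start tag, tolerating unclosed elements.
        for index in range(len(self.stack) - 1, -1, -1):
            if self.stack[index][0] == tag:
                del self.stack[index:]
                break


def check_markdown_attributes(source):
    """Return (line_number, message) pairs for broken markdown attribute chains."""
    checker = MarkdownAttrChecker()
    checker.feed(source)
    checker.close()
    return checker.problems
```

Checking only the immediate parent is enough here, because the same check is applied to that parent in turn when it also carries the attribute.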