Ghostly paragraphs in MkDocs
I really like MkDocs as a documentation tool. It’s based on Markdown, looks great (especially with the popular Material for MkDocs theme), has a great range of plug-ins, and can automatically rebuild and serve changes during local development.
Recently, I ran into a curious problem — here’s what happened, how I recreated it, and how you can fix it.
Ghostly goings-on
Background
Pages were mysteriously getting empty <p></p> HTML elements inserted, and I couldn’t figure out why.
The set-up
Let’s go step by step through how to reproduce the problem I was seeing by building an empty MkDocs project from scratch in a Docker container.
docker run -it --rm -p 8000:8000 python:latest bash
pip install mkdocs
...
cd home
mkdocs new spooky
cd spooky
mkdocs serve --dev-addr 0.0.0.0:8000 > /dev/null 2>&1 &
Next, browse to http://127.0.0.1:8000/ on the host machine. You should see a default page being served from inside the Docker container.
Reproducing the problem
Now, let’s replace the default content with a setup that demonstrates the issue with “ghostly” empty elements. We want a paragraph of text, followed by an HTML table where the content should be treated as Markdown. Although slightly contrived, this example is based on a real problem seen in the wild.
While Markdown’s own table syntax is simpler for basic tables, HTML tables offer greater flexibility, especially for complex layouts. So that is what we will be using, with the md_in_html extension to allow Markdown content inside the HTML. Let’s enable it.
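Stop the mkdocs process (it was started in the background, so Ctrl+C will not reach it; use, for example, kill %1), update the MkDocs settings like this, and then start mkdocs serve again with the same command as before: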
echo -e "\nmarkdown_extensions:\n - md_in_html" >> mkdocs.yml
Next, we want to install an editor to make it easy to modify the Markdown source file.
apt update
apt install vim
vim docs/index.md
Change the content as follows. Reload the web page and note how the corresponding HTML has been generated.
# Example
Paragraph before.
<div id="outer" markdown="1"><div id="inner" markdown="1">Some **bold** text.</div></div>
Paragraph after.
<h1 id="example">Example</h1>
<p>Paragraph before.</p>
<div id="outer">
<p><div id="inner" markdown="1">Some <strong>bold</strong> text.</p> ⮄ paragraph AROUND div 😈
</div>
</div>
<p>Paragraph after.</p></div>
Note the unexpected <p> and </p> tags.
However, notice what happens if we add some line breaks. The same Markdown content now produces different HTML output:
# Example
Paragraph before.
<div id="outer" markdown="1">
<div id="inner" markdown="1">Some **bold** text.</div>
</div>
Paragraph after.
<h1 id="example">Example</h1>
<p>Paragraph before.</p>
<div id="outer">
<div id="inner">
<p>Some <strong>bold</strong> text.</p> ⮄ paragraph INSIDE div 😇
</div>
</div>
<p>Paragraph after.</p></div>
Now the content of the inner div contains a paragraph of text, but there are no <p></p> tags wrapping the div, which is reasonable behaviour.
So, to clarify, the original looked like <p><div>...</div></p> and, just by modifying the line breaks in the Markdown, the HTML output changed to <div><p>...</p></div>.
You can take it a step further by marking the content of the inner div as span-level (i.e. inline rather than block-level) with markdown="span", like this:
# Example
Paragraph before.
<div id="outer" markdown="1">
<div id="inner" markdown="span"> ⮄ note the span
Some **bold** text.
</div>
</div>
Paragraph after.
<h1 id="example">Example</h1>
<p>Paragraph before.</p>
<div id="outer">
<div id="inner">
Some <strong>bold</strong> text. ⮄ no paragraph tags at all 😇
</div>
</div>
<p>Paragraph after.</p></div>
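To compare the three variants without a running MkDocs server, they can be fed straight to python-markdown, the library MkDocs uses for rendering. The following is a minimal sketch rather than part of the original setup: PAGE, VARIANTS, render and show_variants are hypothetical names, and the blank lines around the HTML block are an assumption about the source file. The exact output may differ between python-markdown versions, so none is shown here.

```python
# Page skeleton shared by all variants. The blank lines around the HTML
# block are an assumption, not something confirmed by the listings above.
PAGE = "# Example\n\nParagraph before.\n\n{html}\n\nParagraph after.\n"

# The three HTML blocks from the examples above, keyed by a descriptive name.
VARIANTS = {
    "one line": (
        '<div id="outer" markdown="1">'
        '<div id="inner" markdown="1">Some **bold** text.</div>'
        "</div>"
    ),
    "line breaks": (
        '<div id="outer" markdown="1">\n'
        '<div id="inner" markdown="1">Some **bold** text.</div>\n'
        "</div>"
    ),
    "span": (
        '<div id="outer" markdown="1">\n'
        '<div id="inner" markdown="span">\n'
        "Some **bold** text.\n"
        "</div>\n"
        "</div>"
    ),
}


def render(html_block):
    """Render one variant to HTML with the md_in_html extension enabled."""
    # Imported lazily: python-markdown is a third-party package
    # (it is installed as a dependency of MkDocs).
    import markdown

    source = PAGE.format(html=html_block)
    return markdown.markdown(source, extensions=["md_in_html"])


def show_variants():
    """Print the rendered HTML of every variant for side-by-side comparison."""
    for name, html_block in VARIANTS.items():
        print(f"--- {name} ---")
        print(render(html_block))
```

Calling show_variants() inside the container prints the HTML for each variant, which makes it easy to see where the paragraph tags end up after each formatting change.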
Does it matter?
What prompted this post was a “bug” where empty <p></p> elements appeared before a table with markdown="1". These ghost paragraphs seemed to come out of nowhere. Even though my MkDocs theme was suppressing display of these empty paragraphs, they were still affecting the CSS styling when printing because my heading had “keep with next” styling: these empty paragraphs became the next elements, so headings could still appear at the bottom of printed pages.
The examples in this post are a simplified version (I had difficulty reproducing the original issue fully). This issue took a lot of debugging and head-scratching to resolve, and I found little help online. Hopefully, this post will save someone else from the same frustration.
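Because the theme may hide empty paragraphs, it can help to search the generated HTML for them directly. The sketch below uses only the Python standard library; GhostParagraphFinder, find_ghost_paragraphs and the BLOCK_TAGS set are hypothetical names chosen for illustration, and the tag list is a hand-picked subset rather than a complete list of block-level elements.

```python
from html.parser import HTMLParser

# Hand-picked subset of block-level tags (an assumption; extend as needed).
BLOCK_TAGS = {"div", "table", "ul", "ol", "pre", "blockquote", "section"}


class GhostParagraphFinder(HTMLParser):
    """Collect (line, reason) pairs for suspicious <p> elements."""

    def __init__(self):
        super().__init__()
        self.findings = []
        self._open_p = None  # [start line, has_content] of the open <p>

    def handle_starttag(self, tag, attrs):
        line = self.getpos()[0]
        if tag == "p":
            self._open_p = [line, False]
        elif self._open_p is not None:
            self._open_p[1] = True
            if tag in BLOCK_TAGS:
                self.findings.append((line, f"<{tag}> inside <p>"))

    def handle_data(self, data):
        # Whitespace-only text does not count as paragraph content.
        if self._open_p is not None and data.strip():
            self._open_p[1] = True

    def handle_endtag(self, tag):
        if tag == "p" and self._open_p is not None:
            line, has_content = self._open_p
            if not has_content:
                self.findings.append((line, "empty <p></p>"))
            self._open_p = None


def find_ghost_paragraphs(html):
    """Return the findings for one HTML document given as a string."""
    finder = GhostParagraphFinder()
    finder.feed(html)
    finder.close()
    return finder.findings
```

Running find_ghost_paragraphs over the text of each page in the built site directory gives a quick list of places to inspect. It is a heuristic check on the output, not a validator.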
So, what’s going on?
The CommonMark spec attempts to unambiguously define the syntax of Markdown. The section about HTML blocks goes into a lot of detail about HTML in Markdown, including many examples.
There are several layers of things happening here:
- Processing of HTML blocks in Markdown content is part of standard Markdown processing. At this level, any Markdown syntax inside the HTML content will be treated literally as HTML content.
- The HTML content will just be identified and then passed through to the rendered page unchanged. It is not parsed, and need not be valid.
- Blank lines will be treated as the end of an HTML block, and cause the Markdown processor to switch back to rendering Markdown (with unexpected consequences if you have blank lines inside the HTML block).
- The second layer involves processing the Markdown content inside HTML blocks inside Markdown. It is triggered by the markdown="..." attribute, which is supported by some Markdown processors. In the case of MkDocs, this is provided by python-markdown’s md_in_html extension (which we explicitly enabled in the MkDocs settings above).
- Note that using this extension will change the HTML tag processing slightly. Since the extension needs to parse the content anyway to differentiate between HTML and Markdown content, it treats the HTML less literally and will do things like automatically closing dangling HTML tags.
So, the spooky paragraphs are appearing for elements that are being erroneously interpreted as block-level elements because of the intricacies of the Markdown parsing. But, as demonstrated by these examples, you can control this with careful formatting.
Summary
When working with HTML blocks in MkDocs Markdown, follow these guidelines to avoid unexpected issues:
- Always include a blank line before and after each HTML block.
- Avoid blank lines inside the HTML blocks.
- Start each block-level HTML element (start and end tag) on a new line.
- Avoid indenting any of the lines of the HTML block at all.
Additionally, when you want to render content as Markdown inside those HTML blocks:
- Mark the HTML element that wraps the Markdown content with markdown="1". Work outwards from there to each parent HTML element, adding the markdown attribute there each time until you reach the outermost HTML element.
- If you get unexpected block-level elements being created, this is probably because something (probably the HTML element immediately wrapping the Markdown content) is being interpreted as a block-level element. To fix this, change its markdown="1" attribute to markdown="span" to ensure it’s treated as inline content.
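The formatting guidelines above can be checked mechanically before a build. The following is a rough sketch using only the Python standard library; lint_html_blocks and the BLOCK tag list are hypothetical, and the line-based matching is a simplification that knows nothing about fenced code blocks or tags spanning several lines, so expect some false positives.

```python
import re

# Hand-picked subset of block-level tags (an assumption; extend as needed).
BLOCK = r"div|table|section|blockquote|ul|ol"
OPEN_TAG = re.compile(rf"<(?:{BLOCK})\b[^>]*>", re.IGNORECASE)
CLOSE_TAG = re.compile(rf"</(?:{BLOCK})\s*>", re.IGNORECASE)


def lint_html_blocks(markdown_text):
    """Return (line_number, message) pairs for guideline violations."""
    problems = []
    lines = markdown_text.splitlines()
    depth = 0  # number of block-level HTML elements currently open
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            if depth > 0:
                problems.append((number, "blank line inside HTML block"))
            continue
        opens = len(OPEN_TAG.findall(line))
        closes = len(CLOSE_TAG.findall(line))
        if (depth > 0 or opens) and line != line.lstrip():
            problems.append((number, "indented line in HTML block"))
        if opens > 1:
            problems.append((number, "several block-level start tags on one line"))
        # A top-level HTML block needs a blank line (or start of file) before it.
        if depth == 0 and opens and number > 1 and lines[number - 2].strip():
            problems.append((number, "no blank line before HTML block"))
        depth = max(depth + opens - closes, 0)
        # Once the outermost element closes, a blank line (or end of file) must follow.
        closed_block = depth == 0 and (opens or closes)
        if closed_block and number < len(lines) and lines[number].strip():
            problems.append((number, "no blank line after HTML block"))
    return problems
```

An empty result does not guarantee clean output. It only means none of these particular patterns were found in the source file.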