This is version 2.3.1 of the syntax documentation.
The kramdown syntax is based on the Markdown syntax and has been enhanced with features that are found in other Markdown implementations like Maruku, PHP Markdown Extra and Pandoc. However, it strives to provide a strict syntax with definite rules and therefore isn’t completely compatible with Markdown. Nonetheless, most Markdown documents should work fine when parsed with kramdown. All places where the kramdown syntax differs from the Markdown syntax are highlighted.
Following is the complete syntax definition for all elements kramdown supports. Together with the documentation on the available converters, it is clearly specified what you will get when a kramdown document is converted.
Source Text Formatting
A kramdown document may be in any encoding, for example ASCII, UTF-8 or ISO-8859-1, and the output will have the same encoding as the source.
The document consists of two types of elements, block-level elements and span-level elements:
Block-level elements define the main structure of the content, for example, what part of the text should be a paragraph, a list, a blockquote and so on.
Span-level elements mark up small text parts as, for example, emphasized text or a link.
Thus span-level elements can only occur inside block-level elements or other span-level elements.
You will often find references to the “first column” or “first character” of a line in the descriptions of block-level elements. Such a reference is always to be taken relative to the current indentation level because some block-level elements open up a new indentation level (e.g. blockquotes). The beginning of a kramdown document opens up the default indentation level which begins at the first column of the text.
Line Wrapping
Some lightweight markup syntaxes don’t work well in environments where lines are hard-wrapped. For example, this is the case with many email programs. Therefore kramdown allows content like paragraphs or blockquotes to be hard-wrapped, i.e. broken across lines. This is sometimes referred to as “lazy syntax” since the indentation or line prefix required for the first line of content is not required for the consecutive lines.
Block-level elements that support line wrapping always end when one of the following conditions is met:
- a blank line, an EOB marker line, a block IAL or the end of the document (i.e. a block boundary),
- or an HTML block.
Line wrapping is allowed throughout a kramdown document but there are some block-level elements that do not support being hard-wrapped:
- Headers: This is not an issue in most situations since headers normally fit on one line. If a header text gets too long for one line, you need to use HTML syntax instead.
- Fenced code blocks: The delimiting lines of a fenced code block do not support hard-wrapping. Since everything between the delimiting lines is taken as is, the content of a fenced code block does not support hard-wrapping either.
- Definition lists: Each definition term has to appear on a separate line. Hard-wrapping would therefore introduce additional definition terms. The definitions themselves, however, do support hard-wrapping.
- Tables: Since each line of a kramdown table describes one table row or a separator, it is not possible to hard-wrap tables.
Note that it is NOT recommended to use lazy syntax to write a kramdown document. The flexibility that the kramdown syntax offers with respect to line wrapping hinders readability and should therefore not be used.
Usage of Tabs
kramdown assumes that tab stops are set at multiples of four. This is especially important when using tabs for indentation in lists. Also, tabs may only be used at the beginning of a line when indenting text and must not be preceded by spaces. Otherwise the results may be unexpected.
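To make the four-column tab stops concrete, here is a minimal Python sketch (the function name indentation_width is hypothetical, and this is not kramdown's own Ruby code) that measures a line's indentation the way the rule above describes:

```python
def indentation_width(line: str) -> int:
    """Return the indentation of `line` in columns, assuming tab stops at multiples of four."""
    # str.expandtabs(4) advances each tab to the next multiple of four.
    expanded = line.expandtabs(4)
    return len(expanded) - len(expanded.lstrip(" "))


print(indentation_width("\tcode"))    # 4
print(indentation_width("  \tcode"))  # 4
```

The second call also yields 4: two spaces followed by a tab still end at column four, which is exactly the kind of result that is easy to misjudge when spaces and tabs are mixed.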
Automatic and Manual Escaping
Depending on the output format, there are often characters that need special treatment. For example, when converting a kramdown document to HTML one needs to take care of the characters <, > and &. To ease working with these special characters, they are automatically and correctly escaped depending on the output format.
This means, for example, that you can just use <, > and & in a kramdown document and need not think about when to use their HTML entity counterparts. However, if you do use HTML entities or HTML tags which use one of the characters, the result will be correct nonetheless!
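The following Python sketch approximates this behaviour for plain text only. escape_html_text is a hypothetical name, the entity pattern is a simplification, and real HTML tags in the source are handled by kramdown's HTML parsing rather than by anything like this function:

```python
import re

# An '&' that does not already begin an entity such as &amp;, &copy; or &#169;.
_BARE_AMPERSAND = re.compile(r"&(?!#?[A-Za-z0-9]+;)")


def escape_html_text(text: str) -> str:
    """Escape <, > and & in plain text while leaving existing entities intact."""
    # Ampersands are handled first so the '&' in the new &lt;/&gt; is not re-escaped.
    text = _BARE_AMPERSAND.sub("&amp;", text)
    return text.replace("<", "&lt;").replace(">", "&gt;")


print(escape_html_text("a < b & b > c"))     # a &lt; b &amp; b &gt; c
print(escape_html_text("&copy; 2024 &amp;"))  # &copy; 2024 &amp;
```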
Since kramdown also uses some characters to mark up the text, there needs to be a way to escape these special characters so that they can have their normal meaning. This can be done by using backslash escapes. For example, you can use a literal back tick like this:
Following is a list of all the characters (character sequences) that can be escaped:
Block Boundaries
Some block-level elements have to start and/or end on so-called block boundaries, as stated in their documentation. There are two cases where block boundaries come into play:
- If a block-level element has to start on a block boundary, it has to be preceded by either a blank line, an EOB marker, a block IAL or it has to be the first element.
- If a block-level element has to end on a block boundary, it has to be followed by either a blank line, an EOB marker, a block IAL or it has to be the last element.
All structural elements are block-level elements and they are used to structure the content. They can mark up some text as, for example, a simple paragraph, a quote or as a list item.
Blank lines
Any line that just contains white space characters such as spaces and tabs is considered a blank line by kramdown. One or more consecutive blank lines are handled as one empty blank line. Blank lines are used to separate block-level elements from each other and in this case they don’t have semantic meaning. However, there are some cases where blank lines do have a semantic meaning:
- When used in headers – see the headers section
- When used in code blocks – see the code blocks section
- When used in lists – see the lists section
- When used in math blocks – see the math blocks section
- When used for elements that have to start/end on block boundaries
Paragraphs
Paragraphs are the most used block-level elements. One or more consecutive lines of text are interpreted as one paragraph. The first line of a paragraph may be indented up to three spaces, the other lines can have any amount of indentation because paragraphs support line wrapping. In addition to the rules outlined in the section about line wrapping, a paragraph ends when a definition list line is encountered.
You can separate two consecutive paragraphs from each other by using one or more blank lines. Notice that a line break in the source does not mean a line break in the output (due to the lazy syntax)! If you want to have an explicit line break (i.e. a <br /> tag) you need to end a line with two or more spaces or two backslashes! Note, however, that a line break on the last text line of a paragraph is not possible and will be ignored. Leading and trailing spaces will be stripped from the paragraph text.
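A minimal Python sketch of these line-break rules, assuming the lines of one paragraph have already been collected. render_paragraph is a hypothetical helper, and its exact output format (source newlines kept, break markers removed) is an assumption rather than kramdown's converter output:

```python
def render_paragraph(lines: list[str]) -> str:
    """Join the source lines of one paragraph, turning explicit line breaks into <br />."""
    out = []
    for i, line in enumerate(lines):
        stripped = line.rstrip(" ")
        trailing_spaces = len(line) - len(stripped)
        # Two or more trailing spaces, or two trailing backslashes, request a break.
        hard_break = trailing_spaces >= 2 or stripped.endswith("\\\\")
        if stripped.endswith("\\\\"):
            stripped = stripped[:-2]  # drop the two-backslash break marker
        text = stripped.strip()  # leading and trailing spaces are stripped
        is_last = i == len(lines) - 1
        # A break requested on the last line of a paragraph is ignored.
        out.append(text + ("<br />" if hard_break and not is_last else ""))
    return "<p>" + "\n".join(out) + "</p>"


print(render_paragraph(["First line  ", "second line\\\\", "third line  "]))
```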
The following example shows what paragraphs look like:
Headers
kramdown supports so-called Setext style and atx style headers. Both forms can be used inside a single document.
Setext Style
Setext style headers have to start on a block boundary with a line of text (the header text) and a line with only equal signs (for a first level header) or dashes (for a second level header). The header text may be indented up to three spaces but any leading or trailing spaces are stripped from the header text. The amount of equal signs or dashes is not significant, just one is enough but more may look better. The equal signs or dashes have to begin at the first column. For example:
Since Setext headers start on block boundaries, this means in most situations that they have to be preceded by a blank line. However, blank lines are not necessary after a Setext header:
However, it is generally a good idea to also use a blank line after a Setext header because it looks more appropriate and eases reading of the document.
The original Markdown syntax allows one to omit the blank line before a Setext header. However, this leads to ambiguities and makes reading the document harder than necessary. Therefore it is not allowed in a kramdown document.
An edge case worth mentioning is the following:
One might ask if this represents two paragraphs separated by a horizontal rule or a second level header and a paragraph. As suggested by the wording in the example, the latter is the case. The general rule is that Setext headers are processed before horizontal rules.
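The Setext rules, including their precedence over horizontal rules, can be sketched in Python as follows. setext_level is a hypothetical helper that only looks at two lines and does not check the block-boundary requirement:

```python
import re
from typing import Optional


def setext_level(text_line: str, underline: str) -> Optional[int]:
    """Return 1 or 2 if the two lines form a Setext header, else None."""
    indent = len(text_line) - len(text_line.lstrip(" "))
    if not text_line.strip() or indent > 3:
        return None  # no header text, or indented like a code block
    # The underline must start in the first column; its length does not matter.
    if re.fullmatch(r"=+[ \t]*", underline):
        return 1
    if re.fullmatch(r"-+[ \t]*", underline):
        return 2  # checked before any horizontal-rule test would be
    return None
```

For example, setext_level("Some text", "---") returns 2, mirroring the edge case above: the dashes become a header underline rather than a horizontal rule.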
atx Style
atx style headers have to start on a block boundary with a line that contains one or more hash characters and then the header text. No spaces are allowed before the hash characters. The number of hash characters specifies the heading level: one hash character gives you a first level heading, two a second level heading and so on until the maximum of six hash characters for a sixth level heading. You may optionally use any number of hashes at the end of the line to close the header. Any leading or trailing spaces are stripped from the header text. For example:
Again, the original Markdown syntax allows one to omit the blank line before an atx style header.
Specifying a Header ID
kramdown supports a nice way for explicitly setting the header ID which is taken from PHP Markdown Extra and Maruku: If you follow the header text with an opening curly bracket (separated from the text with at least one space), a hash, the ID and a closing curly bracket, the ID is set on the header. If you use the trailing hash feature of atx style headers, the header ID has to go after the trailing hashes. For example:
This additional syntax is not part of standard Markdown.
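Here is a Python sketch of the atx rules together with the trailing {#id} syntax. parse_atx is hypothetical; the accepted ID characters and the requirement of whitespace before closing hashes are assumptions of this sketch, not documented kramdown behaviour:

```python
import re
from typing import Optional, Tuple

ATX_RE = re.compile(r"(#{1,6})(.*)")
# Assumption: an ID is a letter followed by word characters or hyphens.
ID_RE = re.compile(r"[ \t]+\{#([A-Za-z][\w-]*)\}[ \t]*$")


def parse_atx(line: str) -> Optional[Tuple[int, str, Optional[str]]]:
    """Return (level, text, header_id) for an atx header line, else None."""
    m = ATX_RE.match(line)  # match() anchors at column 0: no spaces before the hashes
    if not m:
        return None
    level, text = len(m.group(1)), m.group(2)
    header_id = None
    id_match = ID_RE.search(text)
    if id_match:  # the {#id} part comes last, after any closing hashes
        header_id = id_match.group(1)
        text = text[: id_match.start()]
    text = re.sub(r"[ \t]+#+[ \t]*$", "", text)  # optional closing hashes
    return level, text.strip(), header_id
```

For instance, parse_atx("## Second ## {#sec}") yields (2, "Second", "sec").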
Blockquotes
A blockquote is started using the > marker followed by an optional space and the content of the blockquote. The marker itself may be indented up to three spaces. All following lines, whether they are started with the blockquote marker or just contain text, belong to the blockquote because blockquotes support line wrapping.
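Stripping one level of blockquote markers could look like this minimal Python sketch (strip_quote_markers is a hypothetical helper). Only the marker and a single optional space are removed, and lines without a marker are kept because they belong to the quote through line wrapping:

```python
import re

# Up to three spaces of indentation, the '>' marker and one optional space.
QUOTE_MARKER = re.compile(r" {0,3}> ?")


def strip_quote_markers(lines: list[str]) -> list[str]:
    """Remove one level of blockquote markers from each line."""
    result = []
    for line in lines:
        m = QUOTE_MARKER.match(line)  # only a marker at the line start counts
        # Lines without a marker belong to the quote through line wrapping.
        result.append(line[m.end():] if m else line)
    return result


print(strip_quote_markers(["> A quote", "continued lazily", ">", ">     code"]))
```

In the last input line, removing > and one space leaves four spaces, so the remaining text is indented enough to form a code block inside the quote.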
The contents of a blockquote are block-level elements. This means that text used as content will be wrapped in a paragraph. For example, the following gives you one blockquote with two paragraphs in it:
Since the contents of a blockquote are block-level elements, you can nest blockquotes and use other block-level elements (this is also the reason why blockquotes need to support line wrapping):
Note that the first space character after the > marker does not count when counting spaces for the indentation of the block-level elements inside the blockquote! So code blocks will have to be indented with five spaces or one space and one tab, like this:
Line wrapping allows one to be lazy but hinders readability and should therefore be avoided, especially with blockquotes. Here is an example of using blockquotes with line wrapping:
Code Blocks
Code blocks can be used to represent verbatim text like markup, HTML or a program fragment because no syntax is parsed within a code block.
Standard Code Blocks
A code block can be started by using four spaces or one tab and then the text of the code block. All following lines containing text, whether they adhere to this syntax or not, belong to the code block because code blocks support line wrapping. A wrapped code line is automatically appended to the preceding code line by substituting the line break with a space character. The indentation (four spaces or one tab) is stripped from each line of the code block.
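A minimal Python sketch of the indentation stripping and line joining, assuming the caller has already determined which lines belong to the block. code_block_text is a hypothetical helper, and its treatment of blank lines is an assumption:

```python
def code_block_text(lines: list[str]) -> str:
    """Strip the code-block indentation and join lazily wrapped lines."""
    code_lines: list[str] = []
    for line in lines:
        if line.startswith("    "):
            code_lines.append(line[4:])  # four spaces of indentation
        elif line.startswith("\t"):
            code_lines.append(line[1:])  # or one tab
        elif not line.strip():
            code_lines.append("")  # blank lines are kept inside the block
        elif code_lines:
            # A wrapped line is appended to the previous line with a space.
            code_lines[-1] += " " + line
        else:
            raise ValueError("a code block must start with an indented line")
    return "\n".join(code_lines)


print(code_block_text(["    def f():", "\t    return 1", "    total = f(", "1)"]))
```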
The original Markdown syntax does not allow line wrapping in code blocks.
Note that consecutive code blocks that are only separated by blank lines are merged together into one code block:
If you want to have one code block directly after another one, you need to use an EOB marker to separate the two:
Fenced Code Blocks
This alternative syntax is not part of the original Markdown syntax. The idea and syntax come from the PHP Markdown Extra package.
kramdown also supports an alternative syntax for code blocks which does not use indented blocks but delimiting lines. The starting line needs to begin with three or more tilde characters (~) and the closing line needs to have at least the number of tildes the starting line has. Everything between is taken literally as with the other syntax but there is no need for indenting the text. For example:
If you need lines of tildes in such a code block, just start the code block with more tildes. For example:
This type of code block is especially useful for copy-pasted code since you don’t need to indent the code.
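The tilde-matching rule can be sketched in Python as follows. fenced_content is a hypothetical helper; it only handles bare tilde lines (no language name after the tildes) and, as an assumption of this sketch, treats an unclosed fence as no block at all:

```python
import re
from typing import Optional

OPENING_FENCE = re.compile(r"(~{3,})[ \t]*$")


def fenced_content(lines: list[str]) -> Optional[str]:
    """Return the literal content of the first tilde-fenced code block, or None."""
    for start, line in enumerate(lines):
        opening = OPENING_FENCE.match(line)
        if opening:
            break
    else:
        return None
    # The closing line needs at least as many tildes as the opening line.
    closing = re.compile(r"~{%d,}[ \t]*$" % len(opening.group(1)))
    for end in range(start + 1, len(lines)):
        if closing.match(lines[end]):
            return "\n".join(lines[start + 1:end])  # taken literally, no unindenting
    return None


print(fenced_content(["~~~~~~", "~~~", "literal *text*", "~~~", "~~~~~~"]))
```

Because the block opens with six tildes, the shorter ~~~ lines inside it are kept as content.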
Language of Code Blocks
You can tell kramdown the language of a code block by using an IAL:
The specially named class language-ruby tells kramdown that this code block is written in the Ruby language. Such information can be used, for example, by converters to do syntax highlighting on the code block.
Fenced code blocks provide an easier way to specify the language, namely by appending the language of the code block to the end of the starting line:
Lists
kramdown provides syntax elements for creating ordered and unordered lists as well as definition lists.
Ordered and Unordered lists
Both ordered and unordered lists follow the same rules.
A list is started with a list marker (in case of unordered lists one of +, - or *, which you can mix, and in case of ordered lists a number followed by a period) followed by one tab or at least one space, optionally followed by an IAL that should be applied to the list item and then the first part of the content of the list item. The leading tabs or spaces are stripped away from this first line of content to allow for a nice alignment with the following content of a list item (see below). All following list items with the same marker type (unordered or ordered) are put into the same list. The numbers used for ordered lists are irrelevant; an ordered list always starts at 1.
The following gives you an unordered list and an ordered list:
The original Markdown syntax allows the markers of ordered and unordered lists to be mixed, the first marker specifying the list type (ordered or unordered). This is not allowed in kramdown. As stated, the above example will give you two lists (an unordered and an ordered) in kramdown and only one unordered list in Markdown.
The first list marker in a list may be indented up to three spaces. The column number of the first non-space character which appears after the list item marker on the same line specifies the indentation that has to be used for the following lines of content of the list item. If there is no such character, the indentation that needs to be used is four spaces or one tab. Indented lines may be followed by lines containing text with any amount of indentation due to line wrapping. Note, however, that in addition to the rules outlined in the section about line wrapping, a list item also ends when a line with another list item marker is encountered (see the next paragraph).
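A Python sketch of how the content indentation of a list item follows from its first line, with tabs expanded to four-column stops first. item_content_indent is hypothetical and ignores IALs as well as other block types that could start with a marker character, such as horizontal rules:

```python
import re
from typing import Optional, Tuple

# Up to three spaces, a marker (+, -, * or a number and a period), then spaces.
ITEM_RE = re.compile(r"( {0,3})([+*-]|\d+\.)( +)(.*)")


def item_content_indent(line: str) -> Optional[Tuple[str, int]]:
    """Return (marker, indentation required for the item's following lines)."""
    expanded = line.expandtabs(4)  # tab stops at multiples of four
    m = ITEM_RE.fullmatch(expanded)
    if not m:
        return None  # e.g. no space or tab after the marker
    indent, marker, gap, content = m.groups()
    if content:
        # Column of the first non-space character after the marker.
        return marker, len(indent) + len(marker) + len(gap)
    return marker, 4  # nothing after the marker: four spaces (or one tab)
```

For example, "  10. text" requires its continuation lines to be indented by six spaces, while "* text" requires two.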
The indentation is stripped from the content and the content (note that the content naturally also contains the content of the line with the item marker) is processed as text containing block-level elements. All other list markers in the list may be indented up to three spaces or the number of spaces used for the indentation of the last list item minus one, whichever number is smaller. For example:
So, while the above is possible and creates one list with three items, it is not advised to use different (marker and list content) indents for same level list items as well as lazy indentation! It is much better to write such a list in the following way:
The original Markdown syntax also allows you to indent the marker, however, the behaviour of what happens with the list items is not clearly specified and may surprise you.
Also, Markdown uses a fixed number of spaces/tabs to indent the lines that belong to a list item!
Unordered and ordered lists work the same way in regard to the indentation:
When using tabs for indenting the content of a list item, remember that tab stops occur at multiples of four for kramdown. Tabs are correctly converted to spaces for calculating the indentation. For example:
It is clear that you might get unexpected results if you mix tabs and spaces or if you don’t have the tab stops set to multiples of four in your editor! Therefore this should be avoided!
The content of a list item is made up of either text or block-level elements. Simple list items only contain text like in the above examples. They are not even wrapped in a paragraph tag. If the first list text is followed by one or more blank lines, it will be wrapped in a paragraph tag:
In the above example, the first list item text will be wrapped in a paragraph tag since it is followed by a blank line whereas the second list item contains just text. There is obviously a problem for doing this with the last list item when it contains only text. You can circumvent this by leaving a blank line after the last list item and using an EOB marker:
The text of the last list item is also wrapped in a paragraph tag if all other list items contain a proper paragraph as first element. This makes the following use case work as expected, i.e. all the list items are wrapped in paragraphs:
The original Markdown syntax page specifies that list items which are separated by one or more blank lines are wrapped in paragraph tags. This means that the first text will also be wrapped in a paragraph if you have block-level elements in a list which are separated by blank lines. The above rule is easy to remember and lets you exactly specify when the first list text should be wrapped in a paragraph. The idea for the above rule comes from the Pandoc package.
As seen in the examples above, blank lines between list items are allowed.
Since the content of a list item can contain block-level elements, you can do the following:
However, there is a problem when you want to have a code block immediately after a list item. You can use an EOB marker to circumvent this problem:
You can have any block-level element as first element in a list item. However, as described above, the leading tabs or spaces of the line with the list item marker are stripped away. This leads to a problem when you want to have a code block as first element. The solution to this problem is the following construct:
Note that the list marker needs to be followed by at least one space or tab! Otherwise the line is not recognized as the start of a list item but interpreted as a paragraph containing the list marker.
If you want to have one list directly after another one (both with the same list type, i.e. ordered or unordered), you need to use an EOB marker to separate the two:
Since paragraphs support line wrapping, it would usually not be possible to create a compact nested list, i.e. a list where the text is not wrapped in paragraphs because there is no blank line but a sub list after it:
However, this is an often used syntax and is therefore supported by kramdown.
If you want to start a paragraph with something that looks like a list item marker, you need to escape it. This is done by escaping the period in an ordered list or the list item marker in an unordered list:
As mentioned at the beginning, an optional IAL for applying attributes to a list item can be used after the list item marker:
Definition Lists
This syntax feature is not part of the original Markdown syntax. The idea and syntax come from the PHP Markdown Extra package.
Definition lists allow you to assign one or more definitions to one or more terms.
A definition list is started when a normal paragraph is followed by a line with a definition marker (a colon which may be optionally indented up to three spaces), then at least one tab or one space, optionally followed by an IAL that should be applied to the list item and then the first part of the definition. The line with the definition marker may optionally be separated from the preceding paragraph by a blank line. The leading tabs or spaces are stripped away from this first line of the definition to allow for a nice alignment with the following definition content. Each line of the preceding paragraph is taken to be a term and the lines are separately parsed as span-level elements. Each such term may optionally start with an IAL that should be applied to the term.
The following is a simple definition list:
The column number of the first non-space character which appears after a definition marker on the same line specifies the indentation that has to be used for the following lines of the definition. If there is no such character, the indentation that needs to be used is four spaces or one tab. Indented lines may be followed by lines containing text with any amount of indentation due to line wrapping. Note, however, that in addition to the rules outlined in the section about line wrapping, a definition also ends when a line with another definition marker is encountered.
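For a single group of terms followed by text-only definitions, the structure can be sketched in Python as follows. parse_definition_list is a hypothetical helper; it ignores IALs, block-level content inside definitions and multiple term groups:

```python
import re

# A definition marker: a colon indented up to three spaces, then a space or tab.
DEFINITION_RE = re.compile(r" {0,3}:[ \t]+(.*)")


def parse_definition_list(lines: list[str]) -> tuple[list[str], list[str]]:
    """Split one group of terms and its definitions."""
    terms: list[str] = []
    definitions: list[str] = []
    for line in lines:
        if not line.strip():
            continue  # a blank line before the first definition is optional
        m = DEFINITION_RE.match(line)
        if m:
            definitions.append(m.group(1).strip())  # a new definition starts
        elif definitions:
            definitions[-1] += " " + line.strip()  # wrapped definition text
        else:
            terms.append(line.strip())  # each paragraph line is one term
    return terms, definitions


print(parse_definition_list(["kramdown", "Markdown", "", ": A markup language",
                             "  that is easy to read", ": Another definition"]))
```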
The indentation is stripped from the definition and it (note that the definition naturally also contains the content of the line with the definition marker) is processed as text containing block-level elements. If there is more than one definition, all other definition markers for the term may be indented up to three spaces or the number of spaces used for the indentation of the last definition minus one, whichever number is smaller. For example:
So, while the above is possible and creates a definition list with two terms and three definitions for them, it is not advised to use different (definition marker and definition) indents in the same definition list as well as lazy indentation!
The definition for a term is made up of text and/or block-level elements. If a definition is not preceded by a blank line, the first part of the definition will just be text if it would be a paragraph otherwise:
The rules about having any block-level element as first element in a list item also apply to a definition.
As mentioned at the beginning, an optional IAL for applying attributes to a term or a definition can be used:
Tables
This syntax feature is not part of the original Markdown syntax. The syntax is based on the one from the PHP Markdown Extra package.
Sometimes one wants to include simple tabular data in a kramdown document for which using a full-blown HTML table is just too much. kramdown supports this with a simple syntax for ASCII tables.
Tables can be created with or without a leading pipe character: If the first line of a table contains a pipe character at the start of the line (optionally indented up to three spaces), then all leading pipe characters (i.e. pipe characters that are only preceded by whitespace) are ignored on all table lines. Otherwise they are not ignored and count when dividing a table line into table cells.
There are four different line types that can be used in a table:
Table rows define the content of a table.
A table row is any line that contains at least one pipe character and is not identified as any other type of table line! The table row is divided into individual table cells by pipe characters. An optional trailing pipe character is ignored. Note that literal pipe characters need to be escaped except if they occur in code spans or HTML <code> elements! Header rows, footer rows and normal rows are all done using these table rows. Table cells can only contain a single line of text, no multi-line text is supported. The text of a table cell is parsed as span-level elements.
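Cell splitting can be sketched in Python as follows. split_row is a hypothetical helper; ignore_leading_pipe stands for the table-wide decision made from the first table line, and in this sketch only single-backtick code spans (not HTML <code> elements) protect pipes:

```python
def split_row(line: str, ignore_leading_pipe: bool) -> list[str]:
    """Split one table row into stripped cell texts."""
    text = line.strip()
    if ignore_leading_pipe and text.startswith("|"):
        text = text[1:]
    cells: list[str] = []
    current = ""
    in_code = False
    ended_with_separator = False
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == "\\" and text[i + 1:i + 2] == "|" and not in_code:
            current += "|"  # an escaped pipe is literal cell text
            ended_with_separator = False
            i += 2
            continue
        if ch == "`":
            in_code = not in_code  # pipes inside `code spans` do not split cells
        if ch == "|" and not in_code:
            cells.append(current.strip())
            current = ""
            ended_with_separator = True
        else:
            current += ch
            ended_with_separator = False
        i += 1
    if not ended_with_separator:
        cells.append(current.strip())  # text after the last pipe is the final cell
    return cells


print(split_row(r"| x \| y | `p|q` |", True))
```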
Here are some example table rows:
Separator lines are used to split the table body into multiple body parts.
A separator line is any line that contains only pipes, dashes, pluses, colons and spaces/tabs and which contains at least one dash and one pipe character. The pipe and plus characters can be used to visually separate columns although this is not needed. Multiple consecutive separator lines are treated as one separator line.
Here are some example separator lines:
The first separator line after at least one table row is treated specially, namely as header separator line. It is used to demarcate header rows from normal table rows and/or to set column alignments. All table rows above the header separator line are considered to be header rows.
The header separator line can be specially formatted to contain column alignment definitions: An alignment definition consists of an optional space/tab followed by an optional colon, one or more dashes, an optional colon and another optional space/tab. The colons of an alignment definition are used to set the alignment of a column: if there are no colons, the column uses the default alignment, if there is a colon only before the dashes, the column is left aligned, if there are colons before and after the dashes, the column is center aligned and if there is only a colon after the dashes, the column is right aligned. Each alignment definition sets the alignment for one column, the first alignment definition for the first column, the second alignment definition for the second column and so on.
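A Python sketch of reading column alignments from a header separator line; column_alignments is hypothetical, and "default" simply labels a column without colons:

```python
import re


def column_alignments(separator_line: str) -> list[str]:
    """Map each alignment definition in a header separator line to an alignment."""
    aligns = []
    # Pipes and pluses both separate columns.
    for part in re.split(r"[|+]", separator_line):
        spec = part.strip()
        if not re.fullmatch(r":?-+:?", spec):
            continue  # e.g. the empty segments around leading/trailing pipes
        left, right = spec.startswith(":"), spec.endswith(":")
        if left and right:
            aligns.append("center")
        elif left:
            aligns.append("left")
        elif right:
            aligns.append("right")
        else:
            aligns.append("default")
    return aligns


print(column_alignments("|---|:--|:-:|--:|"))  # ['default', 'left', 'center', 'right']
```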
Here are some example header separator lines with alignment definitions:
A footer separator line is used to demarcate footer rows from normal table rows. All table rows below the footer separator line are considered to be footer rows.
A footer separator line is like a normal separator line except that dashes are replaced by equal signs. A footer separator line may only appear once in a table. If multiple footer separator lines are used in one table, only the last is treated as footer separator line, all others are treated as normal separator lines. Normal separator lines that are used after the footer separator line are ignored.
Here are some example footer separator lines:
Trailing spaces or tabs are ignored in all cases. To simplify table creation and maintenance, header, footer and normal separator lines need not specify the same number of columns as table rows; even |- and |= are valid separators.
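Telling the two kinds of separator lines apart can be sketched in Python as follows (separator_kind is a hypothetical helper that does not handle escaped pipes):

```python
import re
from typing import Optional


def separator_kind(line: str) -> Optional[str]:
    """Classify a table line as 'separator', 'footer' or None."""
    s = line.strip()  # trailing spaces or tabs are ignored
    if "|" not in s:
        return None  # every table line needs a pipe character
    if "-" in s and re.fullmatch(r"[|+:\- \t]+", s):
        return "separator"
    if "=" in s and re.fullmatch(r"[|+:= \t]+", s):
        return "footer"  # dashes replaced by equal signs
    return None


print(separator_kind("|-"), separator_kind("|="))  # separator footer
```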
Given the above components, a table is specified by
- an optional separator line,
- optionally followed by zero, one or more table rows followed by a header separator line,
- one or more table rows, optionally interspersed with separator lines,
- optionally followed by a footer separator line and zero, one or more table rows and
- an optional trailing separator line.
Also note
- that the first line of a table must not have more than three spaces of indentation before the first non-space character,
- that each line of a table needs to have at least one unescaped pipe character so that kramdown recognizes it as a line belonging to the table and
- that tables have to start and end on block boundaries!
The table syntax differs from the one used in PHP Markdown Extra as follows:
- kramdown tables do not need to have a table header.
- kramdown tables can be structured using separator lines.
- kramdown tables can contain a table footer.
- kramdown tables need to be separated from other block-level elements.
Here is an example for a kramdown table with a table header row, two table bodies and a table footer row:
The above example table is rather time-consuming to create without the help of an ASCII table editor. However, the table syntax is flexible and the above table could also be written like this:
Horizontal Rules
A horizontal rule for visually separating content is created by using three or more asterisks, dashes or underscores (these may not be mixed on a line), optionally separated by spaces or tabs, on an otherwise blank line. The first asterisk, dash or underscore may optionally be indented up to three spaces. The following examples show different possibilities to create a horizontal rule:
Math Blocks
This syntax feature is not part of the original Markdown syntax. The idea comes from the Maruku and Pandoc packages.
kramdown has built-in support for block and span-level mathematics written in LaTeX.
A math block needs to start and end on block boundaries. It is started using two dollar signs, optionally indented up to three spaces. The math block continues until the next two dollar signs (which may be on the same line or on one of the next lines) that appear at the end of a line, i.e. they may only be followed by whitespace characters. The content of a math block has to be valid LaTeX math. It is always wrapped inside a \begin{displaymath}...\end{displaymath} environment except if it begins with a \begin statement.
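The wrapping rule can be sketched in Python as follows. wrap_math_block is hypothetical, assumes it receives a complete math block including both $$ delimiters, and shows only which LaTeX ends up inside the display environment, not what any particular converter emits:

```python
def wrap_math_block(source: str) -> str:
    """Return the LaTeX for a $$...$$ math block, wrapped in displaymath if needed."""
    body = source.strip()
    if len(body) < 4 or not (body.startswith("$$") and body.endswith("$$")):
        raise ValueError("not a math block")
    latex = body[2:-2].strip()
    if latex.startswith("\\begin"):
        return latex  # content that opens its own environment is left as is
    return "\\begin{displaymath}" + latex + "\\end{displaymath}"


print(wrap_math_block("$$ a^2 + b^2 = c^2 $$"))
```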
The following kramdown fragment
renders (using the JavaScript library MathJax) as
\[
\begin{aligned}
& \phi(x,y) = \phi\left(\sum_{i=1}^n x_ie_i, \sum_{j=1}^n y_je_j\right) = \sum_{i=1}^n \sum_{j=1}^n x_i y_j \phi(e_i, e_j) = \\
& (x_1, \ldots, x_n) \left(\begin{array}{ccc}
\phi(e_1, e_1) & \cdots & \phi(e_1, e_n) \\
\vdots & \ddots & \vdots \\
\phi(e_n, e_1) & \cdots & \phi(e_n, e_n)
\end{array}\right)
\left(\begin{array}{c}
y_1 \\
\vdots \\
y_n
\end{array}\right)
\end{aligned}
\]
Using inline math is also easy: just surround your math content with two dollar signs, like with a math block. If you don’t want to start an inline math statement, just escape the dollar signs and they will be treated as simple dollar signs.
Note that LaTeX code that uses the pipe symbol | in inline math statements may lead to a line being recognized as a table line. This problem can be avoided by using the \vert command instead of |!
If you have a paragraph that looks like a math block but should actually be a paragraph with just an inline math statement, you need to escape the first dollar sign:
If you don’t even want the inline math statement, escape the first two dollar signs:
HTML Blocks
The original Markdown syntax specifies that an HTML block must start at the left margin, i.e. no indentation is allowed. Also, the HTML block has to be surrounded by blank lines. Both restrictions are lifted for kramdown documents. Additionally, the original syntax does not allow you to use Markdown syntax in HTML blocks which is allowed with kramdown.
An HTML block is potentially started if a line is encountered that begins with a non-span-level HTML tag or a general XML tag (opening or closing) which may be indented up to three spaces.
The following HTML tags count as span-level HTML tags and won’t start an HTML block if found at the beginning of an HTML block line:
Further parsing of a found start tag depends on the tag and in which of three possible ways its content is parsed:
- Parse as raw HTML block: If the HTML/XML tag content should be handled as raw HTML, then only HTML/XML tags are parsed from this point onwards and text is handled as raw, unparsed text until the matching end tag is found or until the end of the document. Each found tag will be parsed as raw HTML again. However, if a tag has a markdown attribute, this attribute controls parsing of this one tag (see below).
  Note that the parser basically supports only correct XHTML! However, there are some exceptions. For example, attributes without values (i.e. boolean attributes) as well as unquoted attribute values are also supported and elements without content like <hr /> can be written as <hr>. If an invalid closing tag is found, it is ignored.
- Parse as block-level elements: If the HTML/XML tag content should be parsed as text containing block-level elements, the remaining text on the line will be parsed by the block-level parser as if it appears on a separate line (Caution: This also means that if the line consists of the start tag, text and the end tag, the end tag will not be found!). All following lines are parsed as block-level elements until an HTML block line with the matching end tag is found or until the end of the document.
- Parse as span-level elements: If the HTML/XML tag content should be parsed as text containing span-level elements, then all text until the next matching end tag or until the end of the document will be the content of the tag and will later be parsed by the span-level parser. This also means that if the matching end tag is inside what appears to be a code span, it is still used!
If there is text after an end tag, it will be parsed as if it appears on a separate line, except when inside a raw HTML block.
Also, if an invalid closing tag is found, it is ignored.
Note that all HTML tag and attribute names are converted to lowercase!
By default, kramdown parses all block HTML tags and all XML tags as raw HTML blocks. However, this can be configured with the parse_block_html option. If this is set to true, then syntax parsing in HTML blocks is globally enabled. It is also possible to enable/disable syntax parsing on a tag-by-tag basis using the markdown attribute:
- If an HTML tag has an attribute markdown='0', then the tag is parsed as a raw HTML block.
- If an HTML tag has an attribute markdown='1', then the default mechanism for parsing syntax in this tag is used.
- If an HTML tag has an attribute markdown='block', then the content of the tag is parsed as block-level elements.
- If an HTML tag has an attribute markdown='span', then the content of the tag is parsed as span-level elements.
The following list shows which HTML tags are parsed in which mode by default when markdown='1' is applied or parse_block_html is true:
Also, all general XML tags are parsed as raw HTML blocks.
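The decision rules above can be summarised in a small Ruby sketch. It is a simplified model, not kramdown's parser. The per-tag defaults in the hypothetical DEFAULT_MODES table are placeholders standing in for the full list of tags and modes, and the method name html_block_mode is made up for illustration.

```ruby
# Hypothetical, partial stand-in for kramdown's real per-tag default table.
DEFAULT_MODES = { 'div' => :block, 'p' => :span, 'pre' => :raw }.freeze

# markdown:         value of the tag's markdown attribute (nil if absent)
# parse_block_html: value of the global parse_block_html option
def html_block_mode(tag, markdown: nil, parse_block_html: false)
  case markdown
  when '0'     then :raw                           # always a raw HTML block
  when 'block' then :block                         # force block-level parsing
  when 'span'  then :span                          # force span-level parsing
  when '1'     then DEFAULT_MODES.fetch(tag, :raw) # use the per-tag default
  else
    # No attribute: raw unless parse_block_html switches on the per-tag default.
    # Tags missing from the table (e.g. general XML tags) fall back to raw.
    parse_block_html ? DEFAULT_MODES.fetch(tag, :raw) : :raw
  end
end

p html_block_mode('div')                # prints :raw
p html_block_mode('div', markdown: '1') # prints :block
```

The attribute is checked first, so a per-tag markdown value always wins over the global option. This matches the precedence described above.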
Remember that all span-level HTML tags like a or b do not start an HTML block! However, the above lists also include span-level HTML tags in case the markdown attribute is used on a tag inside a raw HTML block.
Here is a simple example input and its HTML output with parse_block_html set to false:
As one can see, the content of the div tag is parsed as a raw HTML block and left alone. However, if the markdown='1' attribute were used on the div tag, the content would be parsed as block-level elements and therefore converted to a paragraph.
You can also use several HTML tags at once:
However, remember that if the content of a tag is parsed as block-level elements, the content that appears after a start/end tag but on the same line is processed as if it appears on a new line:
Since setting parse_block_html to true can lead to some unwanted behaviour, it is generally better to selectively enable or disable block/span-level element parsing by using the markdown attribute!
Unclosed block-level HTML tags are correctly closed at the end of the document to ensure correct nesting, and invalidly used end tags are removed from the output:
The parsing of XML comments is also supported. The content of XML comments may span multiple lines. The start of an XML comment may only appear at the beginning of a line, optionally indented up to three spaces. If there is text after the end of an XML comment, it will be parsed as if it appears on a separate line. kramdown syntax in XML comments is not processed:
These elements are all span-level elements and are used inside block-level elements to mark up text fragments. For example, one can easily create links or apply emphasis to certain text parts.
Note that empty span-level elements are not converted to empty HTML tags but are copied as-is to the output.
Links and Images
Three types of links are supported: automatic links, inline links and reference links.
Automatic Links
This is the easiest one to create: just surround a web address or an email address with angle brackets and the address will be turned into a proper link. The address will be used as link target and as link text. For example:
It is not possible to specify a different link text using automatic links; use the other link types for this!
Inline Links
As the wording suggests, inline links provide all information inline in the text flow. Reference style links only provide the link text in the text flow, and everything else is defined elsewhere. This also allows you to reuse link definitions.
An inline style link can be created by surrounding the link text with square brackets, followed immediately by the link URL (and an optional title in single or double quotes preceded by at least one space) in normal parentheses. For example:
Notes:
The link text is treated like normal span-level text and therefore is parsed and converted. However, if you use square brackets within the link text, you have to either properly nest them or escape them. It is not possible to create nested links!
The link text may also be omitted, e.g. for creating link anchors.
The link URL has to contain properly nested parentheses if no title is specified, or the link URL must be contained in angle brackets (incorrectly nested parentheses are allowed).
The link title may not contain its delimiters and may not be empty.
Additional link attributes can be added by using a span IAL after the inline link, for example:
Reference Links
To create a reference style link, you need to surround the link text with square brackets (as with inline links), followed by optional spaces/tabs/line breaks and then optionally followed by another set of square brackets with the link identifier in them. A link identifier may not contain a closing bracket and, when specified in a link definition, newline characters. It is also not case sensitive, line breaks and tabs are converted to spaces, and multiple spaces are compressed into one. For example:
If you don’t specify a link identifier (i.e. only use empty square brackets) or completely omit the second pair of square brackets, the link text is converted to a valid link identifier by removing all invalid characters and inserting spaces for line breaks. If there is a link definition found for the link identifier, a link will be created. Otherwise the text is not converted to a link.
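Here is a minimal Ruby sketch of the identifier matching described above. It assumes that definitions are stored in a hash keyed by the normalized identifier. The helper names normalize_link_id and resolve_reference are illustrative, and the sketch skips the removal of invalid characters from the link text.

```ruby
# Runs of spaces, tabs and line breaks collapse to one space; matching ignores case.
def normalize_link_id(id)
  id.gsub(/[ \t\r\n]+/, ' ').downcase
end

# Returns the URL for a reference link, or nil when no definition exists
# (the text then stays plain text). An empty or missing identifier falls
# back to the link text itself.
def resolve_reference(text, id, definitions)
  key = id.nil? || id.empty? ? text : id
  definitions[normalize_link_id(key)]
end

definitions = { normalize_link_id('Kramdown  Home') => 'http://example.com/' }
p resolve_reference("kramdown\nhome", '', definitions) # prints "http://example.com/"
```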
As with inline links, additional link attributes can be added by using a span IAL after the reference link.
Link Definitions
The link definition can be put anywhere in the document. It does not appear in the output. A link definition looks like this:
Link definitions are, despite being described here, non-content block-level elements.
The link definition has the following structure:
- The link identifier in square brackets, optionally indented up to three spaces,
- then a colon and one or more optional spaces/tabs,
- then the link URL, which must contain at least one non-space character, or a left angle bracket, the link URL and a right angle bracket,
- then optionally the title in single or double quotes, separated from the link URL by one or more spaces or on the next line by itself indented any number of spaces/tabs.
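The structure above maps fairly directly onto a regular expression. In this Ruby sketch, LINK_DEF and parse_link_definition are illustrative names, not kramdown's API. It handles single-line definitions only, so a title on the following line is not supported.

```ruby
# [id]: url "title"  or  [id]: <url> 'title', indented up to three spaces.
LINK_DEF = /\A {0,3}\[(?<id>[^\]\n]+)\]:[ \t]*(?:<(?<angle>[^>]*)>|(?<url>\S+))(?:[ \t]+(?:"(?<dq>[^"]+)"|'(?<sq>[^']+)'))?[ \t]*\z/

# Returns { id:, url:, title: } or nil if the line is not a link definition.
def parse_link_definition(line)
  m = LINK_DEF.match(line.chomp) or return nil
  { id: m[:id], url: m[:angle] || m[:url], title: m[:dq] || m[:sq] }
end

puts parse_link_definition('[home]: <http://example.com/a b>')[:url] # prints http://example.com/a b
```

An escaped colon such as [x]\: does not match, because the colon must follow the closing bracket directly.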
The original Markdown syntax also allowed the title to be specified in parentheses. This is not allowed for consistency with the inline title.
If you have some text that looks like a link definition but should really be a link and some text, you can escape the colon after the link identifier:
Although link definitions are non-content block-level elements, block IALs can be used on them to specify additional attributes for the links:
Images
Images can be specified via a syntax that is similar to the one used by links. The difference is that you have to use an exclamation mark before the first square bracket and that the link text of a normal link becomes the alternative text of the image link. As with normal links, image links can be written inline or reference style. For example:
The link definition for images is exactly the same as the link definition for normal links. Since additional attributes can be added via span and block IALs, it is possible, for example, to specify image width and height:
Emphasis
kramdown supports two types of emphasis: light and strong emphasis. Text parts that are surrounded with single asterisks * or underscores _ are treated as text with light emphasis; text parts surrounded with two asterisks or underscores are treated as text with strong emphasis. Surrounded means that the starting delimiter must not be followed by a space and that the stopping delimiter must not be preceded by a space.
Here is an example for text with light and strong emphasis:
The asterisk form is also allowed within a single word:
Text can be marked up with both light and strong emphasis, possibly using different delimiters. However, it is not possible to nest strong within strong or light within light emphasized text:
If one or two asterisks or underscores are surrounded by spaces, they are treated literally. If you want to force the literal meaning of an asterisk or an underscore, you can backslash-escape it:
Code Spans
This is the span-level equivalent of the code block element. You can mark up a text part as a code span by surrounding it with backticks `. For example:
Note that all special characters in a code span are treated correctly. For example, when a code span is converted to HTML, the characters <, > and & are substituted by their respective HTML counterparts.
To include a literal backtick in a code span, you need to use two or more backticks as delimiters. You can insert one optional space after the starting and before the ending delimiter (these spaces are not used in the output). For example:
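Choosing the delimiters can be automated, because the fence only has to be longer than the longest run of backticks in the content. The hypothetical Ruby helper code_span below does this. It also adds the optional space when the content starts or ends with a backtick, so that the delimiters do not merge with it.

```ruby
# Wraps text in a code span whose delimiter is one backtick longer than the
# longest backtick run inside, so no inner run can close the span early.
def code_span(text)
  longest = text.scan(/`+/).map(&:length).max || 0
  fence = '`' * (longest + 1)
  # The optional space after/before the delimiters is not part of the output,
  # so adding it is safe when the content itself begins or ends with a backtick.
  pad = text.start_with?('`') || text.end_with?('`') ? ' ' : ''
  "#{fence}#{pad}#{text}#{pad}#{fence}"
end

puts code_span('a`b') # prints ``a`b``
```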
A single backtick surrounded by spaces is treated as a literal backtick. If you want to force the literal meaning of a backtick, you can backslash-escape it:
As with code blocks, you can set the language of a code span by using an IAL:
HTML Spans
HTML tags can be used not only on the block level but also on the span level. Span-level HTML tags can only be used inside one block-level element; it is not possible to use a start tag in one block-level element and the end tag in another. Note that only correct XHTML is supported! This means that you have to use, for example, <br /> instead of <br> (although kramdown tries to fix such errors if possible).
By default, kramdown parses kramdown syntax inside span HTML tags. However, this behaviour can be configured with the parse_span_html option. If this is set to true, then syntax parsing in HTML spans is enabled; if it is set to false, parsing is disabled. It is also possible to enable/disable syntax parsing on a tag-by-tag basis using the markdown attribute:
- If an HTML tag has an attribute markdown='0', then no parsing (except parsing of HTML span tags) is done inside that HTML tag.
- If an HTML tag has an attribute markdown='1', then the content of the tag is parsed as span-level elements.
- If an HTML tag has an attribute markdown='block', then a warning is issued because HTML spans cannot contain block-level elements, and the attribute is ignored.
- If an HTML tag has an attribute markdown='span', then the content of the tag is parsed as span-level elements.
The content of a span-level HTML tag is normally parsed as span-level elements. Note, however, that some tags like <script> are not parsed, i.e. their content is not modified.
XML comments can also be used (their content is not parsed). However, as with HTML tags, the start and the end have to appear in the same block-level element.
Span-level XML comments as well as general span-level HTML and XML tags have to be preceded by at least one non-whitespace character on the same line so that kramdown correctly recognizes them as span-level elements and not as block-level elements. However, all span HTML tags, i.e. a, em, b, …, (opening or closing) can appear at the start of a line.
Unclosed span-level HTML tags are correctly closed at the end of the span-level text to ensure correct nesting, and invalidly used end tags or block HTML tags are removed from the output:
Also note that one or more consecutive newline characters in an HTML span tag are replaced by a single space, for example:
Footnotes
This syntax feature is not part of the original Markdown syntax. The idea and syntax come from the PHP Markdown Extra package.
Footnotes in kramdown are similar to reference style links and link definitions. You need to place the footnote marker in the correct position in the text, and the actual footnote content can be defined anywhere in the document.
More exactly, a footnote marker can be created by placing the footnote name in square brackets. The footnote name has to start with a caret (^), followed by a word character or a digit and then optionally followed by other word characters, digits or dashes. For example:
Note that footnote markers cannot be used as part of the link text of a link because this would lead to nested links, which are not allowed in HTML.
Footnote markers with the same name will link to the same footnote definition. The actual naming of a footnote does not matter, since the numbering of footnotes is controlled by the position of the footnote markers in the document (the first footnote marker found gets the number 1, the second new footnote marker the number 2, and so on). If there is a footnote definition found for the identifier, a footnote will be created. Otherwise the footnote marker is not converted to a footnote link. Also note that all attributes set via a span IAL are ignored for a footnote marker!
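The numbering rule can be modelled in a few lines of Ruby. This is a sketch, not kramdown's converter. The <sup> output is a placeholder, the method name number_footnotes is hypothetical, and it assumes that markers without a definition do not consume a number.

```ruby
# Numbers footnote markers by first appearance; repeated names reuse their
# number, and markers without a definition are left untouched.
def number_footnotes(text, defined_names)
  numbers = {}
  text.gsub(/\[\^(\w[\w-]*)\]/) do
    name = Regexp.last_match(1)
    if defined_names.include?(name)
      numbers[name] ||= numbers.size + 1
      "<sup>#{numbers[name]}</sup>"
    else
      Regexp.last_match(0) # no definition: keep the marker as-is
    end
  end
end

puts number_footnotes('a[^x] b[^y] c[^x]', %w[x y]) # prints a<sup>1</sup> b<sup>2</sup> c<sup>1</sup>
```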
A footnote definition is used to define the content of a footnote and has the following structure:
- The footnote name in square brackets, optionally indented up to three spaces,
- then a colon and one or more optional spaces,
- then the text of the footnote
- and optionally more text on the following lines, which have to follow the syntax for standard code blocks (the leading four spaces/one tab are naturally stripped from the text)
Footnote definitions are, despite being described here, non-content block-level elements.
The whole footnote content is treated like block-level text and can therefore contain any valid block-level element (also, any block-level element can be the first element). If you want to have a code block as the first element, note that all leading spaces/tabs on the first line are stripped away. Here are some example footnote definitions:
It does not matter where you put a footnote definition in a kramdown document; the content of all referenced footnote definitions will be placed at the end of the kramdown document. Unreferenced footnote definitions are ignored. If more than one footnote definition has the same footnote name, all footnote definitions but the last are ignored.
Although footnote definitions are non-content block-level elements, block IALs can be used on them to attach attributes. How these attributes are used depends on the converter.
Abbreviations
This syntax feature is not part of the original Markdown syntax. The idea and syntax come from the PHP Markdown Extra package.
kramdown provides a syntax to assign the full phrase to an abbreviation. When writing the text, you don’t need to do anything special. However, once you add abbreviation definitions, the abbreviations in the text get marked up automatically. Abbreviations can consist of any character except a closing bracket.
An abbreviation definition is used to define the full phrase for an abbreviation and has thefollowing structure:
- An asterisk and the abbreviation in square brackets, optionally indented up to three spaces,
- then a colon and the full phrase of the abbreviation on one line (leading and trailing spaces are stripped from the full phrase).
Later abbreviation definitions for the same abbreviation override prior ones, and it does not matter where you put an abbreviation definition in a kramdown document. Empty definitions are also allowed.
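Here is a rough Ruby sketch of these rules. It assumes abbreviations are matched as whole words and marked up with an <abbr title="..."> element. apply_abbreviations is a made-up name, and the title is not HTML-escaped here.

```ruby
# "*[ABBR]: full phrase", optionally indented up to three spaces.
ABBR_DEF = /\A {0,3}\*\[(?<abbr>[^\]]+)\]:(?<phrase>.*)\z/

def apply_abbreviations(text)
  definitions = {}
  # Definitions may appear anywhere; collect them and drop those lines.
  kept = text.each_line.reject do |line|
    m = ABBR_DEF.match(line.chomp)
    # Later definitions override earlier ones; empty phrases are allowed.
    definitions[m[:abbr]] = m[:phrase].strip if m
    m
  end
  body = kept.join
  return body if definitions.empty?

  # Try longer abbreviations first so a longer one wins over a shorter prefix.
  pattern = Regexp.union(definitions.keys.sort_by { |k| -k.length })
  body.gsub(/(?<!\w)#{pattern}(?!\w)/) do |abbr|
    %(<abbr title="#{definitions[abbr]}">#{abbr}</abbr>)
  end
end

puts apply_abbreviations("Use HTML.\n*[HTML]: Hyper Text Markup Language\n")
# prints Use <abbr title="Hyper Text Markup Language">HTML</abbr>.
```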
Although abbreviation definitions are non-content block-level elements, block IALs can be used on them to specify additional attributes.
Here are some examples:
Abbreviation definitions are, despite being described here, non-content block-level elements.
Typographic Symbols
The original Markdown syntax does not support these transformations.
kramdown converts the following plain ASCII characters into their corresponding typographic symbols:
- --- will become an em-dash (like this —)
- -- will become an en-dash (like this –)
- ... will become an ellipsis (like this …)
- << will become a left guillemet (like this «); an optional following space will become a non-breakable space
- >> will become a right guillemet (like this »); an optional leading space will become a non-breakable space
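The replacements listed above can be approximated with a few substitutions. This Ruby sketch is a simplification that ignores escaping, code spans and quotes, and the method name typographic is hypothetical. The main point is the ordering, since --- has to be handled before --.

```ruby
NBSP = "\u00A0" # non-breakable space

def typographic(text)
  out = text.gsub('---', "\u2014") # em-dash first, so "---" is not read as "--" + "-"
  out = out.gsub('--', "\u2013")   # en-dash
  out = out.gsub('...', "\u2026")  # ellipsis
  # Left guillemet; an optional following space becomes a non-breakable space.
  out = out.gsub(/<< ?/) { |m| "\u00AB" + (m.end_with?(' ') ? NBSP : '') }
  # Right guillemet; an optional leading space becomes a non-breakable space.
  out.gsub(/ ?>>/) { |m| (m.start_with?(' ') ? NBSP : '') + "\u00BB" }
end

puts typographic('pages 3--5...') # prints pages 3–5…
```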
The parser also replaces normal single ' and double quotes " with “fancy quotes”. There may be times when kramdown falsely replaces the quotes. If this is the case, just \'escape\" the quotes and they won’t be replaced with fancy ones.
This section describes the non-content elements that are used in kramdown documents, i.e. elements that don’t provide content for the document but have other uses, such as separating block-level elements or attaching attributes to elements.
Three non-content block-level elements are not described here because they fit better where they are: link definitions, footnote definitions and abbreviation definitions.
End-Of-Block Marker
The EOB marker is not part of the standard Markdown syntax.
The End-Of-Block (EOB) marker, a ^ as the first character on an otherwise empty line, is a block-level element that can be used to specify the end of a block-level element even if the block-level element, after which it is used, would continue otherwise. If there is no block-level element to end, the EOB marker is simply ignored.
You won’t find an EOB marker in most kramdown documents, but sometimes it is necessary to use it to achieve the wanted results, which would be impossible otherwise. However, it should only be used when absolutely necessary!
For example, the following gives you one list with two items:
By using an EOB marker, you can make two lists with one item each:
Attribute List Definitions
This syntax feature is not part of the original Markdown syntax. The idea and syntax come from the Maruku package.
This is an implementation of Maruku’s feature for adding attributes to block and span-level elements (the naming is also taken from Maruku). This block-level element is used to define attributes which can be referenced later. The Block Inline Attribute List is used to attach attributes to a block-level element and the Span Inline Attribute List is used to attach attributes to a span-level element.
Following are some examples of attribute list definitions (ALDs), and afterwards comes the syntax explanation:
An ALD line has the following structure:
- a left brace, optionally preceded by up to three spaces,
- followed by a colon, the reference name and another colon,
- followed by attribute definitions (allowed characters are backslash-escaped closing braces or any character except an unescaped closing brace),
- followed by a closing brace and optional spaces until the end of the line.
The reference name needs to start with a word character or a digit, optionally followed by other word characters, digits or dashes.
There are four different types of attribute definitions, which have to be separated by one or more spaces:
- Reference name: This must be a valid reference name. It is used to reference another ALD so that the attributes of the other ALD are also included in this one. The reference name is ignored when collecting the attributes if no attribute definition list with this reference name exists. For example, a simple reference looks like id.
- Key-value pair: A key-value pair is defined by a key name, which must follow the rules for reference names, then an equal sign and then the value in single or double quotes. If you need to use the value delimiter (a single or a double quote) inside the value, you need to escape it with a backslash. Key-value pairs can be used to specify arbitrary attributes for block or span-level elements. For example, a key-value pair looks like key1="bef \"quoted\" aft" or title='This is a title'.
- ID name: An ID name is defined by using a hash and then the identifier name, which needs to start with an ASCII alphabetic character (A-Z or a-z), optionally followed by other ASCII characters, digits, dashes or colons. This is a shorthand for the key-value pair id="IDNAME", since this is often used. The ID name specifies the unique ID of a block or span-level element. For example, an ID name looks like #myid.
- Class name: A class name is defined by using a dot and then the class name, which may contain any character except whitespace, the dot character and the hash character. This is (almost, but not quite) a shorthand for the key-value pair class="class-name". Almost, because it actually means that the class name should be appended to the current value of the class attribute. The following ALDs are all equivalent:
As can be seen from the example of the class names, attributes that are defined earlier are overwritten by ones with the same name defined later.
Also, everything in the attribute definitions part that does not match one of the above four types is ignored.
If there is more than one ALD with the same reference name, the attribute definitions of all the ALDs are processed as if they were defined in one ALD.
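The four kinds of attribute definitions and the merging rules can be modelled as follows. This Ruby sketch is a simplified, hypothetical model rather than kramdown's implementation. Referenced ALDs are merged in at the point where they are named, later values overwrite earlier ones, class names are appended, and ALDs sharing a name are concatenated. The names parse_alds and collect_attributes are made up.

```ruby
# One ALD line: "{:name: definitions}", indented up to three spaces.
ALD_LINE = /\A {0,3}\{:(\w[\w-]*):((?:\\\}|[^}])*)\}[ \t]*\z/

# key="value" / key='value', #id, .class, or a reference to another ALD.
ATTR_TOKEN = /(?<key>\w[\w-]*)=(?<q>["'])(?<val>(?:\\.|(?!\k<q>).)*)\k<q>|\#(?<id>[A-Za-z][\w:-]*)|\.(?<cls>[^\s.\#]+)|(?<ref>\w[\w-]*)/

# ALDs with the same name behave like one combined ALD.
def parse_alds(text)
  text.each_line.with_object({}) do |line, alds|
    next unless (m = ALD_LINE.match(line.chomp))

    alds[m[1]] = [alds[m[1]], m[2]].compact.join(' ')
  end
end

def collect_attributes(definitions, alds, seen = [])
  attrs = {}
  definitions.scan(ATTR_TOKEN) do
    m = Regexp.last_match
    if m[:key]
      attrs[m[:key]] = m[:val].gsub(/\\(.)/, '\1')                  # unescape \" and \'
    elsif m[:id]
      attrs['id'] = m[:id]                                          # #id is short for id="..."
    elsif m[:cls]
      attrs['class'] = [attrs['class'], m[:cls]].compact.join(' ')  # .cls appends
    elsif alds.key?(m[:ref]) && !seen.include?(m[:ref])
      # Unknown references are ignored; `seen` guards against cycles.
      attrs.merge!(collect_attributes(alds[m[:ref]], alds, seen + [m[:ref]]))
    end
  end
  attrs
end

alds = parse_alds(<<~'TEXT')
  {:base: #main .wide}
  {:note: base title="A \"quoted\" title" .box}
  {:note: lang="en"}
TEXT
attrs = collect_attributes(alds['note'], alds, ['note'])
puts attrs['class'] # prints wide box
```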
Inline Attribute Lists
These elements are used to attach attributes to another element.
Block Inline Attribute Lists
This syntax feature is not part of the original Markdown syntax. The idea and syntax come from the Maruku package.
This block-level element is used to attach attributes to another block-level element. A block inline attribute list (block IAL) has the same structure as an ALD except that the colon/reference name/colon part is replaced by a colon. A block IAL (or two or more block IALs) has to be put directly before or after the block-level element to which the attributes should be attached. If a block IAL is directly after one block-level element and directly before another, it is applied to the preceding element. The block IAL is ignored in all other cases, for example, when the block IAL is surrounded by blank lines.
Key-value pairs of an IAL take precedence over equally named key-value pairs in referenced ALDs.
Here are some examples for block IALs:
Span Inline Attribute Lists
This syntax feature is not part of the original Markdown syntax. The idea and syntax come from the Maruku package.
This is a version of the block inline attribute list for span-level elements. It has the same structure as the block IAL except that leading and trailing spaces are not allowed. A span IAL (or two or more span IALs) has to be put directly after the span-level element to which it should be applied. No additional character is allowed in between; otherwise it is ignored and only removed from the output.
Here are some examples for span IALs:
The special span IAL {::} contains no attributes but doesn’t generate a warning either. It can be used to separate consecutive elements that would be falsely parsed if not separated. Here is a use case:
Extensions
This syntax feature is not part of the original Markdown syntax.
Extensions provide additional functionality but use the same syntax for it. They are available as block-level as well as span-level elements.
The syntax for an extension is very similar to the syntax of ALDs. Here are some examples of how to specify extensions, followed by the syntax definition:
An extension can be specified with or without a body. Therefore there exist a start and an end tag for extensions. The start tag has the following structure:
- a left brace,
- followed by two colons and the extension name,
- optionally followed by a space and attribute definitions (allowed characters are backslash-escaped closing braces or any character except an unescaped closing brace, the same as with ALDs),
- followed by a slash and a right brace (in case the extension has no body) or only a right brace (in case the extension has a body).
The stop tag has the following structure:
- a left brace,
- followed by a colon and a slash,
- optionally followed by the extension name,
- followed by a right brace.
A stop tag is only needed if the extension has a body!
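Matching these tags takes two regular expressions. The patterns below are a Ruby sketch of the start and stop tag structure listed above. The constant and method names are hypothetical, and the sketch assumes extension names consist of word characters.

```ruby
# {::name attrs} or {::name attrs /}; a trailing slash means "no body".
EXT_START = /\{::(?<name>\w+)(?:\s(?<attrs>(?:\\\}|[^}])*?))?(?<empty>\/)?\}/
# {:/name} or just {:/}
EXT_STOP = /\{:\/(?<name>\w+)?\}/

def parse_extension_tag(str)
  if (m = EXT_START.match(str))
    { kind: :start, name: m[:name], attrs: m[:attrs].to_s.strip, body: m[:empty].nil? }
  elsif (m = EXT_STOP.match(str))
    { kind: :stop, name: m[:name] }
  end
end

p parse_extension_tag('{::options auto_ids="false" /}')
p parse_extension_tag('{:/comment}')
```

The lazy attribute group lets the optional slash before the closing brace be recognized as the "no body" marker instead of being swallowed into the attributes.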
The above syntax can be used as is for span-level extensions. The starting and ending lines for block-level extensions are defined as:
- The starting line consists of the extension start tag, optionally preceded by up to three spaces, and followed by optional spaces until the end of the line.
- The ending line consists of the extension stop tag, optionally preceded by up to three spaces, and followed by optional spaces until the end of the line.
If no end tag can be found for an extension start tag, the start tag is treated as if it has no body. If an invalid extension stop tag is found, it is ignored. If an invalid extension name is specified, the extension (and any specified body) is ignored.
The following extensions can be used with kramdown:
comment
Treat the body text as a comment which does not show in the output.
nomarkdown
Don’t process the body with kramdown but output it as-is. The attribute type specifies which converters should output the body: if the attribute is missing, all converters should output it. Otherwise the attribute value has to be a space-separated list of converter names, and these converters should output the body.
options
Should be used without a body since the body is ignored. It is used for setting the global options for the kramdown processor (for example, to disable automatic header ID generation). Note that options that are used by the parser are immediately effective, whereas all other options are not! This means, for example, that it is not possible to set converter options only for some part of a kramdown document.
I was keeping an eye on the GitHub Enterprise release notes to see when a patch for my previous bug would land, and when it did there was also a critical fix for an issue in Kramdown:
The description of CVE-2020-14001 gave a pretty good summary of what the issue was and how it could be exploited:
The kramdown gem before 2.3.0 for Ruby processes the template option inside Kramdown documents by default, which allows unintended read access (such as template="/etc/passwd") or unintended embedded Ruby code execution (such as a string that begins with template="string://<%= `). NOTE: kramdown is used in Jekyll, GitLab Pages, GitHub Pages, and Thredded Forum.
The template option for kramdown can accept any file path, or, if the value starts with string://, it will be used as the template contents. Since the templates are ERB, this allows arbitrary Ruby code to be executed.
To test out this issue, I created a new Jekyll site and added the following to the _config.yaml:
After starting up and loading the page the custom ERB had indeed been used:
Discovery
That got me thinking about what other options Jekyll and Kramdown allowed and whether any of them could be exploited. GitHub Pages was using a version of Kramdown based on version 1.17.0, so I was looking through the Kramdown::Options module for that version and saw that the simple_hash_validator was using YAML.load, which has the potential to create arbitrary Ruby objects via deserialisation:
This could be hit with the syntax_highlighter_opts option, but after trying a few payloads I realised that the pages_jekyll gem loads safe_yaml, which prevents YAML.load from deserialising Ruby objects.
A few hours later I came across an interesting option that didn’t seem to be documented like the others. It was used when creating a new Kramdown::Document and there was a handy comment:
So if the :input option exists, its first letter is made uppercase, and then it is passed to try_require with the type set to parser:
As the implementation of snake_case only cared about alpha characters and ignored everything else, directory traversal was possible, causing require to load a file outside of the intended path!
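To see why the traversal survives, consider a snake_case that, as described, only rewrites transitions between letter cases. The Ruby sketch below is an assumed re-creation of that behaviour and of the require path it feeds, not code taken from kramdown. The parser_require_path helper and the kramdown/parser/ prefix are illustrative assumptions.

```ruby
# Assumed snake_case: only letter-case transitions are rewritten, so dots and
# slashes pass through untouched.
def snake_case(name)
  name.gsub(/([A-Z]+)([A-Z][a-z])/, '\1_\2')
      .gsub(/([a-z\d])([A-Z])/, '\1_\2')
      .downcase
end

# Hypothetical: upcase the first character, then build the require path.
def parser_require_path(input)
  name = input[0].upcase + input[1..-1]
  "kramdown/parser/#{snake_case(name)}"
end

p parser_require_path('kramdown')       # prints "kramdown/parser/kramdown"
p parser_require_path('../../tmp/evil') # prints "kramdown/parser/../../tmp/evil"
```

Because nothing in the conversion touches "../", the resulting path can climb out of the parser directory.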
I created a file /tmp/evil.rb with the contents system('echo hi > /tmp/ggg') and started Jekyll with the following _config.yml:
Jekyll failed to build and output jekyll 3.8.5 | Error: wrong constant name ../../../../../../../../../../../../../../../tmp/evil.rb, but looking in /tmp/, the file existed, meaning the Ruby code had been run!
Exploit
I created a new Pages repo on my GHE server, added the /tmp/evil.rb payload and confirmed that the same thing happened. The next thing was to work out how to get a controllable Ruby file to a known location so that it could be used as the payload. I used opensnoop from perf-tools and watched the paths as GitHub built the Jekyll site, and saw that the following directories were being used:
The first was the input directory and the second the output, but both were quickly removed after the process had finished and copied to a hashed location. Since the output directory was based only on the user and repo name, it was the easiest target; I just had to work out how to make it hang around for longer than normal.
I created five 100 MB files using dd if=/dev/zero of=file.out bs=1000000 count=100, as well as a code.rb payload, and added them to a Jekyll site. I then created a loop that just pushed the repo over and over again with while true; do git add -A . && git commit --amend -m aa && git push -f; done. Watching the /data/user/tmp/pages/pagebuilds/vakzz/jekyll1 directory, it now remained present for much longer.
The final step was to create a new site with a malicious input option that pointed to the first Jekyll build folder:
Then set that repo pushing and building in a loop as well. After around a minute the file appeared!
I wrote up the report and sent it through, and once again it was triaged amazingly fast (within 30 minutes). A few hours later I received a response saying they were working on hardening the Kramdown options and asking whether I knew of any others that should be restricted.
The only other option that looked a bit suspicious had been formatter_class (set as part of syntax_highlighter_opts), but it had validation allowing only alphanumeric characters and was then looked up using ::Rouge::Formatters.const_get. At the time I thought this was fairly safe, but I mentioned it along with the simple_hash_validator.
The next night I was looking into how ::Rouge::Formatters.const_get actually worked. It turned out that it didn’t restrict the constant to ::Rouge::Formatters as I’d originally thought, and could return any constant/class that had been defined. The regex was still limiting (no :: allowed), but it could still be used to return quite a few classes. Once the constant was found, it was used to create a new instance, and then its format method was called:
To test this out, I edited the _config.yml with the following and then tried to build the site.
It blew up, but the error message showed that the CSV class had been created!
I added a comment to the report saying that the formatter options should definitely be restricted and that I would keep looking to see whether it was exploitable.
So what we had now was the ability to create a top-level Ruby object whose initialiser took a single hash, and we had a fair amount of control over what was in that hash. I spent a bit of time googling and testing things in Ruby to work out how to get a list of constants, before coming up with the following script:
It was pretty quick and dirty, but it basically found all of the constants that matched the regex and tried to create a new instance of each using a hash. I logged into the GHE server, went to the pages directory and ran the script. Quite a few reported "worked" or "maybe", but a lot could be discarded as they were things like StandardError.
I started working through the list of classes, looking at the code to see what happened in each initialiser, without finding much of interest until I came across this:
Already the error message sounded promising! The Hoosegow initialize method was the following:
And the load_inmate_methods method was:
This was perfect! Since we could add anything to the options hash, this would allow us to pass in our own inmate_dir directory, and then all we needed to do was have a malicious inmate.rb there waiting.
Following the same process as before, I edited the _config.yml with the following:
Then I created the /tmp/inmate.rb file on the GHE server with a payload and pushed the Jekyll site. A few seconds later the file had been required and the payload executed!
Timeline
August 20, 2020 00:18:42 AEST - Reported RCE to GitHub via HackerOne
August 20, 2020 00:50:41 AEST - Report triaged
August 20, 2020 06:12:37 AEST - Confirmed working on fix, asked about other options
August 20, 2020 07:14:57 AEST - Sent through other potential options
August 20, 2020 22:55:52 AEST - Reported formatter_class discovery
August 20, 2020 23:49:55 AEST - Reported RCE via Hoosegow class
August 27, 2020 04:21:37 AEST - CVE-2020-10518 issued and GHE release pending
October 15, 2020 05:48:59 AEDT - $20,000 bounty + $5,000 bonus