DocBook, AsciiDoc or Sphinx – Choices, Choices…! A Comparison of Document Formats
The following article was contributed by Janina Setz, trainee at SUSE. It was originally published in German at DataCenter Insider. A big “Thank You” to chief editor Ulrike Ostler for permission to publish the article in English on the SUSE blog.
Anyone who has ever used an instruction manual knows how important it is to be able to refer to accurate documentation and notes on various products. Documentation is crucial – especially in the software world – and there are plenty of tools that can be used to create documentation texts, depending on the authors’ preference.
This article compares the most commonly used document formats and markup languages for creating articles and technical documentation: DocBook, AsciiDoc and Sphinx. The table below gives an initial overview of the features, in the order they are dealt with in this article.
Overview of format options:
How Does it Work?
First, a brief introduction to how each document format works:
- AsciiDoc is a simplified markup language based on plain text. Files can be converted to HTML and DocBook, and from there to other output formats. You can convert AsciiDoc files using pandoc or the AsciiDoc commands. The AsciiDoc conversion program (asciidoc) is written in Python.
- DocBook is an XML-based document format. First, the structure of the document is checked (“validation”); then the DocBook XSL stylesheets transform it into a destination format (HTML, text, man page, etc.). If you want a PDF file, the document is first transformed into an intermediate format (XSL-FO) that the FOP tool converts to PDF. The SUSE documentation team, for example, uses a program it developed itself, the “DocBook Authoring and Publishing Suite” (DAPS), to convert DocBook texts into different file formats. DAPS is released under the GNU General Public License, so other companies are free to use it as well.
- Sphinx is a documentation generator that uses the simplified markup language reStructuredText (reST) to generate output formats such as HTML and PDF. You use Sphinx-specific commands to convert texts into different file formats. Key elements of Sphinx are the scalability, the parsing functions and the translation system of Docutils, the toolkit that implements reST.
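To make the difference between the three approaches more tangible, here is a minimal sketch that holds the same short section in each markup as a Python string. The section text is invented for illustration. Only the DocBook variant is XML, so it is the only one that a standard XML parser can read as a tree.

```python
import xml.etree.ElementTree as ET

# The same (invented) section in the three markups discussed above.
ASCIIDOC = """\
== Installation

Install the package, then run the *setup* command.
"""

RESTRUCTUREDTEXT = """\
Installation
============

Install the package, then run the **setup** command.
"""

DOCBOOK = """\
<section xmlns="http://docbook.org/ns/docbook" version="5.0">
  <title>Installation</title>
  <para>Install the package, then run the
    <emphasis role="strong">setup</emphasis> command.</para>
</section>
"""

# DocBook is XML, so the document can be loaded as a tree of elements.
NS = {"db": "http://docbook.org/ns/docbook"}
root = ET.fromstring(DOCBOOK)
title = root.find("db:title", NS).text
print(title)
```

The AsciiDoc and reST variants read almost like plain text, while the DocBook variant carries explicit structure that tools can inspect.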
To understand the semantics of our three formats, we can use simple maxims that are a great way to describe how to interpret the written text:
Online editing of text files is a key feature of GitHub. All of the document formats and languages are supported there, but they are displayed differently. DocBook, for example, is shown only as syntax-highlighted XML source code, which some developers find hard to read. AsciiDoc and reST (Sphinx) sources, by contrast, are easy to read as they are, and GitHub even displays them as rendered HTML.
Outside of GitHub, it is common to use a text editor. You can use any text editor in principle, but the following editors are especially recommended for specific document formats because of their supporting features.
These editors support each of the formats by providing options such as highlighting, extension functions, setting macros and previewing.
DocBook comes into its own here, because an XML document can be linked to a “schema”. An editor that understands this schema can suggest the tags that are valid in the current context for you to select.
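The idea behind such schema-driven suggestions can be sketched in a few lines. The content model below is a hypothetical, heavily simplified stand-in for the real DocBook schema, and `suggest_tags` is an invented helper, not part of any editor.

```python
# Hypothetical toy content model: parent tag -> child tags allowed inside it.
# The real DocBook schema defines far more elements and rules.
CONTENT_MODEL = {
    "section": ["title", "para", "itemizedlist", "section"],
    "itemizedlist": ["title", "listitem"],
    "listitem": ["para", "itemizedlist"],
    "para": ["emphasis", "link"],
}


def suggest_tags(parent, prefix=""):
    """Return the child tags allowed inside `parent` that start with `prefix`."""
    return [tag for tag in CONTENT_MODEL.get(parent, []) if tag.startswith(prefix)]


# The cursor is inside <itemizedlist>: only list-related tags are offered.
print(suggest_tags("itemizedlist"))
# The author has typed "<p" inside a <section>.
print(suggest_tags("section", "p"))
```

A schema-aware editor does essentially the same lookup against the full schema each time you open a new tag.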
Extensions let you adapt a language to varied and individual needs. They can extend the format itself or come into play during conversion.
AsciiDoc itself includes four official extensions, and there are a number of third-party extensions also available.
Sphinx also ships with some extensions of its own, including “imgmath” and “doctest”. Here too, third-party extensions are available.
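In a Sphinx project, extensions are switched on in the configuration file conf.py, which is itself a Python file. A minimal excerpt that enables the two extensions named above might look like this (the rest of the configuration is omitted):

```python
# conf.py (excerpt): enable two extensions that ship with Sphinx.
extensions = [
    "sphinx.ext.imgmath",  # render math as images
    "sphinx.ext.doctest",  # test code snippets embedded in the documentation
]
```

Third-party extensions are added to the same list once their packages are installed.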
DocBook is the only format that also offers another form of customization: you can tailor the format itself. You can define precisely which tags may be used in which combinations, how many subtags each tag may have and how deeply they may be nested, and which tags may be used at all in a certain document.
References are a great help for most technical writers and documentation developers, because they enforce standards and simplify the writing process. All the document formats in our comparison provide this option. DocBook uses entity definitions (usually stored in separate .ent files), AsciiDoc uses attribute references, and Sphinx works with substitutions and the “rst_epilog” setting.
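As a rough illustration, the sketch below defines the same placeholder, an invented product name, in all three formats. For DocBook the entity is declared inline in the document rather than in a separate .ent file, which is an assumption made to keep the example self-contained. The Sphinx part uses the `rst_epilog` value from conf.py, whose content is appended to every source file.

```python
import xml.etree.ElementTree as ET

# DocBook: an entity declared in the internal DTD subset (normally this
# declaration would live in a shared .ent file).
DOCBOOK = """\
<!DOCTYPE para [
  <!ENTITY product "Example Product">
]>
<para>Install &product; before you continue.</para>
"""
para = ET.fromstring(DOCBOOK)
print(para.text)  # the parser has replaced &product; with its definition

# AsciiDoc: an attribute definition and an attribute reference.
ASCIIDOC = """\
:product: Example Product

Install {product} before you continue.
"""

# Sphinx: conf.py excerpt; the substitution is usable as |product| everywhere.
rst_epilog = """
.. |product| replace:: Example Product
"""
```

In each case the name is defined once and can be changed in a single place.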
In some cases, documentation is required for product variants that differ only slightly. For example, different products may have different names and be installed in different ways, yet all be used in the same way.
In this case, all possible variants are integrated into one document, and each is delineated by markers. You do this using profiling attributes in DocBook, the “ifdef” and “endif” macros in AsciiDoc, and SET profiling in Sphinx.
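What such profiling does can be sketched with a small filter over a DocBook-like fragment. The fragment and the `profile` helper are invented for illustration; real toolchains perform this step in their stylesheets rather than in Python.

```python
import xml.etree.ElementTree as ET

# Invented fragment: two paragraphs are marked for one operating system each.
SOURCE = """
<section>
  <title>Installation</title>
  <para os="linux">Install the package with your package manager.</para>
  <para os="windows">Run the graphical installer.</para>
  <para>Start the program.</para>
</section>
"""


def profile(root, attribute, wanted):
    """Remove every element whose profiling attribute does not list `wanted`."""
    for parent in list(root.iter()):
        for child in list(parent):
            value = child.get(attribute)
            # Several values can be listed, separated by semicolons.
            if value is not None and wanted not in value.split(";"):
                parent.remove(child)
    return root


linux = profile(ET.fromstring(SOURCE), "os", "linux")
texts = [p.text for p in linux.iter("para")]
print(texts)
```

Unmarked content is kept for every variant; marked content survives only if its marker matches the variant being built.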
Modularization is a key element of more complex documentation that comprises a large number of articles and books. All the document formats in our comparison offer this option by way of “includes”.
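For XML-based documents, includes are commonly written with XInclude. The sketch below assumes that mechanism and resolves two includes from an in-memory dictionary instead of real files; the file names and contents are invented.

```python
import xml.etree.ElementTree as ET
from xml.etree import ElementInclude

# Invented module files, kept in memory so that the example is self-contained.
FILES = {
    "install.xml": "<section><title>Installation</title></section>",
    "usage.xml": "<section><title>Usage</title></section>",
}

BOOK = """
<book xmlns:xi="http://www.w3.org/2001/XInclude">
  <title>Example Guide</title>
  <xi:include href="install.xml"/>
  <xi:include href="usage.xml"/>
</book>
"""


def load_from_memory(href, parse, encoding=None):
    """Loader callback: return the parsed module for the requested href."""
    return ET.fromstring(FILES[href])


book = ET.fromstring(BOOK)
ElementInclude.include(book, loader=load_from_memory)  # replaces xi:include in place
titles = [t.text for t in book.iter("title")]
print(titles)
```

Each module stays a small file of its own, and the book file only lists which modules belong together.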
To check that your documentation is valid, i.e. that it conforms to the chosen format and contains no formal errors, the process needs to include regular validation checks as well as a final one.
DocBook (XML) texts are easily validated in compatible (XML) editors or with the DAPS validation function (daps validate), which checks the structure of the document against a schema (a kind of “blueprint”).
AsciiDoc and Sphinx do not offer the rigorous structural validation available with DocBook; they only check the syntax.
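The difference between a syntax check and structural validation can be shown with a deliberately wrong document. In this sketch, `ALLOWED_CHILDREN` is a hypothetical toy rule set standing in for a real schema, and both helper functions are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical toy rules: which child tags each tag may contain.
ALLOWED_CHILDREN = {
    "section": {"title", "para", "itemizedlist"},
    "itemizedlist": {"listitem"},
    "listitem": {"para"},
    "title": set(),
    "para": set(),
}


def is_well_formed(text):
    """Syntax check only: can the text be parsed as XML at all?"""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


def structural_errors(text):
    """Structural check: is every tag allowed inside its parent?"""
    errors = []
    for parent in ET.fromstring(text).iter():
        allowed = ALLOWED_CHILDREN.get(parent.tag, set())
        for child in parent:
            if child.tag not in allowed:
                errors.append(f"<{child.tag}> is not allowed inside <{parent.tag}>")
    return errors


# A list item placed directly in a section, outside any list.
DOCUMENT = (
    "<section><title>Lists</title>"
    "<listitem><para>Orphaned item</para></listitem></section>"
)
print(is_well_formed(DOCUMENT))
print(structural_errors(DOCUMENT))
```

The document passes the syntax check because all tags are properly closed, but the structural check reports the misplaced list item.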
The AsciiDoc validation option is only available with a handful of editors and the W3C command line validator. You can validate Sphinx documentation via an internal extension.
The document formats share some output formats and differ in others.
Nowadays, markup (text) and layout (design) are usually separated so that they can be worked on independently of each other. This means that authors can focus entirely on their text without having to worry about the design.
DocBook provides stylesheets for the design, but you can create your own stylesheets or customize the existing ones. The same applies to AsciiDoc, because it uses the DocBook stylesheets. In both cases, however, XSLT skills are necessary. Sphinx comes with “themes”, and here too you can create and add your own templates.
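On the Sphinx side, the design is also selected in conf.py. The excerpt below is a sketch that picks one of the themes bundled with Sphinx and points to local directories for your own templates and static files; the directory names are assumptions.

```python
# conf.py (excerpt): choose a theme and add your own templates.
html_theme = "alabaster"          # a theme that ships with Sphinx
templates_path = ["_templates"]   # assumed directory with custom templates
html_static_path = ["_static"]    # assumed directory with custom CSS and images
```

The text sources stay untouched when you switch themes, which is exactly the separation of markup and layout described above.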
Strengths and Weaknesses
We will finish by setting out some of the weaknesses and strengths of the different document formats.
DocBook’s weakness lies mainly in its tags, unless the tag set has been customized and restricted. The sheer number of tags, their possible combinations and their individual requirements can be confusing, so getting to know this document format takes longer than with AsciiDoc or Sphinx. XML is also not particularly readable, so people unfamiliar with XML may struggle, and giving feedback is complicated. You should only use DocBook if you have an XML-enabled editor.
The main problem with AsciiDoc is that you cannot output directly to PDF; you need to use DocBook as an intermediate step. A compatible editor is also important for this markup language, because without syntax highlighting it is hard to keep a clear overview of the markup characters.
The same is true for Sphinx. Its main weakness, however, is that errors are difficult to locate. What’s more, compared with DocBook and AsciiDoc, Sphinx is harder to extend and less flexible in its functions.
Of course, each of these formats has its strengths. For example, DocBook offers an easy import function for other formats, a high level of portability and a high and flexible level of scalability. You can also structurally validate texts using a schema.
The strengths of AsciiDoc lie mainly in its good code readability and simple setup. The readability of Sphinx (reST) is also a plus, as is its excellent HTML output with integrated quick search bar.
If you need an easily customizable document format with flexible extensibility and compatibility for comprehensive and complex documentation, DocBook is clearly your best choice. But if you need to work closely with developers who need the format to be directly readable, AsciiDoc and Sphinx are better options.