Executable API Documentation with the LLVM Integration Tester and FileCheck

One of the most common problems in software development is how difficult it is to keep documentation and implementation in sync. Wouldn’t it be great if we had a system where parts of our documentation could be continuously executed to check that they stay correct and up to date? Well, this article will describe a system that can do that using two tools from the LLVM project: the LLVM Integration Tester (lit) and FileCheck.

Keeping Documentation and Implementation in Sync Is Hard

Software developers usually create APIs that are consumed by other software developers, and they provide them with documentation that’s typically written in either Markdown or HTML. Even though the documentation is often generated from the code comments automatically, small differences between code and documentation can cause confusion and make the adoption of new APIs more difficult. One way to reduce the impact of this problem is by using tests as documentation.

Using Tests as Executable Documentation

All software requires tests to ensure it works correctly now and in the future. There are many kinds of tests, including unit tests, integration tests, and behavior-driven tests. The main goal of the last of these is to produce tests written in a domain language that people across a software project can understand. Even though tests are useful for understanding how a particular piece of software works, they’re still written in a programming language or a domain-specific language. This can be a barrier to entry in two situations:

Developers who aren’t familiar with the language or test framework used to write the tests may have difficulties understanding how something works or how to write new tests.
Non-developers may want to report an issue and accompany it with a failing test case, but they need a developer to write the failing test case for them.

At PSPDFKit, we recently explored new ways to write executable documentation in order to reduce the need to maintain separate code and documentation as much as possible. Our approach to this problem is pragmatic in the sense that we don’t want to convert 100 percent of our codebase to an “executable documentation format”; we don’t think that’s achievable given the current state of the art and our special requirements.

Instead, we’ve started identifying parts of our products where we could replace existing tests with executable documentation. One of those parts is an internal component that implements a JSON-based API. The next section describes how we introduced executable documentation for that part using two tools: lit and FileCheck.

Executable Documentation Using lit and FileCheck

One of our internal components at PSPDFKit uses a client-server architecture with a JSON-like format for communication: The client sends JSON requests for the commands it wants the server to run, and the server runs them and generates JSON responses in return.

When we were exploring techniques to test this component, we came up with the idea of using something based on lit and FileCheck. The LLVM project uses lit to drive its regression test suite, and the fact that big tech companies like Apple and Google have contributed to it for a long time was particularly appealing to us. Regression tests for a compiler are similar to the kind of tests we’d need for our JSON-based component: A compiler receives a source file as input and produces text output such as diagnostics and notes, while our server component receives JSON requests as input and produces JSON responses as output.

Although lit and FileCheck are usually coupled to internal details of the LLVM project, it’s still possible to use them in other projects by installing the lit and FileCheck Python packages. Both of these are clones of the corresponding LLVM alternatives that implement the most common features. I’ll explain next how you can define an executable documentation format and configure your project to use lit and FileCheck.

Defining the Executable Documentation Format and Test Wrapper

The tests in this article work by sending text to the standard input of an executable and checking that the response the executable writes to the standard output is what we expect: lit runs the command, and FileCheck verifies the output. If you already have an executable in your project that works like this, you can use it directly with lit. If not, you should create a small wrapper over the existing API.

In this example, I’ll create an api-tester wrapper binary. But first, let’s see an example of the test format we’d want to use as documentation. The following snippet is the add_annotation.doctest file. It’ll cover a simple test scenario where we add an annotation to a PDF and then get the list of annotations to assert that it has been added correctly:

# `add_annotation`: Adds an annotation to a PDF document.

# RUN: %api-tester --pdf "Sample.pdf" < %s | %FileCheck %s

# For the full type declaration of "annotation," see `<Source_code_annotation_model>`.

{
  "type": "add_annotation",
  "annotation": {
    "bbox": [
      74,
      541,
      201,
      70
    ],
    "createdAt": "2018-03-07T17:59:40Z",
    "rotation": 0,
    "stampType": "Custom",
    "type": "pspdfkit/stamp",
    "updatedAt": "2018-03-07T17:59:40Z"
  }
}

#      CHECK:[
# CHECK-NEXT:    {
# CHECK-NEXT:        "annotation_id": 26
# CHECK-NEXT:    }
# CHECK-NEXT:]
---
# Now get the list of annotations and see that the new annotation has been added.

{"type": "get_annotations"}

#      CHECK:[
# CHECK-NEXT:    {
# CHECK-NEXT:        "bbox": [
# CHECK-NEXT:            74,
# CHECK-NEXT:            541,
# CHECK-NEXT:            201,
# CHECK-NEXT:            70
# CHECK-NEXT:        ],
# CHECK-NEXT:        "createdAt": "2018-03-07T17:59:40Z",
# CHECK-NEXT:        "rotation": 0,
# CHECK-NEXT:        "stampType": "Custom",
# CHECK-NEXT:        "type": "pspdfkit/stamp",
# CHECK-NEXT:        "updatedAt": "2018-03-07T17:59:40Z"
# CHECK-NEXT:    }
# CHECK-NEXT:]
---

Lines starting with # serve as comments, and they document what the add_annotation command is about. # RUN is the main lit entry point. In this example, it’ll run api-tester and pass Sample.pdf as PDF input, piping the output to FileCheck to check the results. %api-tester and %FileCheck are lit substitutions: placeholders that are replaced with the actual api-tester and FileCheck paths when the test is executed. %s is a built-in lit substitution for the path of the current test file, so the same file serves both as the input to api-tester and as the list of checks for FileCheck. # CHECK and # CHECK-NEXT are FileCheck directives that assert on the text output. You can get more information on the FileCheck documentation website. Multiple command executions are separated by ---. The next command is get_annotations, which will check if the annotation was added to the document successfully.
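To make the difference between CHECK and CHECK-NEXT more concrete, here’s a deliberately simplified Python model of the two directives. It’s a sketch, not FileCheck itself: simple_filecheck is a hypothetical function, it treats every pattern as a plain substring, and it ignores regular expressions, variables, and all other directives.

```python
import re

# Matches "# CHECK: pattern" and "# CHECK-NEXT: pattern" lines.
DIRECTIVE = re.compile(r"#\s*(CHECK-NEXT|CHECK):\s?(.*)")


def simple_filecheck(test_text: str, output: str) -> bool:
    """Return True if `output` satisfies the CHECK lines in `test_text`."""
    out_lines = output.splitlines()
    position = 0  # Index of the first output line not yet consumed.
    for line in test_text.splitlines():
        match = DIRECTIVE.match(line.strip())
        if not match:
            continue  # Not a directive: a comment, a RUN line, or JSON input.
        kind, pattern = match.group(1), match.group(2).strip()
        if kind == "CHECK":
            # CHECK may skip ahead to any later line containing the pattern.
            candidates = range(position, len(out_lines))
        else:
            # CHECK-NEXT must match the line right after the previous match.
            candidates = range(position, min(position + 1, len(out_lines)))
        for index in candidates:
            if pattern in out_lines[index]:
                position = index + 1
                break
        else:
            return False  # No candidate line contained the pattern.
    return True
```

In this model, an unexpected extra line in the response makes a CHECK-NEXT fail, which is why the examples above use it to pin down the exact shape of the JSON output.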

The implementation of api-tester will parse this particular executable documentation format we’ve designed, but don’t worry: We’ll leverage lit to simplify things. Here’s how the api-tester would be implemented, in pseudocode:

while (not end of test file) {
  message = readMessage()
  // Skip empty messages, such as the one after the final separator.
  if (message is empty) continue
  json = parseJSON(message)
  // Send JSON to our internal API.
  // Get the result from our internal API and convert it back to JSON.
  // Send the JSON result to the standard output.
}

And here’s how readMessage works, in pseudocode:

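func readMessage() {
  json = ""
  for (each remaining line) {
    // Skip comments, including RUN and CHECK lines.
    if (line.startsWith('#')) continue
    // Stop at the command separator.
    if (line.equals('---')) break
    json += line
  }
  return json
}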

The algorithm goes line by line, accumulating JSON input until it finds the mark that separates commands in a .doctest file (---).
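The two pseudocode fragments above can be combined into a short runnable Python sketch. It isn’t our actual api-tester: handle_request is a hypothetical stand-in for the internal API, and the annotation_id it returns is made up for illustration.

```python
import json
from typing import Iterable, Iterator


def read_messages(lines: Iterable[str]) -> Iterator[str]:
    """Yield the JSON text of each command in a .doctest file."""
    buffer = []
    for line in lines:
        stripped = line.strip()
        if stripped.startswith("#"):
            continue  # Comments, RUN, and CHECK lines aren't part of a request.
        if stripped == "---":
            # The separator ends the current command.
            if buffer:
                yield "\n".join(buffer)
            buffer = []
            continue
        if stripped:
            buffer.append(line.rstrip("\n"))
    if buffer:
        # A final command doesn't need a trailing separator.
        yield "\n".join(buffer)


def handle_request(request: dict) -> list:
    """Hypothetical stand-in for the internal API."""
    if request["type"] == "add_annotation":
        return [{"annotation_id": 26}]  # Made-up response for illustration.
    return []


def run(lines: Iterable[str]) -> str:
    """Process every command and return the text that would go to stdout."""
    responses = []
    for message in read_messages(lines):
        request = json.loads(message)
        responses.append(json.dumps(handle_request(request), indent=4))
    return "\n".join(responses)
```

Wiring this to the standard streams is one more line: sys.stdout.write(run(sys.stdin)). The indent=4 argument is chosen so that the output lines up with the indentation used in the CHECK lines of the example .doctest file.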

Once we’ve defined how the .doctest file looks and created our api-tester wrapper, it’s time to integrate lit into our project structure.

Integrating lit into an Existing Project

The first thing we need to do to integrate lit is to add a new folder to our project that will contain lit tests. For example, we can name the folder integration-tests. Inside that folder, we’ll add a file named lit.site.cfg.py.in with the following content (it assumes the project is using CMake):

config.api_tester_source_dir = "@CMAKE_CURRENT_SOURCE_DIR@/.."
config.api_tester_binary_dir = "@CMAKE_CURRENT_BINARY_DIR@/../api-tester"
config.file_check_binary_dir = "@CMAKE_CURRENT_SOURCE_DIR@/../../bin/"

# Delegate logic to `lit.cfg.py`.
lit_config.load_config(config, "@CMAKE_CURRENT_SOURCE_DIR@/lit.cfg.py")

This file configures the path to the api-tester binary and the FileCheck wrapper. The next step is to add a lit.cfg.py file with the following content:

import os

import lit.formats

config.name = 'api-tester'
config.test_format = lit.formats.ShTest(True)

config.suffixes = ['.doctest']

config.test_source_root = os.path.dirname(__file__)
config.test_exec_root = os.path.join(config.api_tester_binary_dir, 'test')

config.substitutions.append(('%api-tester',
    os.path.join(config.api_tester_binary_dir, 'api-tester')))
# Configure the path to the FileCheck tool.
config.substitutions.append(('%FileCheck', 'filecheck'))

This is the configuration file for the lit test suite. It defines our executable documentation extension (.doctest), which is the extension that will be used by our test files. It also configures a couple of lit substitutions, that is, %api-tester and %FileCheck, as we saw in the previous section where we covered how a sample .doctest file would look.
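As a rough mental model of what those substitutions do, the following Python sketch expands a RUN line by replacing each placeholder with its configured value. It’s an illustration only: expand_run_line is a hypothetical helper, the paths are made up, and lit’s real implementation provides more built-in substitutions and handles more edge cases.

```python
def expand_run_line(run_line: str, substitutions: list, test_path: str) -> str:
    """Replace lit-style placeholders in a RUN line with concrete values."""
    # Apply the project-defined substitutions first, then %s,
    # which stands for the path of the test file itself.
    for placeholder, replacement in substitutions + [("%s", test_path)]:
        run_line = run_line.replace(placeholder, replacement)
    return run_line


substitutions = [
    ("%api-tester", "/build/api-tester/api-tester"),  # Assumed path.
    ("%FileCheck", "filecheck"),
]
command = expand_run_line(
    '%api-tester --pdf "Sample.pdf" < %s | %FileCheck %s',
    substitutions,
    "/src/integration-tests/add_annotation.doctest",  # Assumed path.
)
print(command)
```

The expanded string is an ordinary shell pipeline, which is what the ShTest format then executes for each .doctest file.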

The only thing left is to create a build target that will run all the executable documentation tests easily, without having to invoke lit manually. The following snippet shows how to create such a target using CMake:

configure_file(lit.site.cfg.py.in lit.site.cfg.py @ONLY)

add_custom_target(check-api
  COMMAND ../../../bin/my-lit.py "${CMAKE_CURRENT_BINARY_DIR}" -v
  DEPENDS api-tester)

The my-lit.py file is a simple wrapper over lit that allows lit to be invoked as a standalone executable and not as a library:

#!/usr/bin/env python3
from lit.main import main
if __name__ == '__main__':
    main()

If you use Ninja as your build system, running ninja check-api from a shell will run every .doctest in the project.

Conclusion

In this blog post, I explained why keeping documentation and implementation in sync is important in any software project, especially when the project is starting to scale in the number of contributors and code check-ins to the repository. The typical suggestion in these cases is to keep documentation and code as close as possible, for example, by keeping them in the same folder.

In this article, we went a step further and explored how we can adapt the integration tester used by the LLVM project (lit) to represent tests and documentation using a single file format we define ourselves: .doctest. The idea is to replace some existing coarse-grained tests and internal .md docs in a project with a single file format that is easy to read and that can be tested regularly on continuous integration servers. We’re currently exploring how to make this file format even more versatile and powerful so that it can be used in more situations internally.

In conclusion, by spending less time writing coarse-grained tests and documentation, project contributors can spend more time on other valuable things that also have an impact on the project’s quality.

Author

Daniel Martín Core Engineer

Daniel is part of the Core Team at PSPDFKit and has worked on multiple topics, ranging from cryptography and text systems, to file format support and JavaScript engines. Outside of work, he likes spending time with his family, football, reading books, and watching films.
