The State of Debugging in WebAssembly

If you’ve read the PSPDFKit blog before, you’ve probably noticed a number of WebAssembly blog posts. And as WebAssembly progresses, we’re still along for the ride with this relatively new technology.

Because we’ve been shipping WebAssembly binaries in our SDKs for three years, we’ve also hit bugs that are hard to track down and, at times, outright mind-boggling. In the past, the lack of debugging and profiling tools only made this worse.

In this blog post, I’ll take you through some techniques we use to debug our WebAssembly code — some old, some new — and then share our hopes for the future with some interesting new projects.

Debugging in 2020

Our lives as developers have become enriched with a slew of debugging tools (unless you’re working on some antiquated embedded system from years gone by). Whether you’re using a step-by-step debugger, compile-time linters and checks, or runtime sanitizers, bugs have never been easier to catch before the code goes out the door. Sadly, this doesn’t mean bugs never slip through.

Because of how new WebAssembly is (at least in comparison to C++), many of the aforementioned tools are in their infancy, or they don’t exist at all. In the past, we’ve sometimes been forced to fall back on old-school printf to track down bugs. With slower compile times and the current deployment options, this can make your day a little painful.

What PSPDFKit Uses

Since the launch of PSPDFKit for Web, we’ve been using Emscripten, a compiler created by Alon Zakai, who is a developer currently working at Google.

Emscripten allows us to compile 500,000+ lines of C and C++ with only minor modifications and to call our PDF framework directly from the JavaScript running in the browser. This is something that would have been unthinkable just five years ago. Today’s technologies are amazing!

Within the last year, Emscripten made the move from the original Fastcomp backend used for compilation to plain old LLVM, which drives many compilers these days.

The move to LLVM allowed for the use of more debugging tools that can come in handy, some of which we’ll explore later.

Emscripten Debugging Options

In this blog post, I’ll only introduce runtime debugging options, but there are many compile-time debug options that can be explored too.

printf

I’ve already mentioned it, but sometimes you just can’t avoid using printf. And let’s be honest, it can be a good option for getting to know an area of code and operations. But when it’s your only tool, it can feel like a curse.

By setting up a logging system as we do at PSPDFKit, we’re able to send logs with relevant information directly to the browser console at runtime. This can help drastically when trying to understand the order of operations without the use of a debugger.

We achieve our logging with a number of tricks, all helped by Emscripten. In this example, I’ll show how we send logging to the console. First, we create a global object in JavaScript that holds logging functions — such as error, debug, and info — to abstract the browser console logging:

global.NativeLogging = {
  error(tag, line) {
    console.error(`[${tag}] ${line}`);
  },
  info(tag, line) {
    console.log(`[${tag}] ${line}`);
  },
  debug(tag, line) {
    console.log(`[${tag}] ${line}`);
  }
};

We can then retrieve this object in C++ with the use of Emscripten’s val::global binding:

auto jsNativeLogging = val::global("NativeLogging");

Using Emscripten’s Embind, we can now call the JavaScript functions in NativeLogging to direct the logging to the correct location:

jsNativeLogging.call<void>("error", tag, line);
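To tie the two C++ lines above together, here’s a minimal sketch of what a small logging helper could look like. The names logToConsole and formatLogLine are made up for this example and aren’t part of Emscripten or PSPDFKit. The sketch assumes the NativeLogging object from earlier exists on the JavaScript side and that the code is linked with --bind. When built natively, it falls back to stderr so the same code can be tried outside the browser.

```cpp
#include <cassert>
#include <iostream>
#include <string>

#ifdef __EMSCRIPTEN__
#include <emscripten/val.h>
#endif

// Hypothetical helper: formats a line the same way the JavaScript side does.
std::string formatLogLine(const std::string& tag, const std::string& line) {
  return "[" + tag + "] " + line;
}

// Hypothetical helper: forwards one log line to NativeLogging in JavaScript.
// `level` must name a function on that object: "error", "info", or "debug".
void logToConsole(const std::string& level, const std::string& tag,
                  const std::string& line) {
  assert(level == "error" || level == "info" || level == "debug");
#ifdef __EMSCRIPTEN__
  // Assumes global.NativeLogging was defined before this code runs.
  emscripten::val jsNativeLogging = emscripten::val::global("NativeLogging");
  jsNativeLogging.call<void>(level.c_str(), tag, line);
#else
  // Native fallback: write the formatted line to stderr.
  std::cerr << formatLogLine(tag, line) << '\n';
#endif
}
```

With this in place, a call such as logToConsole("error", "Parser", "unexpected token") would be routed to console.error in the browser.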

Compilation Options

If you’ve compiled or worked with C++ before, you probably already know you can add the -g flag to produce an output with debug symbols. These debug symbols are used when running with a debugger attached to step through code and query variables at runtime (more on that later).

Emscripten provides a range of debug options, which you can pass with the linker flags, ranging from -g0 to -g4. These flags are not only related to debug symbols, but they also control other aspects, such as changing how JavaScript is generated and whether to keep function names in the code.

Here’s what the Emscripten documentation currently says:

-g0: Make no effort to keep code debuggable.
-g1: When linking, preserve whitespace in JavaScript.
-g2: When linking, preserve function names in compiled code.
-g3: When compiling to object files, keep debug info, including JS whitespace, function names, and LLVM debug info if any (this is the same as -g).
-g4: When linking, generate a source map using LLVM debug information (which must be present in object files, i.e., they should have been compiled with -g).

Most of the time when you’re debugging, you’re going to want at least -g3 (-g does the same), which will allow you to step through JavaScript code in the debugger and get some understanding of how the compiled code is being called.

In addition to the debug flags, there are other options, such as ASSERTIONS, which prints more error information at runtime; DEMANGLE_SUPPORT, which prints human-readable function names in stack traces; and SAFE_HEAP, which performs memory access checks.

Although these options will slow down compilation and runtime, the extra information is well worth the wait when tracking down a bug.
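As a rough illustration of where these options help, consider the sketch below. The Annotation struct, the pageIndexOf function, and the file names in the build comment are all invented for this example. The idea is that a standard assert is often compiled out of optimized builds, and in WebAssembly a read through a null pointer doesn’t necessarily crash the way it would natively. As I understand it, this is the kind of access SAFE_HEAP is meant to flag.

```cpp
// Build sketch (file names are placeholders):
//   emcc -g3 -s ASSERTIONS=1 -s SAFE_HEAP=1 -s DEMANGLE_SUPPORT=1 \
//        annotations.cpp -o annotations.html
#include <cassert>

// Hypothetical type used only for this example.
struct Annotation {
  int pageIndex;
};

// Returns the page index of `annotation`.
// The assert documents the non-null precondition, but it disappears when
// NDEBUG is defined, leaving a plain read through the pointer.
int pageIndexOf(const Annotation* annotation) {
  assert(annotation != nullptr);
  return annotation->pageIndex;
}
```

Calling pageIndexOf(nullptr) in a build with NDEBUG defined is the interesting case: Without the extra runtime checks, it may quietly return garbage instead of failing loudly.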

LLVM Sanitizers

At PSPDFKit, we make good use of sanitizers when testing our code. We see them as a low-cost way of catching memory and undefined behavior issues before the code hits production.

Last year, Emscripten gained support for running sanitizers on WebAssembly code. If you suspect some type of memory issue, like a use after free (UAF) issue or a lower-level buffer overflow, the address sanitizer can be your friend. Just compile your code as normal, add -fsanitize=address to the compiler flags and linker flags, and run the same code that gave you an issue. You’ll get a printout of the suspected error that tells you where memory was allocated, freed, and read (if relevant). This information can guide you to memory issues far quicker than previous debugging techniques.

Note that the runtime will use much more memory to keep track of allocations and deallocations, so it can be a good idea to debug with ALLOW_MEMORY_GROWTH enabled to avoid running into memory limits.
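To make this more concrete, here’s a deliberately buggy sketch of the kind of code the address sanitizer is designed to catch. The function lastElement and its useAfterFree switch are hypothetical and exist only so the bug can be turned on and off. The file names in the build comment are placeholders too.

```cpp
// Build sketch (file names are placeholders):
//   emcc -g -fsanitize=address -s ALLOW_MEMORY_GROWTH=1 uaf.cpp -o uaf.html
#include <cassert>
#include <cstddef>

// Fills a heap buffer with 0..count-1 and returns its last element.
// When `useAfterFree` is true, the value is read again AFTER the buffer has
// been freed, which is the bug the address sanitizer should report.
int lastElement(std::size_t count, bool useAfterFree) {
  assert(count > 0);
  int* values = new int[count];
  for (std::size_t i = 0; i < count; ++i) {
    values[i] = static_cast<int>(i);
  }
  int result = values[count - 1];
  delete[] values;
  if (useAfterFree) {
    result = values[count - 1];  // heap use after free
  }
  return result;
}
```

Calling lastElement(4, true) in a sanitized build should produce a report pointing at the read, along with where the buffer was allocated and freed. Without the sanitizer, the same call may appear to work, which is what makes this class of bug so hard to find.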

The other sanitizer worth mentioning is the undefined behavior sanitizer. If you’ve played with C++ for some time, you probably know that undefined behavior can sneak up on you when you least expect it. You’ve probably run into integer truncation when converting a long to a short at some point. That doesn’t always cause an issue, but the one time it does, who knows what will happen?

Again, just pass -fsanitize=undefined with the compiler flags and the linker flags, and the sanitizer will perform runtime checks for you.
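Here’s a small sketch of a check this sanitizer performs. It uses signed integer overflow, a textbook case of undefined behavior in C++, rather than the truncation mentioned above. The function addPages and the file names in the build comment are invented for this example.

```cpp
// Build sketch (file names are placeholders):
//   emcc -g -fsanitize=undefined pages.cpp -o pages.html
#include <cassert>

// Adds `extra` pages to an existing page count.
// If the sum does not fit in an int (for example, INT_MAX + 1), the addition
// is undefined behavior, and a sanitized build reports it at runtime.
int addPages(int pageCount, int extra) {
  assert(extra >= 0);
  return pageCount + extra;
}
```

For ordinary inputs, the function behaves as expected. The sanitizer only speaks up when a call actually overflows at runtime, which is why it pays to run it against realistic test data.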

Using the Chrome Developer Tools Debugger

This is the holy grail of debugging options! That said, it currently has some caveats. For a long while, it wasn’t possible to effectively debug native code in the browser, but late last year, the Chrome team announced its improved support for WebAssembly debugging.

This means you can now execute code in the binary and step through the mapped C/C++ code as if it were running natively!

We tested this out on the large PSPDFKit codebase and it seems almost odd to see a debugger stepping between JavaScript and C++.

But as I said, it doesn’t come without its downsides: Currently, it’s not possible to resolve variable names, and the values you do see are very basic, typically shown as plain unsigned integers. But on the plus side, you can step through the program to understand the flow of execution. It’s not LLDB, but it’s something.

To perform debugging with Chrome Developer Tools, you’ll have to set up a few things:

  • Pass the -g4 linker flag to produce the source map file.
  • Pass the linker flag --source-map-base http://localhost:8080/ with the location where the source and map will be served from.
  • Ensure your server serves all the files used to compile the binary from the URL given to --source-map-base.
  • Fire up Chrome and open Developer Tools > Sources and navigate to the native sources being served.
  • Set a breakpoint at the desired location.
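The first two steps in the list above boil down to a single emcc invocation, shown in the comment of the sketch below. The function sumPageWidths and the file names are made up for this example, and the port matches the URL used above. The function is exported so there’s something to call and set a breakpoint in once the page has loaded.

```cpp
// Build sketch (file names are placeholders):
//   emcc -g4 --source-map-base http://localhost:8080/ widths.cpp -o index.html
// Then serve this directory, including widths.cpp, at http://localhost:8080/.
#include <cassert>

#ifdef __EMSCRIPTEN__
#include <emscripten/emscripten.h>
#else
#define EMSCRIPTEN_KEEPALIVE  // no-op when building natively
#endif

// Hypothetical exported function that gives the debugger a loop to step
// through.
extern "C" EMSCRIPTEN_KEEPALIVE int sumPageWidths(int pageCount,
                                                  int widthPerPage) {
  assert(pageCount >= 0);
  int total = 0;
  for (int page = 0; page < pageCount; ++page) {
    total += widthPerPage;  // a convenient line for a breakpoint
  }
  return total;
}
```

With a breakpoint set on the marked line in Developer Tools > Sources, calling Module._sumPageWidths(3, 100) from the console should pause execution inside the loop.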

What Does the Future Look Like?

There’s hope that native debugging will come from somewhere outside the browser with the introduction of the WebAssembly System Interface (WASI). The focus of this project is to create a standard interface mapping to OS features such as file systems, clocks, and other things you need.

The theory is to take the web element out of WebAssembly and allow WebAssembly binaries to be run directly on machines without the use of JavaScript and JavaScript engines.

This is exciting, because it takes a layer of complexity out of the system, allowing for more feature-rich debugging capabilities to be built — for example, the ability to use LLDB directly on WebAssembly-compiled code.

While it’s more or less possible to compile and debug code to run on Wasmtime (Mozilla’s WASI-compliant runtime), many holes remain. The project is still experimental and some features have yet to be fully realized.

For the most up-to-date information, it’s best to check out the WASI repository.
