WebAssembly: A New Hope

In March 2017, just five months ago, the WebAssembly Community Group reached consensus on the initial (MVP) binary format, JavaScript API, and reference interpreter. This exciting new technology is already shipping in Chrome and Firefox and will be fully supported in macOS High Sierra and iOS 11, with Microsoft Edge support following shortly thereafter. But what is WebAssembly, and what does it mean for the web?

The dream of high-performance computing in a secured browser

A slide from Alon Zakai's presentation about Emscripten in 2013.

Four years ago, Alon Zakai from Mozilla developed a subset of JavaScript that aimed to bring extraordinary optimizations to the language. This marked the birth of asm.js. By using the asm pragma ("use asm";) and nifty typing hints, asm.js allows JavaScript interpreters that support it to use low-level CPU operations instead of more expensive JavaScript. If the interpreter does not offer this support, the code will still execute with identical results, albeit more slowly. This makes it possible to bypass a lot of difficult-to-optimize routines like coercion and garbage collection in situations where we know we don't need them.

A primer of asm.js

For example, a | 0 hints that a is a 32-bit integer, and +a hints that a is a double (64-bit floating point). The former works because the spec defines bitwise operators to operate on 32-bit integers. These expressions have no side effects and can thus be inserted wherever a type hint is required, either on the arguments of a function or on its return value. In the following example, an asm.js-optimized JavaScript interpreter might execute only a single 32-bit addition when the add export is called, whereas an interpreter that doesn't support asm.js has to execute many more instructions to fully follow the ECMAScript specification, because it has no advance knowledge of the types being passed to the function.

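function AsmModule() {
  "use asm";
  // asm.js validation expects function declarations that are
  // exported by name, not function expressions in the object literal.
  function add(a, b) {
    a = a | 0; // a is a 32-bit integer
    b = b | 0; // b is a 32-bit integer
    return (a + b) | 0; // the result is a 32-bit integer
  }
  return { add: add };
}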
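
The coercion hints described above are ordinary ECMAScript operators, so their effect can be observed in any JavaScript environment, with or without asm.js support. The following is a small sketch; the helper names toInt32 and toDouble are made up for illustration.

```javascript
// Plain JavaScript: no "use asm" pragma is needed to observe the coercions.
function toInt32(value) {
  return value | 0; // bitwise OR converts its operand to a signed 32-bit integer
}

function toDouble(value) {
  return +value; // unary plus converts its operand to a Number (64-bit float)
}

function add(a, b) {
  a = a | 0;
  b = b | 0;
  return (a + b) | 0; // wraps around on 32-bit overflow
}

console.log(toInt32(3.7));        // 3 (fraction is truncated)
console.log(toInt32(-3.7));       // -3
console.log(toInt32(4294967296)); // 0 (2^32 wraps around to zero)
console.log(toDouble("1.5"));     // 1.5
console.log(add(2147483647, 1));  // -2147483648
```

The last line shows why the hints matter: the result is defined to behave like a 32-bit machine addition, which is what lets an optimizing engine emit one.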

To avoid expensive garbage collection routines, memory management is delegated entirely to the application, much like in a typical native application where the code must allocate and deallocate memory directly from its assigned part of RAM. To implement this, a large memory buffer is allocated up front and then used throughout the asm.js block. The asm.js code creates typed views to access slices of that buffer and uses them as memory.

var heap = new ArrayBuffer(0x100000); // 1 MiB (0x100000 bytes)
var pointer = 0x100; // byte offset 256 into the heap
var view = new Int32Array(heap, pointer, 0x100); // 256 32-bit integers (1 KiB) starting at byte offset 256
view[0] = 327;
view[1] = 1138;
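
To illustrate what "delegating memory management to the application" can look like, here is a minimal sketch of a bump allocator over such a heap. The allocate function, the 8-byte alignment, and the HEAP32 name are assumptions for this example; a real toolchain ships a full malloc/free implementation.

```javascript
var heap = new ArrayBuffer(0x10000); // 64 KiB heap
var HEAP32 = new Int32Array(heap);   // view of the whole heap as 32-bit integers
var nextFree = 8;                    // keep address 0 unused so it can act as a null pointer

// Hypothetical bump allocator: returns a byte offset ("pointer") into the heap.
// It only ever moves forward and never frees memory.
function allocate(byteCount) {
  var pointer = nextFree;
  var newNextFree = (nextFree + byteCount + 7) & ~7; // round up to 8-byte alignment
  if (newNextFree > heap.byteLength) {
    throw new Error("out of memory");
  }
  nextFree = newNextFree;
  return pointer;
}

var p = allocate(8);         // room for two 32-bit integers
HEAP32[p >> 2] = 327;        // byte offset divided by 4 gives the Int32 index
HEAP32[(p >> 2) + 1] = 1138;

console.log(p);                   // 8
console.log(HEAP32[p >> 2]);      // 327
console.log(HEAP32[(p >> 2) + 1]); // 1138
```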

In a block marked with "use asm";, all advanced JavaScript features can be deactivated until a violation occurs (for example, when a reference to an object is cleared).

asm.js code is not typically written by hand but is rather the result of compiling code from another language, typically C or C++. To create asm.js-optimized code, Emscripten, an LLVM-to-JavaScript compiler, was created. LLVM is a popular component of many native development toolchains. It defines an intermediate representation (LLVM IR), a low-level language similar to assembly. In native development, this intermediate code can be heavily optimized and then easily translated to the target architecture (the CPU architecture or instruction set on which the code should run, like x86 or ARM). Emscripten can read this intermediate representation and translate it to asm.js, with additional optimization steps in the middle.

Before asm.js code is generated, LLVM intermediate representation is generated.

While asm.js makes it possible to improve the execution speed significantly and allows low-level languages like C/C++ with no concept of garbage collection to compile to the web, it unfortunately comes with a few downsides:

  1. Type hints and the JavaScript syntax can result in very large asm.js files.
  2. It needs to be parsed like JavaScript, which can be expensive on lower-end devices like mobile phones.
  3. Since asm.js needs to be valid JavaScript, adding new features to it is very complex and affects JavaScript as well.
  4. Growing the initial heap at runtime is expensive since ArrayBuffers, which are used to store the heap, have a fixed size. To work around this, one must create a new, larger ArrayBuffer and copy the content of the first buffer into the second one. This operation results in an asm.js violation, which is why Emscripten warns about disabled optimizations.
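
The heap growth described in the last point can be sketched as follows. This is an illustration only, with a made-up growHeap helper; it is not how Emscripten implements it.

```javascript
var heap = new ArrayBuffer(0x10000); // 64 KiB
var HEAP32 = new Int32Array(heap);
HEAP32[0] = 327;

// Hypothetical helper: an ArrayBuffer cannot be resized in place,
// so growing means allocating a larger buffer and copying every byte over.
function growHeap(oldHeap, newByteLength) {
  var newHeap = new ArrayBuffer(newByteLength);
  new Uint8Array(newHeap).set(new Uint8Array(oldHeap));
  return newHeap;
}

var biggerHeap = growHeap(heap, 0x20000);
var BIGGER32 = new Int32Array(biggerHeap);

console.log(biggerHeap.byteLength);        // 131072
console.log(BIGGER32[0]);                  // 327 (contents were copied)
console.log(HEAP32.buffer === biggerHeap); // false (old views still point at the old buffer)
```

Besides the cost of the copy, every existing typed view has to be re-created on the new buffer, which is exactly the kind of change that the ahead-of-time assumptions of asm.js cannot survive.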

The new kid on the block

To solve all those issues, the development of WebAssembly, or wasm, "a new, portable, size- and load-time-efficient format suitable for compilation to the web", started in 2015. The first version, currently being deployed to all major browsers (Chrome, Firefox, Safari, and Edge), is already a replacement for asm.js, while other, more powerful features, such as threads, are planned as well. It's important to point out that it is designed to complement JavaScript, not replace it, and that in a browser context it has no direct access to the DOM; it can reach the DOM only via JavaScript.

WebAssembly support is rolling out this year in all major browsers. PSPDFKit for Web also supports older browsers by falling back to asm.js.

WebAssembly consists of four key concepts:

  1. Module - A compiled WebAssembly unit. Similar to an ES2015 module, a WebAssembly module declares imports and exports to the JavaScript language.
  2. Memory - A growable ArrayBuffer.
  3. Table - An array to store function references. This offers another way to access JavaScript functions inside WebAssembly, since these functions cannot be stored directly in the memory and called that way. Instead, the function is stored in the table and can be called by its index. The table can be mutated by the host environment (JavaScript).
  4. Instance - A stateful, initialized Module that's connected to a Memory and a Table object.
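
The four concepts map directly onto objects of the WebAssembly JavaScript API. The sketch below uses a tiny hand-assembled module that exports a single add function; the bytes and variable names are chosen for this example only. It uses the synchronous constructors for brevity, which is reasonable only for very small modules.

```javascript
// Hand-assembled binary for a module exporting add(i32, i32) -> i32.
var bytes = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00,       // "\0asm" magic number, version 1
  0x01, 0x07, 0x01, 0x60, 0x02, 0x7f, 0x7f, 0x01, 0x7f, // type section: (i32, i32) -> i32
  0x03, 0x02, 0x01, 0x00,                               // function section: one function of type 0
  0x07, 0x07, 0x01, 0x03, 0x61, 0x64, 0x64, 0x00, 0x00, // export section: "add" is function 0
  0x0a, 0x09, 0x01, 0x07, 0x00,                         // code section: one body, no locals
  0x20, 0x00, 0x20, 0x01, 0x6a, 0x0b                    // local.get 0; local.get 1; i32.add; end
]);

var wasmModule = new WebAssembly.Module(bytes);      // Module: compiled, stateless code
var instance = new WebAssembly.Instance(wasmModule); // Instance: the module plus its state

var memory = new WebAssembly.Memory({ initial: 1 }); // Memory: sized in 64 KiB pages
var table = new WebAssembly.Table({ element: "anyfunc", initial: 1 }); // Table: function references

table.set(0, instance.exports.add); // the host (JavaScript) mutates the table
console.log(table.get(0)(2, 3));       // 5
console.log(memory.buffer.byteLength); // 65536
```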

Additionally, WebAssembly defines a binary representation of the code that is similar to LLVM IR and needs to be compiled to the host architecture before it can be used. Some implementations, such as Microsoft's Chakra, use a just-in-time (JIT) strategy to compile to the native host architecture, and some, such as Google's Chrome, compile the entire module up front. To avoid compilation every time a WebAssembly module is requested, the resulting module can also be persisted on a client using IndexedDB.

var importObject = {
  imports: {
    imported_func: function(arg) {
      console.log(arg);
    }
  }
};

fetch("application.wasm").then(response =>
  response.arrayBuffer()
).then(bytes =>
  WebAssembly.instantiate(bytes, importObject)
).then(result =>
  result.instance.exports.exported_func()
);
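
To try the import/export round trip without serving an application.wasm file, a module of the same shape can be assembled by hand. The sketch below is an assumption-laden illustration: the name and section helpers are made up, the module simply calls imported_func with the constant 42, and the synchronous constructors are used only because the module is tiny.

```javascript
// Illustrative helpers for hand-assembling a tiny module.
function name(text) {
  // A name is encoded as a length byte followed by its ASCII characters.
  return [text.length].concat(Array.from(text, ch => ch.charCodeAt(0)));
}

function section(id, body) {
  // A section is an id byte, a size byte (all sizes here are below 128), then the payload.
  return [id, body.length].concat(body);
}

var bytes = new Uint8Array([].concat(
  [0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00],     // "\0asm" magic number, version 1
  section(1, [2, 0x60, 1, 0x7f, 0, 0x60, 0, 0]),         // types: (i32) -> () and () -> ()
  section(2, [1].concat(name("imports"), name("imported_func"), [0x00, 0])), // import a function of type 0
  section(3, [1, 1]),                                    // one local function of type 1
  section(7, [1].concat(name("exported_func"), [0x00, 1])), // export function index 1
  section(10, [1, 6, 0, 0x41, 42, 0x10, 0, 0x0b])        // body: i32.const 42; call 0; end
));

var received = [];
var importObject = {
  imports: {
    imported_func: function(arg) {
      received.push(arg); // record what WebAssembly passed to JavaScript
    }
  }
};

var instance = new WebAssembly.Instance(new WebAssembly.Module(bytes), importObject);
instance.exports.exported_func();
console.log(received[0]); // 42
```

For real applications, the asynchronous WebAssembly.instantiate call shown in the article's example is the better fit, because compilation of large modules should not block the main thread.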

With this design, WebAssembly solves all the issues we discussed with asm.js above:

  1. File size is a lot smaller because of the binary representation. For our product, the WebAssembly version is about half the size of the asm.js build (about one-third for a gzipped build).
  2. WebAssembly improves execution time by making multiple steps in the engine's pipeline faster. For example: parsing is greatly simplified, the code is already in an intermediate format and just needs to be validated. Lin Clark wrote an excellent article about the reasons WebAssembly is fast.
  3. WebAssembly can be improved with new features independently of JavaScript. A good example of this is the SIMD (Single Instruction/Multiple Data) extension to JavaScript. This CPU-powered acceleration is a key optimization in modern assembly, but its impact on the JavaScript API is so big that plans to bring it there were recently abandoned because of its complexity. Instead, this feature will be added to WebAssembly directly, with no JavaScript API.
  4. WebAssembly's memory concept is based on the growable Memory class. This makes it possible to have more dynamic memory allocation.
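
The growable Memory mentioned in the last point can be explored from JavaScript directly. A minimal sketch, with the page counts picked arbitrarily for the example:

```javascript
// Sizes are given in WebAssembly pages of 64 KiB each.
var memory = new WebAssembly.Memory({ initial: 1, maximum: 4 });

var before = new Uint8Array(memory.buffer);
before[0] = 42;

var previousPages = memory.grow(1); // grows by one page, returns the old size in pages

// Growing detaches the old ArrayBuffer, so views must be re-created.
var after = new Uint8Array(memory.buffer);

console.log(previousPages); // 1
console.log(after.length);  // 131072
console.log(after[0]);      // 42 (contents are preserved)
console.log(before.length); // 0 (the old view is detached)
```

Compared to the asm.js approach, the engine handles the resize itself and the running code does not drop out of its optimized path.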

WebAssembly is becoming the de facto solution for bringing native code to the web. With support from all major browsers and the development of a new LLVM backend inside the LLVM master branch, we're looking at a bright future for the web.

WebAssembly at PSPDFKit

We recently released PSPDFKit for Web 2017.5, the first version of our web framework that supports standalone rendering, i.e. on the client without a daemon running on a server. To completely avoid a server component that can read the PDF, we worked hard to compile our 500,000 LOC C++ core to WebAssembly and asm.js. It's extremely important for us that we can reuse the PDF rendering code across all our modern platforms, because PDF rendering is hard to get right. Our shared core gives us the same low-level rendering and parsing of PDF documents everywhere and allows us to fully focus on one PDF engine.

The new PSPDFKit for Web download now also includes four new artifacts that are available next to pspdfkit.js and pspdfkit.css:

Filename               Description
pspdfkit.wasm          This file contains the WebAssembly binary code.
pspdfkit.wasm.js       A small wrapper around the WebAssembly module to create a unified API that's shared with asm.js.
pspdfkit.asm.js        The asm.js build of our PDF backend.
pspdfkit.asm.js.mem    A binary file that contains initial memory values for the asm.js build.

When the PDF viewer is initialized, we feature-test the presence of WebAssembly as well as some additional WebAssembly features to decide how we initialize the native module.
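
A feature test of this kind could look roughly like the sketch below. This is not the actual check used by PSPDFKit for Web; the supportsWebAssembly function and the way the backend is chosen are assumptions for illustration, and only the file names are taken from the list above.

```javascript
// Hypothetical feature test: checks that the API exists and that the
// engine accepts the smallest valid module (magic number plus version).
function supportsWebAssembly() {
  try {
    if (typeof WebAssembly !== "object" ||
        typeof WebAssembly.instantiate !== "function") {
      return false;
    }
    var emptyModule = new Uint8Array([0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00]);
    return WebAssembly.validate(emptyModule);
  } catch (error) {
    return false; // treat any unexpected failure as "not supported"
  }
}

// Pick the artifact to load; fall back to the asm.js build otherwise.
var backend = supportsWebAssembly() ? "pspdfkit.wasm" : "pspdfkit.asm.js";
console.log(backend);
```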

Talking about an exciting new technology is one thing, but we want you to experience what WebAssembly made possible at PSPDFKit. The following demo of our PDF framework will use WebAssembly when it's available or fall back to asm.js. While you were reading the article, we prepared everything. This is just an example of how easy it is to integrate PSPDFKit.

PSPDFKit for Web using asm.js fallback

To give you a comparison of the rendering performance between native, WebAssembly, and asm.js, we did an extensive benchmark across various devices.

A comparison of PDF rendering performance across different devices when using native, asm.js and wasm.

While the results are already impressive, we also want to point out that WebAssembly is very new and improving rapidly. The new LLVM backend is still experimental, and while bringing PSPDFKit to WebAssembly we discovered a lot of edge cases that we could only solve with the help of browser vendors. At this point, we want to issue a special thank you to the WebAssembly teams at Mozilla and Google, especially Alon Zakai, for being so helpful. With their help we were able to make it happen and even improved the Emscripten toolchain a bit along the way.

While we're very optimistic about the current state of WebAssembly, we know that less capable systems still struggle with expensive rendering operations. For those cases, we recommend that you check out our Server-backed product, which already enables fast PDF rendering even on lower-end devices. In the future, we also want Server-backed installations to make use of WebAssembly and will enable progressive, on-demand streaming of client-side-rendered PDF documents. We believe that a combination of server- and client-side technologies will offer the best experience for displaying PDF documents on the web, and we're working hard to make this as seamless as possible.


Filed under: Web, WebAssembly
