Rendering PDF Documents

PSPDFKit renders documents with pixel-perfect accuracy. We use a custom C++-based renderer that is shared across all the platforms we support. This guarantees the broadest test coverage and allows us to focus on a single, highly tuned codebase for all supported platforms. If you have a document that renders incorrectly, or differently than in Adobe Acrobat, please report a bug.

The Complexities of Rendering PDF

PDF has been around since 1993 and has evolved considerably since then. As a result, the specification carries a lot of legacy baggage, and one of PDF's main promises is that even 20-year-old documents will still render exactly as they did, pixel-perfect. The specification, currently at version 1.7, is 756 pages long. Several extensions (Adobe® Supplement to ISO 32000-1, Extension Level 3, and Adobe® Supplement to ISO 32000-1, Extension Level 5) are also required to parse more modern documents correctly. There is also an amendment covering Adobe Acrobat-specific quirks, which essentially every other implementation has to follow to be correct. In addition, the original PDF 1.7 reference from November 2006 (1,310 pages) includes many details, missing from the current version of the spec, that are needed to support older documents correctly, and there are the Errata and Redaction amendments as well. (This page is a good overview.)

PDF also has a JavaScript API. It is most often used to make forms interactive and validate input, but supporting it is critical because it is also frequently used for actions such as turning pages or opening hyperlinks. Its specification is another whopping 769 pages. The specification for character maps, which are required to convert PDF glyph data into Unicode so that text can be searched, is a separate document of over 100 pages. And while PSPDFKit does not support XFA, some information about handling mixed-mode documents that contain both AcroForms and XFA is part of the XML Forms Architecture (XFA) Specification, a document of about 1,600 pages. (To fully understand the details, we sometimes also need older versions of this document; this page is a good overview.) Since PDF is based on PostScript, the documentation often references the PostScript Language Reference (1999), which has about 900 pages, plus its 2004 errata. Various further documents are required for certain encodings, but these rarely exceed 100 pages.

Beyond the official specification, there is also "real-world" PDF, which often contains errors caused by misunderstandings of the spec or by bugs in the software that created the file, and these need specific workarounds and hacks. Adobe Acrobat is very forgiving of invalid references, duplicate entries, and typos, and it accepts various variations of drawing commands that are documented nowhere. These behaviors have to be discovered through trial and error, and they add to the complexity of a rendering engine. Our code around this engine is mostly written in modern C++; however, because it has to handle many variants and edge cases, the codebase is quite large and complex, which results in a noticeable binary footprint.

PDF also supports many image formats, such as JPEG 2000 and JBIG2, that require additional libraries to decode, which further contributes to the binary size.
