The Complexities of Rendering PDF Documents
PSPDFKit renders documents with pixel-perfect accuracy. We use a custom C++ renderer that is shared across all our platforms. This guarantees the broadest test coverage and allows us to focus on one highly tuned codebase for all supported platforms. If you have a document that renders incorrectly or differently than in Adobe Acrobat, please report it to us.
To learn about our rendering API, please see the Rendering PDF Pages guide.
Understanding the Complexities
PDF has been around since 1993 and has evolved a lot since then. This also means that there’s a lot of legacy baggage in the specification, and one of the main promises of PDF is that even your 20-year-old documents will still render as they did when they were first created.
The specification, currently at version 1.7, has 756 pages. Parsing more modern documents correctly also requires several extensions: Adobe® Supplement to ISO 32000-1, Extension Level 3 and Adobe® Supplement to ISO 32000-1, Extension Level 5. Additionally, there’s an amendment covering Adobe Acrobat-specific quirks, which basically everyone else has to follow to be correct. The older, original 1.7 version of the document from November 2006 (1,310 pages) includes many details about correctly supporting older documents that are missing from the most current version of the spec. Finally, there are the Errata and Redaction amendments. (This page provides a good overview.)
Along with the official specification, there’s also “real-world” PDF: documents that often contain errors caused by misunderstandings of the specification or by bugs in creation software, and that need specific workarounds and hacks. Adobe Acrobat is very forgiving of invalid references, duplicate entries, and typos, and it allows various drawing command variations that aren’t documented anywhere. These behaviors have to be discovered via trial and error, and they add to the complexity of a rendering engine. Our code around this engine is mostly written in modern C++; however, the codebase has to deal with many variants and edge cases, so it is quite large and complex, resulting in a noticeable binary footprint.
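To make this kind of leniency concrete, here is a minimal sketch of one widely known case: Acrobat accepts a file whose %PDF- header appears somewhere within the first 1,024 bytes rather than at byte 0. The function name find_pdf_header and its strict flag are hypothetical and exist only for illustration; this is not PSPDFKit's implementation or API.

```python
import re

# Matches a header such as "%PDF-1.7" and captures the major/minor digits.
HEADER_RE = re.compile(rb"%PDF-(\d)\.(\d)")


def find_pdf_header(data: bytes, strict: bool = False):
    """Return (offset, version) for the PDF header, or None if absent.

    Hypothetical helper for illustration only.
    strict=True  -> the header must start at byte 0.
    strict=False -> the header may sit anywhere in the first 1,024 bytes,
                    mirroring the lenient behavior described above.
    """
    if strict:
        match = HEADER_RE.match(data)
    else:
        match = HEADER_RE.search(data[:1024])
    if match is None:
        return None
    version = f"{match.group(1).decode()}.{match.group(2).decode()}"
    return match.start(), version


# A file with leading junk fails the strict check but passes the lenient one.
junk_prefixed = b"\x00\x00garbage\n%PDF-1.4\n"
print(find_pdf_header(junk_prefixed, strict=True))
print(find_pdf_header(junk_prefixed))
```

A renderer that only implemented the strict branch would reject files that Acrobat opens without complaint, which is why each such tolerance ends up as extra code.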
PDF also supports many image formats that require additional libraries to decode, such as JPEG 2000 or JBIG2, and these libraries contribute to the binary size.
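As a rough illustration of where these decoders come into play, the sketch below maps the standard PDF stream filter names for image data to the codec each one needs. The filter names (DCTDecode, JPXDecode, JBIG2Decode, CCITTFaxDecode) come from the PDF specification; the table layout and the required_image_codecs helper are assumptions made for this example and do not reflect how PSPDFKit is structured.

```python
# Standard PDF image filter names mapped to the codec needed to decode them.
IMAGE_FILTER_CODECS = {
    "DCTDecode": "JPEG",
    "JPXDecode": "JPEG 2000",
    "JBIG2Decode": "JBIG2",
    "CCITTFaxDecode": "CCITT fax",
}


def required_image_codecs(filter_entry):
    """Return the image codecs needed for a stream's /Filter value.

    Hypothetical helper: filter_entry is either a single filter name or a
    list of names (a filter chain), written without the leading slash.
    Filters that are not image codecs (for example FlateDecode) are skipped.
    """
    if isinstance(filter_entry, str):
        names = [filter_entry]
    else:
        names = list(filter_entry)
    return [IMAGE_FILTER_CODECS[n] for n in names if n in IMAGE_FILTER_CODECS]


# A JPEG 2000 image stream, and a JPEG wrapped in an ASCII85 text encoding.
print(required_image_codecs("JPXDecode"))
print(required_image_codecs(["ASCII85Decode", "DCTDecode"]))
```

Each entry in such a table implies a decoder that has to ship with the renderer, whether or not a given document ever uses it.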