PDFs usually render pretty quickly. But sometimes, they really start taxing your hardware and it takes a while for them to show up. In this blog post, I’ll briefly cover how rendering a PDF works and then go into some of the reasons why this process might be slow.
How Does Rendering a PDF Work?
Rendering a PDF is like executing a program written in a rendering language. Each page in the PDF contains one or more streams of (usually) compressed data that instruct reader applications what to render at each point on the page.
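To make the "programming language" comparison concrete, here is a minimal sketch of reading an already decompressed content stream. The function name `parse_content_stream` is made up for this post, and the sketch assumes only numeric operands; real content streams also carry names, strings, arrays, and inline images. The operators `m` (move to), `l` (line to), and `S` (stroke) are standard PDF path operators.

```python
def parse_content_stream(stream: str) -> list[tuple[str, list[float]]]:
    """Split a decompressed content stream into (operator, operands) pairs.

    Simplifying assumption: every operand is a number. Operands come first,
    and the operator that consumes them follows (postfix notation).
    """
    operations = []
    operands: list[float] = []
    for token in stream.split():
        try:
            operands.append(float(token))
        except ValueError:
            # Anything that is not a number is treated as an operator.
            operations.append((token, operands))
            operands = []
    return operations


# Draw two connected line segments, then stroke the path.
ops = parse_content_stream("10 10 m 200 10 l 200 100 l S")
for operator, args in ops:
    print(operator, args)
```

A renderer walks a list like this in order and updates its drawing state for each operator, which is why the amount of work grows with the number of instructions on the page.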
Now that you know how this works, let’s look at a handful of scenarios where this process is slow.
Some PDFs contain really high-resolution images. Before they can be rendered, they have to be loaded from the PDF and decoded into memory. Depending on how big they are, this can take up tens or even hundreds of megabytes. In environments where memory is constrained, like on mobile platforms, this can cause problems.
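As a rough illustration of where those megabytes come from, the decoded size of an image depends on its pixel dimensions, not on how small the compressed data in the file is. The helper below is a back-of-the-envelope sketch with a hypothetical name; it ignores row padding and any extra buffers a real decoder might allocate.

```python
def decoded_size_bytes(width: int, height: int,
                       components: int = 3,
                       bits_per_component: int = 8) -> int:
    """Estimate the memory needed to hold a fully decoded image."""
    total_bits = width * height * components * bits_per_component
    return (total_bits + 7) // 8  # round up to whole bytes


# A 24-megapixel RGB photo with 8 bits per channel.
photo_bytes = decoded_size_bytes(6000, 4000)
print(f"{photo_bytes / 1_000_000:.1f} MB")  # prints "72.0 MB"
```

A single image like this already lands in the tens of megabytes, and a page with several of them multiplies that figure.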
Images can also be encoded in various formats, some of which are slower to decode than others. In our experience, JPEG 2000 images are usually the slowest.
Lots of Path Operations
Another big cause of slowdown is vector path operations. Imagine a PDF floor plan made up of hundreds of thousands of little lines. Each of these lines needs to be read from the content stream and rendered on the screen.
When using vector graphics, the expectation is that they render pixel-perfect and don’t become pixelated when you zoom in. This means we can’t render them once and cache the result; we have to render them again for each zoom level we encounter.
Broken but Recoverable PDFs
Sometimes PDFs are broken. Lots of PDF software, including PSPDFKit, has support for recovering broken PDFs. One problem that can severely hurt performance is a damaged cross-reference table. The cross-reference table is used to quickly access objects in the PDF document. Without it, we’d have to look through the entire file to find objects like pages or images. This would be very slow in large files.
If the table is damaged, we go through the whole file once and remember the byte offset of each object so we can look it up afterward. This is usually quick with small PDFs, but if your PDF file is a hundred megabytes or larger, reading through the complete file takes a while. It’s still better than having to scan the file each time we need to read an object.
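A single recovery pass of this kind can be sketched in a few lines. This is a simplified illustration rather than PSPDFKit’s actual recovery code: it assumes the whole file fits in memory, looks for object headers of the form `12 0 obj`, and doesn’t guard against byte sequences inside streams that merely look like object headers. The name `rebuild_offsets` is made up for this example.

```python
import re

# An indirect object starts with "<object number> <generation> obj".
OBJ_HEADER = re.compile(rb"\b(\d+)\s+(\d+)\s+obj\b")


def rebuild_offsets(data: bytes) -> dict[tuple[int, int], int]:
    """Scan the file once and map (object number, generation) to a byte offset."""
    offsets: dict[tuple[int, int], int] = {}
    for match in OBJ_HEADER.finditer(data):
        key = (int(match.group(1)), int(match.group(2)))
        # If an object appears more than once, the later definition wins.
        offsets[key] = match.start(1)
    return offsets


sample = (
    b"%PDF-1.7\n"
    b"1 0 obj\n<< /Type /Catalog >>\nendobj\n"
    b"2 0 obj\n<< /Type /Pages >>\nendobj\n"
)
offsets = rebuild_offsets(sample)
print(offsets)
```

After the one-time scan, finding an object is a dictionary lookup instead of another pass over the file.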
PDFs can contain a lookup table, called Named Destinations, that allows you to link to different parts of a document. The table groups the names into specific sections so that a named destination can be looked up quickly and efficiently: you don’t have to go through the whole table, just the section you need.
However, these lookup tables are sometimes broken. They have to be laid out in a PDF in a specific way, and if they aren’t, names might end up in the wrong part, which can slow everything down.
When you click on a link that has a named destination as a target, PSPDFKit has to go through the lookup table and find the proper destination. If we can’t find the named destination, it might mean the link is invalid. Alternatively, it could mean that the named destination table was set up incorrectly, and instead of being able to look things up quickly, we have to fall back to going through the whole table (which can be massive).
We do this because our first priority is to find the correct destination. We’ve encountered many documents with incorrect tables, and customers still need the links in them to go to the correct destination.
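The lookup-with-fallback strategy described above can be sketched like this. It is a simplified model and not PSPDFKit’s code: each section records the smallest and largest name it is supposed to contain (in a PDF name tree these are the `/Limits` of a node), and the class and function names are hypothetical. For brevity the sketch stores names in a dictionary, whereas a real name tree keeps them in a sorted array.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class NameTreeNode:
    limits: Optional[Tuple[str, str]] = None  # (least, greatest) name in this section
    names: Dict[str, str] = field(default_factory=dict)  # name -> destination
    kids: List["NameTreeNode"] = field(default_factory=list)


def fast_lookup(node: NameTreeNode, name: str) -> Optional[str]:
    """Only descend into sections whose limits could contain the name."""
    if name in node.names:
        return node.names[name]
    for kid in node.kids:
        if kid.limits is None or kid.limits[0] <= name <= kid.limits[1]:
            found = fast_lookup(kid, name)
            if found is not None:
                return found
    return None


def full_scan(node: NameTreeNode, name: str) -> Optional[str]:
    """Ignore the limits and visit every section."""
    if name in node.names:
        return node.names[name]
    for kid in node.kids:
        found = full_scan(kid, name)
        if found is not None:
            return found
    return None


def resolve(root: NameTreeNode, name: str) -> Optional[str]:
    result = fast_lookup(root, name)
    # A miss can mean an invalid link or a badly built table, so we
    # fall back to the slow path before giving up.
    return result if result is not None else full_scan(root, name)


# "zeta" was written into the wrong section, so the fast path misses it.
root = NameTreeNode(kids=[
    NameTreeNode(limits=("a", "m"), names={"intro": "page 1", "zeta": "page 9"}),
    NameTreeNode(limits=("n", "zz"), names={"summary": "page 7"}),
])
print(fast_lookup(root, "zeta"))
print(resolve(root, "zeta"))
```

The cost of the fallback is only paid when the fast path fails, but in a document with a massive table and many invalid links, that cost adds up.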
This post outlines some of the biggest problems when it comes to rendering performance, but there’s no need to worry! We at PSPDFKit always try our best to mitigate these issues by using lots of tricks we’ve learned over the years.