2023.1 Migration Guide

PSPDFKit Server 2023.1 introduces several larger changes. To determine whether you need to take action, compare your implementation against the information below.

New OCR and Office Conversion Engines

PSPDFKit Server is now shipping with brand-new OCR and Office conversion engines, which are based on GdPicture.NET.

Our previous OCR engine was based on the Tesseract open source project, and we used LibreOffice as the core of our Office conversion tools. These dependencies let us produce quality results, but each one limited us in its own way. The OCR engine's performance was acceptable at best, and with LibreOffice, we were unable to effectively improve the conversion quality itself.

Both new engines process documents more quickly and more accurately. The OCR performance gain is especially considerable: We measured speedups of up to 7× compared to the previous engine, with the same or sometimes even better accuracy.

Usage of these new engines requires a license update. If your license already includes OCR or Office conversion components, you qualify to get access to the updated engines for free. In that case, you’ll need to update your activated license to enable them.

If you encounter changes in behavior or regressions that break your workflow, you can revert to the old engines using the following Server configuration options:

  • OCR_ENGINE — The OCR engine defaults to gdpicture. Use core to revert to the old engine.

  • CONVERSION_ENGINE — The Office conversion engine defaults to gdpicture. Use libreoffice to revert to the old LibreOffice-based Office conversion engine.

Information

Usage of the old OCR and conversion engines is deprecated, and we’ll drop support for them in a future version. Please submit any issues you encounter with the new engines to Support.
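If you generate your Server configuration from a script, the two options above can be captured in a small typed helper. The following is a minimal sketch, not part of PSPDFKit Server: the `engineEnv` and `toEnvFile` helpers and the `EngineChoice` type are hypothetical names, and only the option names and values (`OCR_ENGINE`, `CONVERSION_ENGINE`, `gdpicture`, `core`, `libreoffice`) come from this guide.

```typescript
// Engine values named in this guide. Both options default to "gdpicture".
type OcrEngine = "gdpicture" | "core";
type ConversionEngine = "gdpicture" | "libreoffice";

// Hypothetical input type for this sketch.
interface EngineChoice {
  ocr?: OcrEngine;
  conversion?: ConversionEngine;
}

// Hypothetical helper: resolves the two engine options, falling back to
// the documented defaults when a choice is omitted.
function engineEnv(choice: EngineChoice = {}): Record<string, string> {
  return {
    OCR_ENGINE: choice.ocr ?? "gdpicture",
    CONVERSION_ENGINE: choice.conversion ?? "gdpicture",
  };
}

// Hypothetical helper: formats the options as KEY=value lines, as used
// in an environment file.
function toEnvFile(env: Record<string, string>): string {
  return Object.keys(env)
    .map((key) => `${key}=${env[key]}`)
    .join("\n");
}

// Example: revert only OCR to the old engine and keep the new
// Office conversion engine.
const envFile = toEnvFile(engineEnv({ ocr: "core" }));
```

Typing the allowed values as string unions means a misspelled engine name fails at compile time instead of reaching the Server as an unknown setting.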

Instant JSON Changes

The Instant JSON schema has been updated to provide rich text support for annotations and comments:

  • The version of the annotations and comments JSON is now 2. In other words, instead of using "v": 1, the annotations and comments JSON now uses "v": 2.

  • The text property of text annotations (pspdfkit/text), note annotations (pspdfkit/note), and comments (pspdfkit/comment) is now an object with the following signature:

declare type Text = {
  // Format of the text value. Defaults to 'plain'.
  format?: "xhtml" | "plain",
  // Value itself.
  value: string
};

Before

// Example of an Instant JSON schema for a text annotation:
{
  "v": 1,
  "pageIndex": 1,
  "bbox": [150, 275, 120, 70],
  "opacity": 1,
  "pdfObjectId": 200,
  "creatorName": "John Doe",
  "createdAt": "2012-04-23T18:25:43.511Z",
  "updatedAt": "2012-04-23T18:28:05.100Z",
  "id": "01F46S31WM8Q46MP3T0BAJ0F85",
  "name": "01F46S31WM8Q46MP3T0BAJ0F85",
  "type": "pspdfkit/text",
  "text": "Content for a text annotation",
  "fontSize": 14,
  "fontStyle": ["bold"],
  "fontColor": "#000000",
  "horizontalAlign": "left",
  "verticalAlign": "center",
  "rotation": 0
}

After

// Example of an Instant JSON schema for a text annotation:
{
  "v": 2,
  "pageIndex": 1,
  "bbox": [150, 275, 120, 70],
  "opacity": 1,
  "pdfObjectId": 200,
  "creatorName": "John Doe",
  "createdAt": "2012-04-23T18:25:43.511Z",
  "updatedAt": "2012-04-23T18:28:05.100Z",
  "id": "01F46S31WM8Q46MP3T0BAJ0F85",
  "name": "01F46S31WM8Q46MP3T0BAJ0F85",
  "type": "pspdfkit/text",
  "text": {"value": "Content for a text annotation", "format": "plain"},
  "fontSize": 14,
  "fontStyle": ["bold"],
  "fontColor": "#000000",
  "horizontalAlign": "left",
  "verticalAlign": "center",
  "rotation": 0
}
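If you store Instant JSON records yourself, the change from the Before example to the After example can be applied mechanically. The following is a minimal sketch under stated assumptions, not an official migration tool: `upgradeToV2`, `InstantRecord`, and `RichText` are hypothetical names, and the sketch assumes the version bump and the new `text` shape are the only differences between version 1 and version 2 records, as listed above.

```typescript
// Mirrors the Text signature shown in this guide.
interface RichText {
  format?: "xhtml" | "plain";
  value: string;
}

// Hypothetical record type: only the fields this sketch touches are
// typed. All other properties pass through unchanged.
interface InstantRecord {
  v: number;
  type: string;
  text?: string | RichText;
  [key: string]: unknown;
}

// Record types whose text property changed shape in version 2.
const TEXT_TYPES = ["pspdfkit/text", "pspdfkit/note", "pspdfkit/comment"];

// Hypothetical helper: returns a version 2 copy of a version 1 record.
// Records that aren't version 1 are returned as-is.
function upgradeToV2(record: InstantRecord): InstantRecord {
  if (record.v !== 1) {
    return record;
  }
  const upgraded: InstantRecord = { ...record, v: 2 };
  if (TEXT_TYPES.indexOf(record.type) !== -1 && typeof record.text === "string") {
    // A version 1 string carries no markup, so it maps to the "plain" format.
    upgraded.text = { format: "plain", value: record.text };
  }
  return upgraded;
}

// Example: upgrade a trimmed-down version of the Before record.
const before: InstantRecord = {
  v: 1,
  type: "pspdfkit/text",
  pageIndex: 1,
  text: "Content for a text annotation",
};
const after = upgradeToV2(before);
```

The helper copies the record instead of mutating it, so the original version 1 data stays available if you need to compare or roll back.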

Postgres 15 Support

This version brings compatibility with PostgreSQL 15.

For upgrade instructions, please refer to the official PostgreSQL upgrade guide if you’re running the database on-premises, or to your cloud provider’s documentation otherwise.

Database Migrations

This release includes two database migrations. You should always upgrade PSPDFKit Server one node at a time, and this is especially important when the upgrade includes database migrations.

The first migration adds a new table called secrets, which stores data for the secrets rotation API. Adding the new table doesn’t affect existing data, so you shouldn’t experience performance issues because of it.

The second migration adds a new column to the records tables, which temporarily blocks any reads and writes to these tables. Adding a column to an existing table is usually a quick process, so you shouldn’t experience performance issues because of it. In our tests on tables with 1.5 million rows, the migration held a lock on the table for about 2.5 ms.

Each migration is run inside a transaction, so if it fails or you interrupt it, you can safely retry it.

If you experience any problems with the migrations during the upgrade, please submit a support request.

Other Changes

This release includes many more improvements. For a full list of changes, check out the Server changelog.

Migrate PSPDFKit for Web

For more information, see PSPDFKit for Web 2023.1 Migration Guide.