How to Use the OCR API

Information

PSPDFKit Server has been deprecated and replaced by PSPDFKit Document Engine. All PSPDFKit Server and PSPDFKit for Web Server-Backed licenses will work as before and be supported until 15 May 2024 (we will contact you about license migration). To start using Document Engine, refer to the migration guide. With Document Engine, you’ll have access to robust new capabilities (read the blog for more information).

This guide provides an overview of the OCR API and how to use it. For information on what OCR can do, see the OCR overview guide.

API Overview

PSPDFKit Server allows you to perform OCR using the performOcr document operation. You can apply this operation either directly on upload or to existing documents.

Running OCR on Upload

You can run OCR when uploading a document by providing a performOcr operation inside the operations parameter of the upload request. For more information on running operations on document upload, see the guide on that topic. The request looks like this:

POST /api/documents
Content-Type: multipart/form-data; boundary=customboundary
Authorization: Token token="<secret token>"

--customboundary
Content-Disposition: form-data; name="operations"

{
    "operations": [
        {
            "type": "performOcr",
            "language": "english",
            "pageIndexes": [0]
        }
    ]
}
--customboundary
Content-Disposition: form-data; name="file"; filename="Example Document.pdf"
Content-Type: application/pdf

<PDF data>
--customboundary--
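
The upload request above can be assembled from a script. The following is a minimal Python sketch that uses only the standard library and builds the request without sending it. The function name build_upload_with_ocr_request, its parameters, and the base URL in the usage note are assumptions made for illustration; only the endpoint, the headers, and the performOcr payload come from this guide.

```python
import json
import uuid
import urllib.request


def build_upload_with_ocr_request(base_url, token, pdf_bytes, filename,
                                  language, page_indexes):
    """Build (but do not send) a multipart upload request that runs OCR.

    Hypothetical helper: the name and signature are illustrative only.
    """
    # The operations part carries the same JSON shown in the raw request.
    operations_json = json.dumps({
        "operations": [
            {
                "type": "performOcr",
                "language": language,
                "pageIndexes": list(page_indexes),
            }
        ]
    })

    # A random boundary avoids collisions with the PDF bytes.
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="operations"\r\n'
        "\r\n"
        f"{operations_json}\r\n"
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        "Content-Type: application/pdf\r\n"
        "\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")

    return urllib.request.Request(
        url=f"{base_url}/api/documents",
        data=head + pdf_bytes + tail,
        method="POST",
        headers={
            "Content-Type": f"multipart/form-data; boundary={boundary}",
            "Authorization": f'Token token="{token}"',
        },
    )
```

To send the request, pass the returned object to urllib.request.urlopen. The base URL (for example http://localhost:5000) depends on where your PSPDFKit Server instance is reachable and is an assumption here.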

Applying OCR to Existing Documents

You can also run OCR on documents you have already uploaded by using the apply_operations endpoint:

POST /api/documents/:document_id/apply_operations
Content-Type: application/json
Authorization: Token token="<secret token>"

{
  "operations": [
    {
      "type": "performOcr",
      "language": "english",
      "pageIndexes": [0]
    }
  ]
}
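
As a rough illustration, the same apply_operations request can be built in Python with the standard library. This is a sketch under stated assumptions: the helper name build_apply_ocr_request and its parameters are hypothetical, and the request is only constructed, not sent.

```python
import json
import urllib.request


def build_apply_ocr_request(base_url, token, document_id, language, page_indexes):
    """Build (but do not send) an apply_operations request that runs OCR.

    Hypothetical helper: the name and signature are illustrative only.
    """
    payload = {
        "operations": [
            {
                "type": "performOcr",
                "language": language,
                "pageIndexes": list(page_indexes),
            }
        ]
    }
    return urllib.request.Request(
        url=f"{base_url}/api/documents/{document_id}/apply_operations",
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f'Token token="{token}"',
        },
    )
```

Serializing the payload with json.dumps guarantees quoted keys and no trailing commas, which a strict JSON parser requires. Pass the returned object to urllib.request.urlopen to send it.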

Performance Considerations

Running OCR is a CPU-bound, single-threaded operation. This means that performing many OCR operations in parallel on a single PSPDFKit Server instance can cause high load for extended periods of time. We did some performance testing on our development hardware (2.4 GHz 8-core Intel Core i9 9980HK, 32 GB RAM, running a single OCR operation at a time), which should give you an idea of what kind of speed you can expect given your server infrastructure:

  • Running OCR on a 6-page document: ~35–40 seconds to run OCR on the entire document, ~6–11 seconds to run OCR on a single page.

  • Running OCR on a 1-page document: ~3–4 seconds to run OCR on the page.

Things that affect how fast OCR will be performed:

  • The number of pages in the document.

  • The number of pages OCR will be performed on.

  • The content of the pages OCR will be performed on.

  • The single-threaded performance of your server hardware.
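
Because parallel OCR operations can keep a single instance under high load, one possible client-side mitigation is to cap how many OCR requests are in flight at once. The sketch below is an assumption-laden illustration and not part of the PSPDFKit API: run_ocr_jobs and max_parallel are hypothetical names, each job is any zero-argument callable that performs one OCR request, and a suitable limit depends on your server hardware.

```python
from concurrent.futures import ThreadPoolExecutor


def run_ocr_jobs(jobs, max_parallel=2):
    """Run zero-argument callables with at most max_parallel in flight.

    Results are returned in the same order as the submitted jobs.
    Hypothetical helper: the name and default limit are illustrative only.
    """
    if max_parallel < 1:
        raise ValueError("max_parallel must be at least 1")
    # The pool size is the concurrency cap: extra jobs wait in the queue.
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        futures = [pool.submit(job) for job in jobs]
        # result() blocks until the job finishes and re-raises its exception.
        return [future.result() for future in futures]
```

Threads are sufficient here because the client only waits on network responses; the CPU-bound work happens on the server.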