OCR

You can perform OCR (optical character recognition) on any document with PSPDFKit for Web.

Information: OCR is available when using the Web SDK with Document Engine in server-backed operational mode.

To perform OCR, open the document from Document Engine and apply the performOcr document operation with Instance.applyOperations:

await instance.applyOperations([
  { type: "performOcr", language: "english", pageIndexes: "all" }
]);

This detects English text on every page of the document and makes it available for searching and manual text selection.
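Because applyOperations returns a promise, a failed OCR run surfaces as a rejection. The following is a minimal sketch of a wrapper that reports the outcome instead of throwing. The runOcr name and its return shape are hypothetical and not part of the SDK. The sketch also assumes that pageIndexes accepts an array of zero-based page indexes in addition to "all"; check the API reference before relying on that.

```javascript
// Hypothetical wrapper (not part of the SDK): applies a performOcr operation
// and returns a small result object instead of letting a rejection propagate.
// ASSUMPTION: pageIndexes may be "all" or an array of zero-based page indexes.
async function runOcr(instance, language = "english", pageIndexes = "all") {
  try {
    await instance.applyOperations([
      { type: "performOcr", language, pageIndexes }
    ]);
    return { ok: true };
  } catch (error) {
    // Any rejection from applyOperations ends up here.
    return { ok: false, error };
  }
}
```

With a loaded instance, a call such as `const result = await runOcr(instance, "english", [0, 1]);` would request OCR on the first two pages only, and `result.ok` tells you whether the operation completed.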

Other Languages

If your document is written in a language other than English, set the language parameter to that language. For example, to perform OCR in Spanish, run:

await instance.applyOperations([
  { type: "performOcr", language: "spanish", pageIndexes: "all" }
]);

PSPDFKit for Web can perform OCR in the following languages:

  • Croatian

  • Czech

  • Danish

  • Dutch

  • English

  • Finnish

  • French

  • German

  • Indonesian

  • Italian

  • Malay

  • Norwegian

  • Polish

  • Portuguese

  • Serbian

  • Slovak

  • Slovenian

  • Spanish

  • Swedish

  • Turkish

  • Welsh
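To catch a misspelled or unsupported language before sending the operation, you could validate the name against the list above. The following is a minimal sketch. The buildOcrOperation helper and the SUPPORTED_OCR_LANGUAGES set are hypothetical names, not part of the SDK. It assumes that every language identifier is the lowercase English name, following the "english" and "spanish" examples; confirm the exact identifiers in the API reference.

```javascript
// Languages from the list above, in lowercase.
// ASSUMPTION: identifiers follow the same lowercase pattern as the
// "english" and "spanish" values used in the examples.
const SUPPORTED_OCR_LANGUAGES = new Set([
  "croatian", "czech", "danish", "dutch", "english", "finnish", "french",
  "german", "indonesian", "italian", "malay", "norwegian", "polish",
  "portuguese", "serbian", "slovak", "slovenian", "spanish", "swedish",
  "turkish", "welsh"
]);

// Hypothetical helper (not part of the SDK): normalizes the language name,
// rejects anything outside the list, and returns a performOcr operation.
function buildOcrOperation(language, pageIndexes = "all") {
  const normalized = String(language).trim().toLowerCase();
  if (!SUPPORTED_OCR_LANGUAGES.has(normalized)) {
    throw new Error(`Unsupported OCR language: ${language}`);
  }
  return { type: "performOcr", language: normalized, pageIndexes };
}
```

The returned object can be passed straight to the existing call, for example `await instance.applyOperations([buildOcrOperation("German")]);`. Failing early on the client avoids a round trip to Document Engine for a request that cannot succeed.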