OCR

You can perform OCR (optical character recognition) on any document with PSPDFKit for Web.

OCR is available when using the Web SDK with Document Engine in server-backed operational mode.

To perform OCR, open the document from Document Engine and apply the performOcr document operation using Instance.applyOperations:

await instance.applyOperations([
  { type: "performOcr", language: "english", pageIndexes: "all" }
]);

This detects the English text on every page of the document and makes it available for searching and manual text selection.
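
When only some pages are scanned images, it may be preferable to limit OCR to those pages. The following is a minimal sketch, not an official recipe: buildOcrOperation is a hypothetical helper name, and it assumes that pageIndexes also accepts an array of zero-based page indexes in addition to "all". Check the API reference for your SDK version before relying on that.

```javascript
// Hypothetical helper (not part of the SDK): builds a performOcr operation.
// Assumption: `pageIndexes` may be "all" or an array of zero-based indexes.
function buildOcrOperation(pageIndexes = "all", language = "english") {
  if (pageIndexes !== "all") {
    const valid =
      Array.isArray(pageIndexes) &&
      pageIndexes.length > 0 &&
      pageIndexes.every((index) => Number.isInteger(index) && index >= 0);
    if (!valid) {
      throw new TypeError(
        'pageIndexes must be "all" or a non-empty array of non-negative integers'
      );
    }
  }
  return { type: "performOcr", language, pageIndexes };
}

// Usage, given a loaded `instance`:
//   await instance.applyOperations([buildOcrOperation([0, 2])]);
```

Validating the indexes in the browser means an obviously malformed request is rejected before it is sent to Document Engine.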

Other Languages

If your document is written in a language other than English, you can extract its text by modifying the language parameter. For example, to perform OCR in Spanish, run:

await instance.applyOperations([
  { type: "performOcr", language: "spanish", pageIndexes: "all" }
]);

PSPDFKit for Web can perform OCR in the following languages:

Croatian
Czech
Danish
Dutch
English
Finnish
French
German
Indonesian
Italian
Malay
Norwegian
Polish
Portuguese
Serbian
Slovak
Slovenian
Spanish
Swedish
Turkish
Welsh
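
If the language comes from user input or configuration, it can help to check it against the list above before calling applyOperations. The sketch below makes one assumption worth verifying: that each language value is the lowercase form of the name listed here, as is the case for "english" and "spanish" in the examples above. OCR_LANGUAGES and toOcrLanguage are hypothetical names introduced for illustration.

```javascript
// Display names copied from the list of supported languages above.
const OCR_LANGUAGES = [
  "Croatian", "Czech", "Danish", "Dutch", "English", "Finnish", "French",
  "German", "Indonesian", "Italian", "Malay", "Norwegian", "Polish",
  "Portuguese", "Serbian", "Slovak", "Slovenian", "Spanish", "Swedish",
  "Turkish", "Welsh"
];

// Hypothetical helper: converts a display name to a `language` value.
// Assumption: the value is the lowercase display name.
function toOcrLanguage(name) {
  const normalized = String(name).trim().toLowerCase();
  const supported = OCR_LANGUAGES.some(
    (language) => language.toLowerCase() === normalized
  );
  if (!supported) {
    throw new RangeError(`OCR is not available for "${name}"`);
  }
  return normalized;
}

// Usage, given a loaded `instance`:
//   await instance.applyOperations([
//     { type: "performOcr", language: toOcrLanguage("German"), pageIndexes: "all" }
//   ]);
```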