Key-Value Pair Extraction Confidence Score

Information

PSPDFKit Processor has been deprecated and replaced by PSPDFKit Document Engine. All PSPDFKit Processor licenses will work as before and be supported until 15 May 2024 (we will contact you about license migration). To start using Document Engine, refer to the migration guide. With Document Engine, you’ll have access to robust new capabilities (read the blog for more information).

PSPDFKit’s key-value pair (KVP) extraction engine calculates a confidence score for its extraction results. The score indicates how likely the extracted data is to be accurate.

The confidence score is calculated by considering the following factors, among others:

  • The confidence in the optical character recognition (OCR) result at the character level. Some characters are more difficult to recognize than others.

  • The confidence in the OCR result at the word level. Some words are more difficult to recognize than others.

  • The data type of the key. Some data types are more difficult to recognize than others. For example, dates and IBANs are relatively easy to recognize, while phone numbers and addresses are generally more difficult.

The confidence score enables you to filter results based on their assumed accuracy. For example, you can disregard data extraction results with a low confidence score or flag them as data items that require manual checks.
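The filtering described above can be sketched as a simple triage step. This is a minimal illustration, not the engine's API: the `ExtractedPair` class, its field names, the assumption that scores lie between 0.0 and 1.0, the two thresholds, and the sample values are all hypothetical. Adapt them to the actual shape of your extraction output.

```python
from dataclasses import dataclass


@dataclass
class ExtractedPair:
    # Hypothetical shape of one extraction result; not the engine's actual schema.
    key: str
    value: str
    confidence: float  # assumed here to lie in the range 0.0 to 1.0


def triage(pairs, accept_at=0.9, review_at=0.5):
    """Split results into accepted, needs-review, and discarded lists.

    The thresholds are illustrative defaults, not recommended values.
    """
    accepted, review, discarded = [], [], []
    for pair in pairs:
        if pair.confidence >= accept_at:
            accepted.append(pair)
        elif pair.confidence >= review_at:
            review.append(pair)  # flag for a manual check
        else:
            discarded.append(pair)  # too unreliable to use
    return accepted, review, discarded


# Made-up sample results, for illustration only.
results = [
    ExtractedPair("Invoice date", "2023-04-12", 0.97),
    ExtractedPair("Phone", "+43 660 1234567", 0.62),
    ExtractedPair("Address", "Ka1serstr. 5", 0.31),
]
accepted, review, discarded = triage(results)
```

Using two thresholds rather than one keeps borderline results available for manual review instead of dropping them outright. Suitable cutoff values depend on your documents and on how costly an incorrect value is in your workflow.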