Key-Value Pair Extraction Confidence Score

PSPDFKit’s key-value pair (KVP) extraction engine calculates a confidence score that indicates how certain the engine is that the extracted data is accurate.

The confidence score is calculated by considering the following factors, among others:

  • The confidence in the optical character recognition (OCR) result at the character level. Some characters are more difficult to recognize than others.

  • The confidence in the OCR result at the word level. Some words are more difficult to recognize than others.

  • The data type of the key. Some data types are more difficult to recognize than others. For example, dates and IBANs are relatively easy to recognize, while phone numbers and addresses are generally more difficult.

The confidence score enables you to filter results based on their assumed accuracy. For example, you can disregard data extraction results with a low confidence score or flag them as data items that require manual checks.
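
The filtering described above can be sketched as follows. This is a minimal illustration, not PSPDFKit's API: the `KeyValuePair` class, the `triage` function, the assumption that scores fall between 0.0 and 1.0, and both threshold values are hypothetical and should be adapted to the result type and score range your SDK actually returns.

```python
from dataclasses import dataclass


# Hypothetical result shape for illustration only; not PSPDFKit's actual type.
@dataclass
class KeyValuePair:
    key: str
    value: str
    confidence: float  # assumed here to lie between 0.0 and 1.0


def triage(pairs, accept_threshold=0.9, reject_threshold=0.5):
    """Split results into accepted, needs-review, and rejected lists.

    Both thresholds are arbitrary example values, not recommendations.
    """
    accepted, review, rejected = [], [], []
    for pair in pairs:
        if pair.confidence >= accept_threshold:
            accepted.append(pair)   # trusted without further checks
        elif pair.confidence >= reject_threshold:
            review.append(pair)     # flagged for a manual check
        else:
            rejected.append(pair)   # disregarded
    return accepted, review, rejected
```

Using two thresholds rather than one keeps borderline results available for manual review instead of discarding them outright. Suitable threshold values depend on the documents being processed and on how costly an incorrect value is in your workflow.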