Extract Data from Bank Statements
This guide explains how to extract key-value pairs (KVPs), such as IBANs or account numbers, from bank statements using PSPDFKit Document Engine. For more information, refer to the guide on how key-value pair extraction works.
Sending the Request to Extract Data
To extract key-value pairs from a bank statement, send a multipart POST request to the /api/build endpoint. In the instructions, specify the following output parameters:
- type specifies the output type. Set this to json-content.
- keyValuePairs is a Boolean value that determines whether to extract key-value pairs.
- language specifies the language used to recognize text with optical character recognition (OCR). Text in a PDF or an image is sometimes stored in a way that prevents you from searching or copying it. PSPDFKit’s OCR engine recognizes that text and saves it in a separate file in which you can search, copy, and paste it. For more information, refer to the list of supported languages.
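As a minimal sketch, the following Python function assembles the same instructions object used in the requests that follow and serializes it for the multipart instructions field. The function name build_instructions is hypothetical and not part of any PSPDFKit SDK. Check the list of supported languages before passing a value other than "english".

```python
import json


def build_instructions(language: str = "english", extract_kvps: bool = True) -> str:
    """Return Build instructions as a JSON string for the "instructions" form field.

    Hypothetical helper: it mirrors the example request in this guide and
    does not call any PSPDFKit API itself.
    """
    instructions = {
        # A single part: the file uploaded under the form field name "document".
        "parts": [{"file": "document"}],
        "output": {
            "type": "json-content",         # structured JSON output instead of a PDF
            "keyValuePairs": extract_kvps,  # toggle key-value pair extraction
            "language": language,           # OCR language
        },
    }
    return json.dumps(instructions)
```

For example, build_instructions() produces the JSON passed to -F instructions in the curl command.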
curl -X POST http://localhost:5000/api/build \
  -H "Authorization: Token token=<API token>" \
  -F document=@/path/to/example-document.pdf \
  -F instructions='{
    "parts": [
      {
        "file": "document"
      }
    ],
    "output": {
      "type": "json-content",
      "keyValuePairs": true,
      "language": "english"
    }
  }' \
  -o result.json
POST /api/build HTTP/1.1
Content-Type: multipart/form-data; boundary=customboundary
Authorization: Token token=<API token>

--customboundary
Content-Disposition: form-data; name="document"; filename="example-document.pdf"
Content-Type: application/pdf

<PDF data>
--customboundary
Content-Disposition: form-data; name="instructions"
Content-Type: application/json

{
  "parts": [
    {
      "file": "document"
    }
  ],
  "output": {
    "type": "json-content",
    "keyValuePairs": true,
    "language": "english"
  }
}
--customboundary--
For more information on the Build instructions, refer to the API Reference.
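The raw HTTP request above can also be assembled with the Python standard library. The following is a minimal sketch that builds, but does not send, the multipart POST to /api/build. It assumes the server runs at http://localhost:5000 as in the examples. The function name build_extraction_request and its parameters are hypothetical, and the API token is passed in by the caller rather than hardcoded.

```python
import json
import urllib.request
import uuid


def build_extraction_request(
    pdf_bytes: bytes,
    instructions: dict,
    api_token: str,
    url: str = "http://localhost:5000/api/build",
    filename: str = "example-document.pdf",
) -> urllib.request.Request:
    """Build (without sending) a multipart/form-data POST to /api/build.

    Hypothetical helper that mirrors the raw HTTP request in this guide.
    """
    # A random boundary is very unlikely to occur inside the PDF bytes.
    boundary = uuid.uuid4().hex
    delimiter = b"--" + boundary.encode()
    lines = [
        # Part 1: the PDF, referenced as "document" in instructions["parts"].
        delimiter,
        f'Content-Disposition: form-data; name="document"; filename="{filename}"'.encode(),
        b"Content-Type: application/pdf",
        b"",
        pdf_bytes,
        # Part 2: the Build instructions as JSON.
        delimiter,
        b'Content-Disposition: form-data; name="instructions"',
        b"Content-Type: application/json",
        b"",
        json.dumps(instructions).encode(),
        # Closing delimiter, followed by a trailing CRLF.
        delimiter + b"--",
        b"",
    ]
    body = b"\r\n".join(lines)
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Content-Type": f"multipart/form-data; boundary={boundary}",
            "Authorization": f"Token token={api_token}",
        },
    )
```

To send the request, you could pass it to urllib.request.urlopen and decode the response body with json.load. Building the body by hand keeps the sketch dependency-free. A third-party HTTP client with multipart support would be shorter.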
Example Data Extraction Response
{
  "pages": [
    {
      "pageIndex": 0,
      "keyValuePairs": [
        {
          "confidence": 95.4,
          "key": {
            "bbox": { "left": 0, "top": 0, "width": 100, "height": 100 },
            "content": "IBAN"
          },
          "value": {
            "bbox": { "left": 0, "top": 0, "width": 100, "height": 100 },
            "content": "FR7611808009101234567890147",
            "dataType": "String"
          }
        }
      ]
    }
  ]
}
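To consume a response shaped like the example above, you can walk pages and then keyValuePairs. The following is a minimal Python sketch under a few assumptions. The names ExtractedPair, extract_pairs, and find_value are hypothetical. The confidence value is assumed to be on a 0 to 100 scale, as the sample value 95.4 suggests. Matching on a key label such as "IBAN" also assumes that label appears on the statement, and real documents may use other wording.

```python
from __future__ import annotations

import json
from dataclasses import dataclass


@dataclass
class ExtractedPair:
    page_index: int
    key: str
    value: str
    confidence: float


def extract_pairs(response_json: str, min_confidence: float = 0.0) -> list[ExtractedPair]:
    """Flatten every key-value pair in the response, skipping low-confidence ones."""
    data = json.loads(response_json)
    pairs = []
    for page in data.get("pages", []):
        for kvp in page.get("keyValuePairs", []):
            # Assumes confidence uses the same 0-100 scale as the sample response.
            if kvp["confidence"] < min_confidence:
                continue
            pairs.append(
                ExtractedPair(
                    page_index=page["pageIndex"],
                    key=kvp["key"]["content"],
                    value=kvp["value"]["content"],
                    confidence=kvp["confidence"],
                )
            )
    return pairs


def find_value(pairs: list[ExtractedPair], key: str) -> list[str]:
    """Return all values whose key text matches `key`, ignoring case."""
    return [p.value for p in pairs if p.key.casefold() == key.casefold()]
```

For the sample response, find_value(extract_pairs(text), "iban") returns the single IBAN value. Raising min_confidence lets you discard uncertain matches before using them downstream.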