Blog Post

How to OCR PDFs with PSPDFKit API

Jonathan D. Rhyne
Illustration: How to OCR PDFs with PSPDFKit API

In this post, you’ll learn how to OCR PDF files using our PDF OCR API. With our API, you receive 100 credits with the free plan. Different operations on a document consume different amounts of credits, so the number of PDF documents you can generate may vary. You’ll just need to create a free account to get access to your API key.

Whether you’re working with large batches of scanned PDF files or automating your document workflows, PSPDFKit provides a reliable and scalable solution. This guide will show you how to OCR PDF documents using Python, PHP, JavaScript, and Java.

By the end of this post, you’ll know how to use the PSPDFKit OCR API to automate OCR in your preferred programming language and integrate it into your document-centric applications.

What is OCR PDF?

OCR (Optical Character Recognition) is the process of converting scanned documents and images containing text into searchable and editable PDFs. When you OCR a PDF, it extracts text data from the image or scanned file, enabling full-text search, copy-pasting of text, and enhanced accessibility.

With the PSPDFKit API, you can automate the OCR process to convert scanned documents to PDFs. This is useful in many industries like legal, healthcare, finance, and education where digital document management is crucial.

PSPDFKit OCR API Overview

The PSPDFKit OCR API is one of our 30+ PDF API tools. You can combine our OCR tool with other tools like merging, splitting, or redacting to create complex document workflows. Once you create your account, you’ll be able to access all our PDF API tools.

Creating a Free PSPDFKit Account

Go to our website and create your free account. You’ll see the page below.

Free account PSPDFKit API

Once you’ve created your account, navigate to Pricing and Plans to see your current plan.

Free plan PSPDFKit API

As you can see in the bottom-left corner, you’ll be given 100 free credits.

Obtaining the API Key

After you’ve verified your email, you can get your API key from the dashboard. In the menu on the left, select API Keys. You’ll see the following.

OCR PDFs Python API Key

Copy the Live API Key, because you’ll need this for the OCR PDF API.

Regardless of which language you’re using, the steps above will be the same, so ensure you’ve followed them before proceeding.

How to OCR a PDF in Python

To get started:

  • Create a free account to access your live API key.

  • Install Python on your system. You can download Python here.

Setting Up Files and Folders

First, make sure you have all the dependencies set up:

python -m pip install requests

Now, create a folder called ocr_pdf and open it in a code editor. For this tutorial, you’ll use VS Code as your primary code editor. Next, create two folders inside ocr_pdf and name them input_documents and processed_documents.

Then, in the root folder, ocr_pdf, create a file called processor.py. This is the file where you’ll keep your code.

Writing the Code

Open the processor.py file and paste in the code below:

import requests
import json

instructions = {
  'parts': [
    {
      'file': 'scanned'
    }
  ],
  'actions': [
    {
      'type': 'ocr',
      'language': 'english'
    }
  ]
}

response = requests.request(
  'POST',
  'https://api.pspdfkit.com/build',
  headers = {
    'Authorization': 'Bearer pdf_live_<rest_of_your_api_key>'
  },
  files = {
    'scanned': open('input_documents/document.pdf', 'rb')
  },
  data = {
    'instructions': json.dumps(instructions)
  },
  stream = True
)

if response.ok:
  with open('processed_documents/result.pdf', 'wb') as fd:
    for chunk in response.iter_content(chunk_size=8096):
      fd.write(chunk)
else:
  print(response.text)
  exit()

Code Explanation

First, you’re importing all the requests and json dependencies. After that, you’re creating the instructions for the API call. You’re then using the requests module to make the API call, and if the response is successful, you’re storing the result in the processed_documents folder.

Information

Make sure to replace pdf_live_<rest_of_your_api_key> with your API key.

Output

To execute the code, use the command below:

python3 processor.py

Once the code has been successfully executed, you’ll see a new file in the processed_documents folder called result.pdf.

How to OCR a PDF in PHP

To get started:

  • Create a free account to access your live API key.

  • Install PHP on your system. You can download PHP here.

Setting Up Files and Folders

Now, create a folder called ocr_pdf and open it in a code editor. For this tutorial, you’ll use VS Code as your primary code editor. Next, create two folders inside ocr_pdf and name them input_documents and processed_documents. Now copy your PDF file to the input_documents folder and rename it to document.pdf.

Then, in the root folder, ocr_pdf, create a file called processor.php. This is the file where you’ll keep your code.

Writing the Code

Open the processor.php file and paste in the code below:

<?php

$FileHandle = fopen('processed_documents/result.pdf', 'w+');

$curl = curl_init();

$instructions = '{
  "parts": [
    {
      "file": "scanned"
    }
  ],
  "actions": [
    {
      "type": "ocr",
      "language": "english"
    }
  ]
}';

curl_setopt_array($curl, array(
  CURLOPT_URL => 'https://api.pspdfkit.com/build',
  CURLOPT_CUSTOMREQUEST => 'POST',
  CURLOPT_RETURNTRANSFER => true,
  CURLOPT_ENCODING => '',
  CURLOPT_POSTFIELDS => array(
    'instructions' => $instructions,
    'scanned' => new CURLFILE('input_documents/document.pdf')
  ),
  CURLOPT_HTTPHEADER => array(
    'Authorization: Bearer pdf_live_<rest_of_your_api_key>'
  ),
  CURLOPT_FILE => $FileHandle,
));

$response = curl_exec($curl);

curl_close($curl);

fclose($FileHandle);

Code Explanation

You’re first creating a FileHandle variable because this will help you save the file in the processed_documents folder. After that, you’re creating the instruction variable where all the instructions for the API are stored in the form of a JSON string. And finally, in the last part of the code, you’re making a CURL request.

Information

Make sure to replace pdf_live_<rest_of_your_api_key> with your API key.

Output

To execute the code, run the following command:

php processor.php

On successful execution, it’ll create a file in the processed_documents folder called result.pdf.

How to OCR a PDF in JavaScript

To get started, install:

  • axios — This package is used for making REST API calls.

  • Form-Data — This package is used for creating form data.

You can use the following command to install both of them:

npm install axios
npm install form-data

Setting Up Files and Folders

Now, create a folder called ocr_pdf and open it in a code editor. For this tutorial, you’ll use VS Code as your primary code editor. Next, create two folders inside ocr_pdf and name them input_documents and processed_documents. Now copy your PDF file to the input_documents folder and rename it to document.pdf.

Then, in the root folder, ocr_pdf, create a file called processor.js. This is the file where you’ll keep your code.

Writing the Code

Open the processor.js file and paste in the code below:

const axios = require('axios');
const FormData = require('form-data');
const fs = require('fs');

const formData = new FormData();
formData.append(
	'instructions',
	JSON.stringify({
		parts: [
			{
				file: 'scanned',
			},
		],
		actions: [
			{
				type: 'ocr',
				language: 'english',
			},
		],
	}),
);
formData.append(
	'scanned',
	fs.createReadStream('input_documents/document.pdf'),
);
(async () => {
	try {
		const response = await axios.post(
			'https://api.pspdfkit.com/build',
			formData,
			{
				headers: formData.getHeaders({
					Authorization: 'Bearer pdf_live_<rest_of_your_api_key>',
				}),
				responseType: 'stream',
			},
		);

		response.data.pipe(
			fs.createWriteStream('processed_documents/result.pdf'),
		);
	} catch (e) {
		const errorString = await streamToString(e.response.data);
		console.log(errorString);
	}
})();

function streamToString(stream) {
	const chunks = [];
	return new Promise((resolve, reject) => {
		stream.on('data', (chunk) => chunks.push(Buffer.from(chunk)));
		stream.on('error', (err) => reject(err));
		stream.on('end', () =>
			resolve(Buffer.concat(chunks).toString('utf8')),
		);
	});
}

Code Explanation

Here you have a form variable that contains the instructions for the API. You’re using createReadStream to read the input PDF file, and then you’re using the axios HTTP client to post the data to PSPDFKit API. The response of the API is stored in the processed_documents folder.

Information

Make sure to replace pdf_live_<rest_of_your_api_key> with your API key.

Output

To run the code, use the command below:

node processor.js

On successful execution, it’ll create a file in the processed_documents folder called result.pdf.

How to OCR PDFs in Java

To OCR PDFs in Java, you’ll install the dependencies after setting up the files and folders.

Setting Up Files and Folders

For this tutorial, you’ll use IntelliJ IDEA as your primary code editor. Create a new project called ocr_pdf and use the settings shown in the following image.

New project in IntelliJ IDEA

Information

You can choose any location, but make sure to set Java as the language and Gradle as the build system.

Create a new directory in your project. Right-click your project’s name and select New > Directory. From there, choose the src/main/java option. Once done, create a class file called Processor.java inside the src/main/java folder, and in the same folder, create two folders called input_documents and processed_documents.

After that, paste your PDF file inside the input_documents folder.

Installing the Dependencies

Next, you’ll install two libraries:

  • OkHttp — This library is used to make API requests.

  • JSON —This library will be used for parsing the JSON payload.

Open the build.gradle file and add the following dependencies to your project:

dependencies {
    implementation 'com.squareup.okhttp3:okhttp:4.9.2'
    implementation 'org.json:json:20210307'
}

Writing the Code

Open the processor.java file and paste in the code below:

import java.io.File;

import java.io.IOException;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.StandardCopyOption;

import org.json.JSONArray;

import org.json.JSONObject;
import okhttp3.MediaType;
import okhttp3.MultipartBody;
import okhttp3.OkHttpClient;
import okhttp3.Request;
import okhttp3.RequestBody;
import okhttp3.Response;

public final class Processor {
    public static void main(final String[] args) throws IOException {
        final RequestBody body = new MultipartBody.Builder()
                .setType(MultipartBody.FORM)
                .addFormDataPart(
                        "scanned",
                        "document.pdf",
                        RequestBody.create(
                                new File("input_documents/document.pdf"),
                                MediaType.parse("application/pdf")
                        )
                )
                .addFormDataPart(
                        "instructions",
                        new JSONObject()
                                .put("parts", new JSONArray()
                                        .put(new JSONObject()
                                                .put("file", "scanned")
                                        )
                                )
                                .put("actions", new JSONArray()
                                        .put(new JSONObject()
                                                .put("type", "ocr")
                                                .put("language", "english")
                                        )
                                ).toString()
                )
                .build();

        final Request request = new Request.Builder()
                .url("https://api.pspdfkit.com/build")
                .method("POST", body)
                .addHeader("Authorization", "Bearer pdf_live_<rest_of_your_api_key>")
                .build();

        final OkHttpClient client = new OkHttpClient()
                .newBuilder()
                .build();

        final Response response = client.newCall(request).execute();

        if (response.isSuccessful()) {
            Files.copy(
                  response.body().byteStream(),
                  FileSystems.getDefault().getPath("procesed_documents/result.pdf"),
                  StandardCopyOption.REPLACE_EXISTING
            );
        } else {
            // Handle the error
            throw new IOException(response.body().string());
        }
    }
}

Code Explanation

You’re importing all the packages required to run the code. Next you create the Processor class. Then in the main function, you’re first creating the request body for the API call that contains all the instructions to OCR the PDF. After that, you’re calling the API to process the instructions.

Information

Make sure to replace pdf_live_<rest_of_your_api_key> with your API key.

Output

To execute the code, click the run button located to the right of your configuration.

On successful execution of code, you’ll see two new PDF files in the processed_documents folder.

Conclusion

In this post, you learned how to easily OCR PDF documents using the PSPDFKit OCR API. Whether you use Python, PHP, JavaScript, or Java, PSPDFKit provides a comprehensive API to integrate OCR into your document processing workflows. If you want to streamline document management and improve the searchability of your PDFs, try out the PSPDFKit API today!

FAQ

What is OCR and how does it work with PSPDFKit?

OCR (Optical Character Recognition) converts scanned documents or images into searchable and editable text. With PSPDFKit, you can automate this process using various programming languages like Python, PHP, JavaScript, and Java.

How do I get started with the PSPDFKit OCR API?

To get started, create a free account on the PSPDFKit website, obtain your API key, and follow the tutorials provided in Python, PHP, JavaScript, or Java to integrate OCR functionality into your project.

What are the benefits of using OCR on PDF files?

OCR enhances PDFs by making them searchable, editable, and accessible. This is particularly useful in industries like legal, healthcare, and finance, where document management and text retrieval are critical.

How many documents can I OCR with the free plan?

With the PSPDFKit free plan, you receive 100 free credits. Each OCR operation consumes credits, so the number of documents you can process will vary depending on the size and complexity of the PDFs.

Is it possible to combine OCR with other PSPDFKit API tools?

Yes, you can combine the OCR API with other PSPDFKit tools like merging, splitting, or redacting PDFs to create advanced workflows tailored to your specific document processing needs.

Author
Jonathan D. Rhyne Co-Founder and CEO

Jonathan joined PSPDFKit in 2014. As CEO, Jonathan defines the company’s vision and strategic goals, bolsters the team culture, and steers product direction. When he’s not working, he enjoys being a dad, photography, and soccer.

Related Products
PSPDFKit API

Product Page
Guides

Share Post
Free 60-Day Trial Try PSPDFKit in your app today.
Free Trial

Related Articles

Explore more
TUTORIALS  |  API • Python • How To

How to Convert DOCX to WebP Using Python

TUTORIALS  |  Python • API • OCR

Extract Text from PDF in Python: A Comprehensive Guide Using PyPDF and PSPDFKit API

TUTORIALS  |  API • Python • How To • Tesseract • OCR

Python OCR Tutorial: Using Tesseract OCR in Python