PSPDFKit API OCR and Office Conversion Improvements

Tomas Surin
Kelly Benitez

PSPDFKit API now ships with brand-new OCR and Office conversion engines. Earlier this year, PSPDFKit merged with ORPALIS, and over the last few weeks, we've been working to bring GdPicture.NET technology into PSPDFKit API to significantly improve its performance and accuracy.

What Is GdPicture.NET?

GdPicture.NET is a comprehensive all-in-one toolkit that provides complete PDF support, along with support for many other file formats, including Office, CAD, and image formats. It also ships with rich image processing, plus industry-leading OCR and document-understanding capabilities built on state-of-the-art artificial intelligence and machine learning algorithms. Over the coming months, we'll be incorporating much of this technology into PSPDFKit API.

Why We Replaced the Previous Engines

Our previous OCR engine was based on the open source Tesseract project, and our Office conversion tools used LibreOffice at their core. This setup produced good-quality results, but these two underlying components limited what we could achieve in certain areas.

The main issue with our OCR engine was performance, which was acceptable at best. For Office conversion, the main pain point was that we couldn't effectively improve the quality of the conversion itself.

Performance and Accuracy

Both new engines process documents faster and more accurately than their predecessors. The OCR gain is especially large: We measured up to 7× faster processing compared to the previous engine, with the same or sometimes even better accuracy.

For Office conversion, the new engine produced better results for many documents in our test set, and we found no quality regressions on any document in that set.

Conclusion

We're excited to bring you these major improvements to our API tools, and you can already try them for free.

We also invite you to read our blog posts, which explain in detail how to use our Office conversion tools.

Note that this is only a small glimpse into what’s possible with the combined powers of PSPDFKit API and GdPicture.NET. Stay tuned for the new capabilities and improvements that we’re planning to introduce soon.
