Extract Text from PDFs Using JavaScript

Extracting text from a PDF can be a complex task, so we offer several abstractions to make this simpler. In a PDF, text usually consists of glyphs that are absolutely positioned. PSPDFKit heuristically splits these glyphs up into words and blocks of text. Our user interface leverages this information to allow users to select and annotate text. You can read more about this in our text selection guide.

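Use textLinesForPageIndex to extract the text lines of the page at a given index. The call is asynchronous, so await its result. Page indexes start at 0, so the following call returns the lines of the first page: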

const lines = await instance.textLinesForPageIndex(0);
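
If a single string per page is more convenient than a list of lines, the result can be joined. The following is a minimal sketch, not part of the PSPDFKit API: extractPageText is a hypothetical helper name, and the code assumes that the resolved value is iterable and that each text line exposes its string as a contents property.

```javascript
// Hypothetical helper: returns the text of one page as a single string.
// Assumes `instance.textLinesForPageIndex` resolves to an iterable of
// objects that each expose their text as `contents`.
async function extractPageText(instance, pageIndex) {
  const lines = await instance.textLinesForPageIndex(pageIndex);
  // Array.from accepts plain arrays and any other iterable.
  return Array.from(lines, (line) => line.contents).join("\n");
}
```

With a loaded instance, this could be called as const text = await extractPageText(instance, 0);.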

For Server-based deployments, use the /pages/:page_index/text endpoint to fetch all the text contained in a page.
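
A request to this endpoint might look like the sketch below. Only the /pages/:page_index/text suffix comes from this guide. Everything else is an assumption made for illustration: the documentUrl parameter (the server URL of the document, whatever prefix your deployment uses), the headers you pass for authentication, the JSON response format, and the helper names buildPageTextUrl and fetchPageText. Check the server API reference for the exact values.

```javascript
// Hypothetical helper: appends the page text path to a document URL.
// `documentUrl` is an assumption; use the URL your server exposes.
function buildPageTextUrl(documentUrl, pageIndex) {
  const base = documentUrl.replace(/\/+$/, ""); // drop trailing slashes
  return `${base}/pages/${pageIndex}/text`;
}

// Hypothetical helper: fetches the text of one page from the server.
// `headers` should carry whatever authentication your server requires.
// Assumes the endpoint responds with JSON.
async function fetchPageText(documentUrl, pageIndex, headers = {}, fetchImpl = globalThis.fetch) {
  const response = await fetchImpl(buildPageTextUrl(documentUrl, pageIndex), { headers });
  if (!response.ok) {
    throw new Error(`Text request failed with status ${response.status}`);
  }
  return response.json();
}
```

The fetchImpl parameter exists only so the function can be exercised with a stub in tests; in normal use the global fetch is picked up by default.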