Create Text Highlight Annotations from Text Extraction

Extracting text from a PDF file is a common task, but it isn’t always as straightforward as it should be. For that reason, PSPDFKit offers APIs to retrieve text from a document. In PSPDFKit for Web, you can extract the text lines of a page with instance.textLinesForPageIndex and then turn them into highlight annotations with instance.create.

The first step is to extract the text from a page of the PDF document:

// Getting all text lines from page `0`.
const textLines = await instance.textLinesForPageIndex(0);
textLines.forEach((textLine) => console.log(textLine.contents));
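The same call can be repeated for every page when you need the text of the whole document. The sketch below assumes `instance` is a loaded PSPDFKit.Instance and uses its `totalPageCount` property. The helper name `extractTextFromAllPages` is hypothetical and is not part of the PSPDFKit API.

```javascript
// Hypothetical helper (not part of the PSPDFKit API): collects the text of
// every page, returning one string per page with its lines joined by "\n".
// `instance` is assumed to be a loaded PSPDFKit.Instance.
async function extractTextFromAllPages(instance) {
  const pages = [];
  for (let pageIndex = 0; pageIndex < instance.totalPageCount; pageIndex++) {
    // Text lines are requested one page at a time.
    const textLines = await instance.textLinesForPageIndex(pageIndex);
    pages.push(textLines.map((textLine) => textLine.contents).join("\n"));
  }
  return pages;
}
```

Usage: `const pageTexts = await extractTextFromAllPages(instance);`. This sketch awaits each page in turn so the output stays in page order. For large documents, you could instead start all requests together with `Promise.all`.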

Then, retrieve the bounding box of each text line using PSPDFKit.TextLine#boundingBox:

const boundingBoxes = textLines.map((textLine) => textLine.boundingBox);

This returns a PSPDFKit.Geometry.Rect record for each text line on the page. In this example, page 0 of the document contains two lines of text, so boundingBoxes is a PSPDFKit.Immutable.List of two records.
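If you only want to highlight some of the lines, you can filter the text lines before collecting their bounding boxes. The helper below, `boundingBoxesMatching`, is hypothetical and is not a PSPDFKit API. It keeps the lines whose contents include a search term, ignoring case. It relies only on `filter` and `map`, which both a PSPDFKit.Immutable.List and a plain array provide.

```javascript
// Hypothetical helper (not part of the PSPDFKit API): returns the bounding
// boxes of the text lines whose contents include `term`, ignoring case.
// Accepts a PSPDFKit.Immutable.List or a plain array of text lines.
function boundingBoxesMatching(textLines, term) {
  const needle = term.toLowerCase();
  return textLines
    .filter((textLine) => textLine.contents.toLowerCase().includes(needle))
    .map((textLine) => textLine.boundingBox);
}
```

This selects whole lines, so the resulting highlight covers each matching line in full rather than only the term. Check that the result is not empty before creating an annotation from it.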

The final step is to create a highlight annotation from the bounding boxes, like this:

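const annotation = new PSPDFKit.Annotations.HighlightAnnotation({
  pageIndex: 0,
  rects: boundingBoxes,
  // The annotation's bounding box encloses all of the highlighted lines.
  boundingBox: PSPDFKit.Geometry.Rect.union(boundingBoxes)
});
await instance.create(annotation);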
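The three steps can be combined into one reusable function. This is a minimal sketch. The name `highlightAllTextOnPage` is hypothetical, and the SDK object is passed in as the `PSPDFKit` parameter so the helper works whether the SDK is imported or available as a global. The sketch assumes `instance` is a loaded PSPDFKit.Instance and that `textLinesForPageIndex` resolves to a PSPDFKit.Immutable.List, which exposes `size`.

```javascript
// Hypothetical helper (not part of the PSPDFKit API): highlights every text
// line on `pageIndex` using the extraction steps described above.
async function highlightAllTextOnPage(PSPDFKit, instance, pageIndex) {
  const textLines = await instance.textLinesForPageIndex(pageIndex);
  // Skip pages with no extractable text, such as scanned pages without OCR.
  if (textLines.size === 0) {
    return null;
  }
  const rects = textLines.map((textLine) => textLine.boundingBox);
  const annotation = new PSPDFKit.Annotations.HighlightAnnotation({
    pageIndex,
    rects,
    // The annotation's bounding box encloses all of the highlighted lines.
    boundingBox: PSPDFKit.Geometry.Rect.union(rects),
  });
  // Returns the promise from instance.create, which resolves once the
  // annotation has been created.
  return instance.create(annotation);
}
```

Usage: `await highlightAllTextOnPage(PSPDFKit, instance, 0);`. Returning `null` for empty pages prevents the function from creating an annotation with no rectangles.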