Create text highlight annotations from text extraction
Q: How to create text highlight annotations from text extraction?
A: Extracting text from a PDF file is a common task, but, as you might have noticed, it isn’t always as straightforward as it should be. For that reason, PSPDFKit offers APIs to retrieve text from a document. In PSPDFKit for Web, you can extract the text lines of a page using PSPDFKit.Instance#textLinesForPageIndex.
So the first step is to extract the text lines from a page of the PDF document:
// Get all text lines from page 0 and log their contents.
const textLines = await instance.textLinesForPageIndex(0);
textLines.forEach(textLine => console.log(textLine.contents));
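If only some of the extracted lines should be highlighted, they can be narrowed down by their contents before going further. The following is a minimal sketch, not part of the PSPDFKit API: linesContaining is a hypothetical helper, and the sample data only imitates the contents field of PSPDFKit.TextLine records.

```javascript
// Hypothetical helper: returns the text lines whose contents include `term`
// (case-insensitive). It accepts any iterable of objects with a string
// `contents` field, which covers plain arrays as well as Immutable lists.
function linesContaining(textLines, term) {
  const needle = term.toLowerCase();
  return Array.from(textLines).filter(line =>
    line.contents.toLowerCase().includes(needle)
  );
}

// Stand-in data shaped like text line records (only `contents` is used here).
const sampleLines = [
  { contents: "Invoice number: 1234" },
  { contents: "Total due: $56.00" },
];

console.log(linesContaining(sampleLines, "total").map(line => line.contents));
// [ 'Total due: $56.00' ]
```

The helper returns a plain array, so bounding boxes taken from its result would presumably need to be wrapped in a PSPDFKit.Immutable.List before being used as the rects of an annotation.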
Then we can retrieve each text line's bounding box through PSPDFKit.TextLine#boundingBox:
const boundingBoxes = textLines.map(textLine => textLine.boundingBox);
This gives us one PSPDFKit.Geometry.Rect record for each text line on that page, collected in a PSPDFKit.Immutable.List. In our case the list holds two records because there are two lines of text on page 0 of the document.
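To make it clearer how several line rectangles relate to a single enclosing rectangle, here is a small sketch that computes the smallest rectangle covering a set of rects. It uses plain objects with left, top, width, and height fields as stand-ins for PSPDFKit.Geometry.Rect records, and unionRects is a hypothetical helper written for illustration, not the library's implementation.

```javascript
// Hypothetical helper: smallest rectangle that encloses all given rects.
// Each rect is a plain object { left, top, width, height }.
function unionRects(rects) {
  if (rects.length === 0) {
    throw new Error("unionRects needs at least one rect");
  }
  const left = Math.min(...rects.map(r => r.left));
  const top = Math.min(...rects.map(r => r.top));
  const right = Math.max(...rects.map(r => r.left + r.width));
  const bottom = Math.max(...rects.map(r => r.top + r.height));
  return { left, top, width: right - left, height: bottom - top };
}

// Two made-up line rectangles, one below the other.
const lineRects = [
  { left: 50, top: 100, width: 200, height: 12 },
  { left: 50, top: 116, width: 150, height: 12 },
];

console.log(unionRects(lineRects));
// { left: 50, top: 100, width: 200, height: 28 }
```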
The final step is to create a highlight annotation from those bounding boxes:
// rects holds one rectangle per line; boundingBox encloses all of them.
instance.create(
  new PSPDFKit.Annotations.HighlightAnnotation({
    pageIndex: 0,
    rects: boundingBoxes,
    boundingBox: PSPDFKit.Geometry.Rect.union(boundingBoxes)
  })
);
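The three steps can be combined into one function. This is a sketch under a few assumptions: highlightAllTextLines is a hypothetical name, the PSPDFKit namespace and a loaded instance are passed in as parameters, and the early return for pages without text is a design choice of this example rather than documented library behavior.

```javascript
// Sketch: highlight every text line on a single page.
// `PSPDFKit` is the library namespace and `instance` a loaded instance;
// both are passed in explicitly (an assumption of this example).
async function highlightAllTextLines(PSPDFKit, instance, pageIndex) {
  // Step 1: extract the text lines of the page.
  const textLines = await instance.textLinesForPageIndex(pageIndex);

  // Pages without extractable text leave nothing to highlight.
  if (textLines.size === 0) {
    return null;
  }

  // Step 2: collect one bounding box per text line.
  const rects = textLines.map(textLine => textLine.boundingBox);

  // Step 3: create a highlight annotation that covers all lines.
  const annotation = new PSPDFKit.Annotations.HighlightAnnotation({
    pageIndex,
    rects,
    boundingBox: PSPDFKit.Geometry.Rect.union(rects),
  });
  return instance.create(annotation);
}
```

With a loaded instance, it could be called as await highlightAllTextLines(PSPDFKit, instance, 0) to highlight page 0.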