Create text highlight annotations from text extraction
Q: How to create text highlight annotations from text extraction?
A: Extracting text from a PDF file is a common task, but, as you might have noticed, it isn't always as straightforward as it should be. For that reason, PSPDFKit offers APIs to retrieve text from a document. In PSPDFKit for Web, you can extract the text of a page using instance.textLinesForPageIndex.
So the first step is to extract the text from a page of the PDF document:
```javascript
// Getting all text lines from page 0
const textLines = await instance.textLinesForPageIndex(0);
textLines.forEach(textLine => console.log(textLine.contents));
```
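The snippet above reads a single page. Below is a minimal sketch for collecting the text lines of every page. It assumes instance is a loaded PSPDFKit for Web Instance and uses its totalPageCount property; the helper name textLinesForAllPages is hypothetical, not part of PSPDFKit.

```javascript
// Hypothetical helper: collect the text lines of every page in a loaded document.
// Assumes `instance` is a PSPDFKit for Web Instance (e.g. resolved by PSPDFKit.load()).
// Returns an array with one { pageIndex, textLines } entry per page.
async function textLinesForAllPages(instance) {
  const pages = [];
  for (let pageIndex = 0; pageIndex < instance.totalPageCount; pageIndex++) {
    // Resolves to an Immutable.List of PSPDFKit.TextLine records for this page
    const textLines = await instance.textLinesForPageIndex(pageIndex);
    pages.push({ pageIndex, textLines });
  }
  return pages;
}
```

Pages are requested one after another, which keeps the order predictable; for large documents you could issue the requests in parallel with Promise.all instead.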
Next, we can retrieve the bounding box of each text line via PSPDFKit.TextLine#boundingBox:
```javascript
const boundingBoxes = textLines.map(textLine => textLine.boundingBox);
```
This returns a PSPDFKit.Immutable.List containing one PSPDFKit.Geometry.Rect record for each text line on the page. In our example document, the list holds two records, because page 0 contains two lines of text.
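If you only want to highlight lines that contain a particular word, you can filter the text lines before collecting their bounding boxes. This is a minimal sketch; the helper name boundingBoxesMatching and the search-term idea are illustrative assumptions, not PSPDFKit APIs. It relies only on filter() and map(), which both the Immutable.List returned by textLinesForPageIndex and plain arrays provide.

```javascript
// Hypothetical helper: keep only the bounding boxes of text lines whose
// contents include `searchTerm`. Note that each match yields the bounding
// box of the WHOLE line, not just of the matching word.
function boundingBoxesMatching(textLines, searchTerm) {
  return textLines
    .filter(textLine => textLine.contents.includes(searchTerm))
    .map(textLine => textLine.boundingBox);
}
```

The result can be passed as rects to a highlight annotation in the same way as boundingBoxes.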
The final step is to create a highlight annotation using those bounding boxes:
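```javascript
// Create one highlight annotation covering every text line on page 0
instance.create(
  new PSPDFKit.Annotations.HighlightAnnotation({
    pageIndex: 0,
    // One rect per highlighted text line
    rects: boundingBoxes,
    // The annotation's overall bounds: the union of all line rects
    boundingBox: PSPDFKit.Geometry.Rect.union(boundingBoxes)
  })
);
```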
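The steps above can be wrapped in one reusable function. This is a sketch under stated assumptions: the helper name highlightPage is hypothetical, and the PSPDFKit namespace and a loaded instance are passed in as parameters so the function relies on no globals. Pages without any text are skipped, since there is nothing to highlight there.

```javascript
// Hypothetical helper: highlight every text line on one page.
// `PSPDFKit` is the SDK namespace, `instance` a loaded PSPDFKit for Web Instance.
async function highlightPage(PSPDFKit, instance, pageIndex) {
  // Step 1: extract the text lines (an Immutable.List of TextLine records)
  const textLines = await instance.textLinesForPageIndex(pageIndex);
  // Step 2: collect one bounding box per text line
  const rects = textLines.map(textLine => textLine.boundingBox);
  // Nothing to highlight on a page without text
  if (rects.size === 0) {
    return null;
  }
  // Step 3: build the highlight annotation and add it to the document
  const annotation = new PSPDFKit.Annotations.HighlightAnnotation({
    pageIndex,
    rects,
    boundingBox: PSPDFKit.Geometry.Rect.union(rects)
  });
  return instance.create(annotation);
}
```

For example, await highlightPage(PSPDFKit, instance, 0) reproduces the page 0 example from this answer.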