Search Text in PDFs Using JavaScript

The PSPDFKit interface offers full-text search for your PDFs. It lists the number of results for a given term, lets you step through the results, and highlights all occurrences of the search term in the document. In short, it behaves much like your browser’s search feature.

But what if you wanted it to behave differently?

PSPDFKit lets you hook into search queries by listening to the search.termChange event.

This event fires every time the text in the search field changes:

let lastSearchTerm = "";

instance.addEventListener("search.termChange", async (event) => {
  // Opt out from the default implementation.
  event.preventDefault();

  const { term } = event;
  // Update the search term in the search box. Without this line,
  // the search box would stay empty.
  instance.setSearchState((state) => state.set("term", term));
  lastSearchTerm = term;

  // Perform a custom search for the term.
  const results = await customSearch(term, instance);

  // Searches can resolve out of order, so a slow response for an older
  // term may arrive last. Only apply results that match the current term.
  if (term !== lastSearchTerm) {
    return;
  }

  // Finally, we apply the results. Note that you can also modify
  // the search state first and then pass the new state
  // to `instance.setSearchState`.
  const newState = instance.searchState.set("results", results);
  instance.setSearchState(newState);
});
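The stale-response check in the listener above can also be factored into a small helper, which makes it easier to reuse and to test without a PSPDFKit instance. The following is a minimal sketch of that idea; `createLatestGuard` and `begin` are hypothetical names and not part of the PSPDFKit API.

```javascript
// Hypothetical helper, not part of PSPDFKit: tracks which request is newest.
function createLatestGuard() {
  let latestId = 0;
  return {
    // Call at the start of each request. Returns a function that reports
    // whether that request is still the most recent one.
    begin() {
      const id = ++latestId;
      return () => id === latestId;
    }
  };
}

// Simulate two overlapping searches: the first one becomes stale as soon
// as the second one starts.
const guard = createLatestGuard();
const firstIsCurrent = guard.begin();
const secondIsCurrent = guard.begin();

console.log(firstIsCurrent()); // false
console.log(secondIsCurrent()); // true
```

In the listener, you would call `guard.begin()` before awaiting `customSearch` and return early if the returned function yields `false`.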

Let’s say your search should only match whole words in your document. By default, the search lists all matching text fragments of a document, regardless of whether they are whole words.

In our customSearch function, we use PSPDFKit’s instance.search under the hood, which is why we need to pass instance as an argument as well.

After the regular search, we can filter search results to only contain whole words:

async function customSearch(term, instance) {
  // We would get an error if we called `instance.search` with a term of
  // 2 characters or fewer.
  if (term.length <= 2) {
    return PSPDFKit.Immutable.List([]);
  }

  // Let's take the results from the default search as the foundation.
  const results = await instance.search(term);

  // Escape regex metacharacters so the term is matched literally, and
  // build the whole-word pattern once instead of once per result.
  const escapedTerm = term.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  const searchWord = new RegExp(`\\b${escapedTerm}\\b`, "i");

  // We only want to find whole words that match the term we entered.
  return results.filter((result) => searchWord.test(result.previewText));
}
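The whole-word test can be tried in isolation, without a PSPDFKit instance. The sketch below uses two hypothetical helpers, `escapeRegExp` and `containsWholeWord`, and plain strings standing in for `result.previewText`. It also shows why a term should be escaped before it is interpolated into a regular expression: otherwise a character such as `.` would match any character.

```javascript
// Escape characters that have a special meaning in regular expressions,
// so that the term is matched literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: true if `previewText` contains `term` as a whole word.
function containsWholeWord(previewText, term) {
  const pattern = new RegExp(`\\b${escapeRegExp(term)}\\b`, "i");
  return pattern.test(previewText);
}

console.log(containsWholeWord("Say hello to the team", "hello")); // true
console.log(containsWholeWord("Othello is a play", "hello")); // false
console.log(containsWholeWord("Version 1.5 is out", "1.5")); // true
console.log(containsWholeWord("Version 155 is out", "1.5")); // false
```

Note that `\b` only marks a boundary between a word character (letter, digit, or underscore) and a non-word character, so terms that start or end with punctuation would need a different boundary check.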

Highlighting Custom Search Results

You can highlight custom search results in one of two ways: with highlight annotations or with custom overlay items.

Highlighting Custom Search Results with Highlight Annotations

To highlight custom search results with highlight annotations, follow these steps:

  1. Search an entire PDF or a range of pages using instance.search.

  2. Create highlight annotations around the search results using instance.create.

The code below highlights the word hello on all the pages of a PDF document:

const results = await instance.search("hello");

const annotations = results.map((result) => {
  return new PSPDFKit.Annotations.HighlightAnnotation({
    pageIndex: result.pageIndex,
    rects: result.rectsOnPage,
    boundingBox: PSPDFKit.Geometry.Rect.union(result.rectsOnPage)
  });
});
await instance.create(annotations);

Highlighting Custom Search Results with Custom Overlay Items

To highlight custom search results with custom overlay items, follow these steps:

  1. Search an entire PDF or a range of pages using instance.search.

  2. Create custom overlay items around the search results.

The code below highlights the word hello on all the pages of a PDF document:

instance
  .search("hello")
  .then((results) => {
    results.toJS().forEach((result, i) => {
      const div = document.createElement("div");
      div.style.backgroundColor = "#808000";
      div.style.mixBlendMode = "multiply";
      div.style.opacity = 0.5;
      div.style.width = result.rectsOnPage[0].width + "px";
      div.style.height = result.rectsOnPage[0].height + "px";
      const item = new PSPDFKit.CustomOverlayItem({
        id: "overlay" + i,
        node: div,
        pageIndex: result.pageIndex,
        position: new PSPDFKit.Geometry.Point({
          x: result.rectsOnPage[0].left,
          y: result.rectsOnPage[0].top
        })
      });
      instance.setCustomOverlayItem(item);
    });
  })
  .catch(console.error);
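The example above only covers the first rectangle of each result. Because `rectsOnPage` is a list, a result can carry more than one rectangle, for example when a match wraps onto a second line. The following sketch computes one overlay description per rectangle; `overlayBoxesFor` is a hypothetical helper, and the sample data is made up to mimic the plain objects produced by `results.toJS()`.

```javascript
// Hypothetical helper: turns one plain search result (as produced by
// `results.toJS()`) into one overlay description per rectangle.
function overlayBoxesFor(result, resultIndex) {
  return result.rectsOnPage.map((rect, rectIndex) => ({
    id: `overlay-${resultIndex}-${rectIndex}`,
    pageIndex: result.pageIndex,
    left: rect.left,
    top: rect.top,
    width: rect.width,
    height: rect.height
  }));
}

// Made-up sample data shaped like a result that wraps across two lines.
const wrappedResult = {
  pageIndex: 0,
  rectsOnPage: [
    { left: 400, top: 100, width: 60, height: 12 },
    { left: 50, top: 114, width: 30, height: 12 }
  ]
};

// Logs the two IDs: overlay-3-0 and overlay-3-1.
console.log(overlayBoxesFor(wrappedResult, 3).map((box) => box.id));
```

Each description can then be turned into a `div` and a `PSPDFKit.CustomOverlayItem` in the same way as in the example above, using the combined ID to keep overlay IDs unique.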

Taking this approach even further, you could provide your own search implementation.

Your customSearch function could be a small local implementation like the one above, but it could also be a request to a remote search service. The only thing that matters is that you provide the results to the searchState in the correct format:

Your results should be a PSPDFKit.Immutable.List of PSPDFKit.SearchResult objects.
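As a rough sketch of that conversion, the function below maps hits from a hypothetical search backend to search results. The hit shape (`page`, `snippet`, `matchStart`, `matchLength`, `boxes`) is an assumption made for illustration, as is the function name `toSearchResults`. The field names passed to `SearchResult` follow the record's documented properties as I understand them, so check them against the API reference for your version. The PSPDFKit namespace is passed in as `sdk` so the function can be tested with a stub.

```javascript
// `sdk` is the PSPDFKit namespace. The shape of `hits` is a hypothetical
// backend response, not something defined by PSPDFKit.
function toSearchResults(hits, sdk) {
  return sdk.Immutable.List(
    hits.map(
      (hit) =>
        new sdk.SearchResult({
          pageIndex: hit.page,
          previewText: hit.snippet,
          // Where the match starts inside the preview text, and its length.
          locationInPreview: hit.matchStart,
          lengthInPreview: hit.matchLength,
          // Each box is assumed to be { left, top, width, height }.
          rectsOnPage: sdk.Immutable.List(
            hit.boxes.map((box) => new sdk.Geometry.Rect(box))
          )
        })
    )
  );
}
```

Inside customSearch, this could be used as `return toSearchResults(await fetchHits(term), PSPDFKit);`, where `fetchHits` stands for whatever request your own backend requires.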
