Indexed Full-Text Search (FTS)

PSPDFKit supports fast and efficient full-text search in PDF documents through PSPDFKit.Search.Library. This document describes how to get started.

Getting Started

To start indexing, create a Library and give it a name. You can then add folders that contain PDF files to this named library. The Library will index all the PDFs in those folders.

Here is a simple example of how to create or open a library and start indexing PDFs in a directory:

// Opening a library creates one if it doesn't already exist.
var library = await Library.OpenLibraryAsync("MyLibrary");

// Find a folder containing PDFs.
var folderPicker = new Windows.Storage.Pickers.FolderPicker();
folderPicker.SuggestedStartLocation = Windows.Storage.Pickers.PickerLocationId.Desktop;
folderPicker.FileTypeFilter.Add("*");

Windows.Storage.StorageFolder folder = await folderPicker.PickSingleFolderAsync();
if (folder != null)
{
  // Queue up the PDFs in the folder for indexing.
  library.EnqueueDocumentsInFolderAsync(folder);
}

The documents will now be indexed in the background.

You can start querying documents right away, or wait until every document in the indexer queue has been indexed.

Here is an example of how to wait and then get the list of indexed documents:

// Wait for indexing to finish.
await library.WaitForAllIndexingTasksToFinishAsync();

// Get the list of indexed documents.
var documentUIDs = await library.GetIndexedUidsAsync();

Identifying Documents

Each document in the list returned by GetIndexedUidsAsync is represented by a unique ID (UID). The UID is a string composed of a Future Access Token, which identifies the folder containing the PDF, and the file name of the PDF within that folder. Because of UWP's file access restrictions, the Future Access Token is recorded only in the application's Future Access List, so it is essential that you don't clear that list if you wish to retain your libraries.
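If you need to tidy up the Future Access List, it may help to inspect what it holds first. The following is a minimal sketch that uses the standard UWP StorageApplicationPermissions.FutureAccessList API. The class and method names are hypothetical. This guide does not specify how a token is embedded in a UID, so the sketch only lists the entries and does not try to parse UIDs.

```csharp
using System.Diagnostics;
using Windows.Storage.AccessCache;

// Hypothetical diagnostics helper; not part of the SDK.
static class FutureAccessListDiagnostics
{
    // Logs every token currently recorded in the app's Future Access List.
    public static void LogFutureAccessEntries()
    {
        var accessList = StorageApplicationPermissions.FutureAccessList;

        foreach (AccessListEntry entry in accessList.Entries)
        {
            Debug.WriteLine($"Token: {entry.Token}, Metadata: {entry.Metadata}");
        }

        // Note: calling accessList.Clear() would discard the tokens that
        // library UIDs rely on, so it is deliberately not called here.
    }
}
```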

You can create a PSPDFKit.Document.DocumentSource object for a given document UID using the static method DocumentSource.CreateFromUidAsync. A StorageFile object for the file can be accessed by calling GetFile on the created DocumentSource object. Note that the method will throw an exception if the document referred to can no longer be located.

Here is an example:

// Get the list of indexed documents.
var documentUIDs = await library.GetIndexedUidsAsync();

foreach (var uid in documentUIDs)
{
  try
  {
    var documentSource = await DocumentSource.CreateFromUidAsync(uid);
    StorageFile file = documentSource.GetFile();
  }
  catch (Exception e)
  {
    // Examine the exception.
  }
}

Index and Document Status

Library allows you to query for the current indexing state.

To query the library only after all queued documents have been indexed, check IsIndexingAsync() first. You can also check the current status of an individual document by using GetIndexDocumentStatusAsync().
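As a rough illustration of the first approach, the sketch below waits for any pending indexing before it runs a query. It assumes that IsIndexingAsync() yields a bool that is true while documents are still queued, and that LibraryQuery lives in the same namespace as Library. The helper name SearchWhenIndexedAsync is hypothetical. GetIndexDocumentStatusAsync() is left out because its parameters are not described in this guide.

```csharp
using System; // Needed to await Windows Runtime async operations.
using System.Threading.Tasks;
using PSPDFKit.Search;

// Hypothetical helper; not part of the SDK.
static class LibrarySearchHelpers
{
    // Waits for pending indexing (if any), then starts the query.
    public static async Task SearchWhenIndexedAsync(Library library, string text)
    {
        // Assumption: the result is true while indexing is still in progress.
        if (await library.IsIndexingAsync())
        {
            await library.WaitForAllIndexingTasksToFinishAsync();
        }

        // Results are delivered through the library's search events,
        // not through the value returned here.
        await library.SearchAsync(new LibraryQuery(text));
    }
}
```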

Querying the Library

To query the library, use the method SearchAsync, supplying it with a LibraryQuery object.

Here is an example:

// Search all documents in the library for the text "Acme".
var succeeded = await library.SearchAsync(new LibraryQuery("Acme"));

The results of the query are sent to a query result handler, which you must provide to the library.

Here is an example:

library.OnSearchComplete += MyOnSearchCompleteMethod;

The OnSearchComplete event handler receives a reference to the originating library, along with a dictionary mapping a document UID to a LibraryQueryResult object. Each result object also contains the UID as a property and a list of the page indexes where matching results were found.
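A handler along these lines could log which documents matched. This is only a sketch: the parameter types are inferred from the description above and are an assumption, as is the enclosing class name. It reads only the dictionary keys, because the property names on LibraryQueryResult are not listed in this guide.

```csharp
using System.Collections.Generic;
using System.Diagnostics;
using PSPDFKit.Search;

// Hypothetical class that hosts the handler; not part of the SDK.
sealed class SearchResultLogger
{
    // Assumed signature: the originating library plus a UID-to-result dictionary.
    public void MyOnSearchCompleteMethod(
        Library sender,
        IDictionary<string, LibraryQueryResult> results)
    {
        foreach (KeyValuePair<string, LibraryQueryResult> entry in results)
        {
            // entry.Key is the document UID; entry.Value is its result object.
            Debug.WriteLine($"Match found in document with UID {entry.Key}");
        }
    }
}
```

Each UID can then be passed to DocumentSource.CreateFromUidAsync to open the matching document.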

If you wish to show preview snippets, you should set the GenerateTextPreviews property in the query object to true. Then, preview text snippets will be delivered to you via the OnSearchPreviewComplete event handler.

Here is an example:

library.OnSearchPreviewComplete += MyOnSearchPreviewCompleteMethod;

var query = new LibraryQuery("Acme")
{
  GenerateTextPreviews = true
};
var succeeded = await library.SearchAsync(query);

The OnSearchPreviewComplete event handler receives a reference to the originating library, along with a list of LibraryPreviewResult objects, one for each match. Each object contains the UID of the document, the index of the page where the matching text is located, a snippet of text surrounding the match, the range of the matched text within that snippet, and the page text. It also has an annotation ID, which indicates whether the match was found in an annotation.

Advanced Matching Options

Library also offers advanced matching options. You set these on the LibraryQuery object before passing it to SearchAsync.

Example Code

You can find a complete working code example in the Catalog app provided with the SDK.