Indexed Full-Text Search (FTS)

PSPDFKit supports fast full-text search across PDF documents through PdfLibrary, which indexes documents ahead of time. This guide describes how to get started with PdfLibrary.

Getting Started

Using PdfLibrary is straightforward. Start by enqueuing your documents for indexing:

Kotlin:

// Assume that you have two PdfDocuments, doc1 and doc2.
val doc1: PdfDocument = ...
val doc2: PdfDocument = ...

// The library database is saved in your application's files directory.
val library = PdfLibrary.get(File(context.filesDir, "library.db").absolutePath)
// Queue both documents for indexing.
library.enqueueDocuments(listOf(doc1, doc2))

Java:

// Assume that you have two PdfDocuments, doc1 and doc2.
PdfDocument doc1 = ...;
PdfDocument doc2 = ...;

// The library database is saved in your application's files directory.
PdfLibrary library = PdfLibrary.get(new File(context.getFilesDir(), "library.db").getAbsolutePath());
// Queue both documents for indexing.
List<PdfDocument> documentList = new ArrayList<>();
documentList.add(doc1);
documentList.add(doc2);
library.enqueueDocuments(documentList);
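
This guide does not describe PdfLibrary's internal index format. To illustrate why indexing documents up front makes later searches cheap, here is a minimal sketch of an inverted index in plain Java. ToyPageIndex and its methods are hypothetical and not part of the PSPDFKit API; the sketch only mimics the document-UID-to-pages shape that search results use.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical toy model of an inverted index (NOT PdfLibrary's internal format):
// word -> (document UID -> pages containing the word).
final class ToyPageIndex {
    private final Map<String, Map<String, Set<Integer>>> postings = new HashMap<>();

    // "Indexing": tokenize each page once and record where every word occurs.
    void addPage(String documentUid, int page, String pageText) {
        for (String word : pageText.toLowerCase().split("\\W+")) {
            if (word.isEmpty()) {
                continue;
            }
            postings.computeIfAbsent(word, w -> new TreeMap<>())
                    .computeIfAbsent(documentUid, uid -> new TreeSet<>())
                    .add(page);
        }
    }

    // "Querying": a single map lookup that returns document UID -> set of pages.
    Map<String, Set<Integer>> lookup(String word) {
        return postings.getOrDefault(word.toLowerCase(), Map.of());
    }

    public static void main(String[] args) {
        ToyPageIndex index = new ToyPageIndex();
        index.addPage("doc1", 0, "Quarterly report: revenue grew.");
        index.addPage("doc1", 3, "Revenue by region.");
        index.addPage("doc2", 1, "No financial data here.");
        System.out.println(index.lookup("revenue")); // {doc1=[0, 3]}
    }
}
```

The expensive work (tokenizing every page) happens once at indexing time, so each query is just a lookup.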

PdfLibrary lets you query its current indexing state.

Use isIndexing() to check whether indexing is still in progress, for example, if you want to query the library only after all documents have been indexed. To check the status of an individual document, use getIndexStatusForUID().
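
If you decide to wait for indexing to finish before querying, one simple option is to poll. The helper below is a hypothetical sketch, not part of PSPDFKit: it takes a BooleanSupplier standing in for a call such as library::isIndexing (assumed here to return a boolean) and a Runnable that performs the search. Because it blocks between checks, run it off the main thread.

```java
import java.util.function.BooleanSupplier;

// Hypothetical helper (not part of PSPDFKit). isIndexing stands in for something like
// library::isIndexing, assumed to return a boolean. It blocks while waiting, so call it
// from a background thread.
final class WaitForIndexing {
    // Returns true if the query ran, or false if indexing was still in progress after
    // maxPolls checks (or the waiting thread was interrupted).
    static boolean runWhenIndexed(BooleanSupplier isIndexing, Runnable query,
                                  long pollIntervalMillis, int maxPolls) {
        for (int i = 0; i < maxPolls; i++) {
            if (!isIndexing.getAsBoolean()) {
                query.run();
                return true;
            }
            try {
                Thread.sleep(pollIntervalMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // Preserve the interrupt for the caller.
                return false;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Fake indexer that reports "still indexing" for the first two checks.
        int[] checks = {0};
        boolean ran = runWhenIndexed(() -> checks[0]++ < 2,
                () -> System.out.println("searching"), 10, 5);
        System.out.println("query ran: " + ran);
    }
}
```

The poll interval and the maximum number of checks are arbitrary; pick values that suit your document sizes.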

Search results are delivered through the onSearchCompleted() callback of QueryResultListener. The results are a Map from each document's UID (a String) to the set of page numbers that contain a match.
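
The result map can be consumed with ordinary collection code. A minimal sketch, assuming only the Map<String, Set<Integer>> shape described above; SearchResultSummary and its methods are hypothetical names used for illustration:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical helper for consuming the Map<String, Set<Integer>> passed to
// onSearchCompleted(): keys are document UIDs, values are pages containing a match.
final class SearchResultSummary {
    // One line per document, e.g. "doc1: pages [2, 7]", sorted by UID and page.
    static List<String> summarize(Map<String, Set<Integer>> results) {
        List<String> lines = new ArrayList<>();
        for (Map.Entry<String, Set<Integer>> entry : new TreeMap<>(results).entrySet()) {
            lines.add(entry.getKey() + ": pages " + new TreeSet<>(entry.getValue()));
        }
        return lines;
    }

    // Total number of matching pages across all documents.
    static int totalMatchingPages(Map<String, Set<Integer>> results) {
        return results.values().stream().mapToInt(Set::size).sum();
    }

    public static void main(String[] args) {
        Map<String, Set<Integer>> results = Map.of("doc2", Set.of(4), "doc1", Set.of(7, 2));
        summarize(results).forEach(System.out::println);
        System.out.println(totalMatchingPages(results) + " matching pages");
    }
}
```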

To show preview snippets, enable the generateTextPreviews() query option. The snippets are then delivered through the onSearchPreviewsGenerated() callback of QueryResultListener as a Map from each document UID (a String) to a set of QueryPreviewResult objects.

Example:

Kotlin:

// Set up search result options.
val options = QueryOptions.Builder()
    .generateTextPreviews(true)
    .previewRange(20, 120)
    .build()

// Run the search. The search runs on a background thread, and the callbacks are
// invoked on that background thread as well.
library.search("looking for this text", options, object : QueryResultListener {
    override fun onSearchCompleted(searchString: String, results: Map<String, Set<Int>>) {
        // Results map each document UID to the set of pages that contain a match.
        // Switch to the main thread before updating any UI.
    }

    override fun onSearchPreviewsGenerated(searchString: String, previews: Map<String, Set<QueryPreviewResult>>) {
        // Previews map each document UID to a set of QueryPreviewResult objects.
    }
})

Java:

// Set up search result options.
final QueryOptions options = new QueryOptions.Builder()
    .generateTextPreviews(true)
    .previewRange(20, 120)
    .build();

// Run the search. The search runs on a background thread, and the callbacks are
// invoked on that background thread as well.
library.search("looking for this text", options, new QueryResultListener() {
    @Override
    public void onSearchCompleted(@NonNull String searchString, @NonNull Map<String, Set<Integer>> results) {
        // Results map each document UID to the set of pages that contain a match.
        // Switch to the main thread before updating any UI.
    }

    @Override
    public void onSearchPreviewsGenerated(@NonNull String searchString, @NonNull Map<String, Set<QueryPreviewResult>> previews) {
        // Previews map each document UID to a set of QueryPreviewResult objects.
    }
});
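
Because both callbacks run on a background thread, UI updates have to be handed off to the main thread. The sketch below shows one way to do that, assuming you pass in an Executor that runs tasks on the main thread (on Android, for example, ContextCompat.getMainExecutor(context) from AndroidX). ResultHandOff is a hypothetical helper, not a PSPDFKit class.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

// Hypothetical helper (not part of PSPDFKit): snapshots search results on the calling
// (background) thread and hands the snapshot to an executor that runs on the UI thread.
final class ResultHandOff {
    static void deliver(Map<String, Set<Integer>> results, Executor uiExecutor,
                        Consumer<Map<String, Set<Integer>>> onUiThread) {
        // Copy into immutable collections so the UI thread never reads the original
        // map, which may be changed after this call returns.
        Map<String, Set<Integer>> copy = new HashMap<>();
        results.forEach((uid, pages) -> copy.put(uid, Set.copyOf(pages)));
        Map<String, Set<Integer>> snapshot = Map.copyOf(copy);
        uiExecutor.execute(() -> onUiThread.accept(snapshot));
    }

    public static void main(String[] args) throws InterruptedException {
        // A single-thread executor stands in for Android's main thread here.
        ExecutorService fakeUiThread = Executors.newSingleThreadExecutor();
        deliver(Map.of("doc1", Set.of(2, 5)), fakeUiThread,
                received -> System.out.println("UI received " + received.get("doc1").size() + " pages"));
        fakeUiThread.shutdown();
        fakeUiThread.awaitTermination(1, TimeUnit.SECONDS);
    }
}
```

Inside onSearchCompleted(), you could then call ResultHandOff.deliver(results, ContextCompat.getMainExecutor(context), ...) with a lambda that updates your own UI.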

Advanced Matching Options

PdfLibrary offers advanced matching options, which you pass to the search() method in a QueryOptions object.

The following options are available:

generateTextPreviews (boolean): Retrieves preview text snippets when searching and delivers them in the onSearchPreviewsGenerated() callback.
maximumResultsTotal (int): The maximum number of search results across all documents.
maximumResultsPerDocument (int): The maximum number of search results per document.
matchExactWords (boolean): Matches only exact words. For example, a search for "some" would not match "something".
matchExactPhrases (boolean): Matches only exact phrases. For example, a search for "this is a test" would not match "this is a quick test".
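
The toy model below illustrates the difference between the matchExactWords and matchExactPhrases settings. It is not PdfLibrary's implementation: it assumes that, with exact word matching off, a query word matches any word that starts with it, and that, with exact phrase matching off, all query words merely need to appear somewhere in the text. MatchingOptionsModel is a hypothetical name.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical toy model (not PdfLibrary's implementation) of matchExactWords and
// matchExactPhrases. Assumptions: without exact words, a query word matches any text
// word that starts with it; without exact phrases, every query word only has to
// appear somewhere in the text, in any order.
final class MatchingOptionsModel {
    // Lowercases and splits on runs of non-word characters.
    private static List<String> words(String s) {
        return Arrays.stream(s.toLowerCase().split("\\W+"))
                .filter(w -> !w.isEmpty())
                .collect(Collectors.toList());
    }

    private static boolean wordMatches(String textWord, String queryWord, boolean exactWords) {
        return exactWords ? textWord.equals(queryWord) : textWord.startsWith(queryWord);
    }

    static boolean matches(String text, String query, boolean exactWords, boolean exactPhrases) {
        List<String> t = words(text);
        List<String> q = words(query);
        if (q.isEmpty()) {
            return false;
        }
        if (exactPhrases) {
            // The query words must appear consecutively and in order.
            for (int start = 0; start + q.size() <= t.size(); start++) {
                boolean all = true;
                for (int i = 0; i < q.size() && all; i++) {
                    all = wordMatches(t.get(start + i), q.get(i), exactWords);
                }
                if (all) {
                    return true;
                }
            }
            return false;
        }
        // Each query word only has to match some word in the text.
        return q.stream().allMatch(qw -> t.stream().anyMatch(tw -> wordMatches(tw, qw, exactWords)));
    }

    public static void main(String[] args) {
        System.out.println(matches("something", "some", true, false));   // false
        System.out.println(matches("something", "some", false, false));  // true
        System.out.println(matches("this is a quick test", "this is a test", false, true));  // false
        System.out.println(matches("this is a quick test", "this is a test", false, false)); // true
    }
}
```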

Advanced Configuration

You can configure PdfLibrary to suit your needs through the following properties:

saveReversedPageText (boolean, default: true): Indicates whether the reversed text of each PDF document should be saved. This roughly doubles the size of the cache, but enables ends-with searches.
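
To see why storing reversed text enables ends-with searches, consider a sorted word list. Words sharing a prefix are adjacent in sorted order, so starts-with lookups are cheap; storing every word a second time in reversed form turns an ends-with query into a starts-with query on the reversed words, which is also why the extra data roughly doubles storage. This is a conceptual sketch with a hypothetical ReversedTextIndex class, not PdfLibrary's storage format.

```java
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;
import java.util.stream.Collectors;

// Hypothetical conceptual sketch (not PdfLibrary's storage format): keeping a second,
// reversed copy of every word turns "ends with X" into "starts with reverse(X)".
final class ReversedTextIndex {
    private final NavigableSet<String> forward = new TreeSet<>();
    private final NavigableSet<String> reversed = new TreeSet<>(); // The extra copy: ~2x storage.

    private static String reverse(String s) {
        return new StringBuilder(s).reverse().toString();
    }

    void add(String word) {
        String w = word.toLowerCase();
        forward.add(w);
        reversed.add(reverse(w));
    }

    // Words sharing a prefix are contiguous in sorted order, so scan from the first
    // candidate and stop at the first word that no longer has the prefix.
    private static List<String> withPrefix(NavigableSet<String> set, String prefix) {
        return set.tailSet(prefix, true).stream()
                .takeWhile(w -> w.startsWith(prefix))
                .collect(Collectors.toList());
    }

    List<String> startsWith(String prefix) {
        return withPrefix(forward, prefix.toLowerCase());
    }

    List<String> endsWith(String suffix) {
        // Search the reversed words for the reversed suffix, then undo the reversal.
        return withPrefix(reversed, reverse(suffix.toLowerCase())).stream()
                .map(ReversedTextIndex::reverse)
                .sorted()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        ReversedTextIndex index = new ReversedTextIndex();
        for (String w : new String[] {"testing", "reading", "read", "test"}) {
            index.add(w);
        }
        System.out.println(index.startsWith("read")); // [read, reading]
        System.out.println(index.endsWith("ing"));    // [reading, testing]
    }
}
```

Without the reversed copy, an ends-with query would have to scan every stored word.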