Indexed Full-Text Search (FTS)
PSPDFKit supports fast and efficient full-text search in PDF documents through `PdfLibrary`. This document describes how to get started with `PdfLibrary`.
Getting Started
Using `PdfLibrary` is relatively straightforward. You begin by indexing documents:
```kotlin
// Assume that you have two valid `PdfDocument`s.
val doc1: PdfDocument = ...
val doc2: PdfDocument = ...

// The library will be saved in your application's files directory.
val library = PdfLibrary.get(File(context.filesDir, "library.db").absolutePath)
library.enqueueDocuments(listOf(doc1, doc2))
```
```java
// Assume that you have two valid `PdfDocument`s.
PdfDocument doc1, doc2;

// The library will be saved in your application's files directory.
PdfLibrary library = PdfLibrary.get(new File(context.getFilesDir(), "library.db").getAbsolutePath());
List<PdfDocument> documentList = new ArrayList<>();
documentList.add(doc1);
documentList.add(doc2);
library.enqueueDocuments(documentList);
```
`PdfLibrary` allows you to query the current indexing state. To search the library only after all documents have been indexed, check `isIndexing()` first. You can also check the current status of an individual document with `getIndexStatusForUID()`.
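If you want to hold off searching until indexing has finished, one option is to poll the indexing state from a background thread. The following is a minimal sketch in plain Java rather than PSPDFKit code: `IndexingWaiter` and `waitUntilIdle` are hypothetical names, and the `BooleanSupplier` stands in for a call such as `library::isIndexing`, assuming that method returns a `boolean`.

```java
import java.util.function.BooleanSupplier;

final class IndexingWaiter {
    private IndexingWaiter() {}

    /**
     * Polls the supplier until it reports false or the timeout elapses.
     * Returns true if indexing finished in time, false on timeout or interruption.
     */
    static boolean waitUntilIdle(BooleanSupplier isIndexing, long timeoutMillis, long pollMillis) {
        final long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (isIndexing.getAsBoolean()) {
            if (System.nanoTime() - deadline >= 0) {
                return false; // Still indexing when the timeout expired.
            }
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                // Preserve the interrupt flag for the caller and give up waiting.
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return true;
    }
}
```

Because this helper blocks the calling thread, it should not be called from the main thread of an Android app.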
When you run a search, the results are delivered to the `onSearchCompleted` callback of `QueryResultListener` as a `Map` that maps each document’s UID `String` to the set of page numbers containing a match.
If you wish to show preview snippets, enable the `generateTextPreviews()` query option. The preview text snippets are then delivered to the `onSearchPreviewsGenerated` method of `QueryResultListener` as a `Map` that maps each document’s UID `String` to a set of `QueryPreviewResult` objects.
Example:
```kotlin
// Set up search result options.
val options = QueryOptions.Builder()
    .generateTextPreviews(true)
    .previewRange(20, 120)
    .build()

// Run the search. The search will run on a background thread and the callbacks will be called
// from the background thread as well.
library.search("looking for this text", options, object : QueryResultListener {
    override fun onSearchCompleted(searchString: String, results: Map<String, Set<Int>>) {
        // Results contain UID → set of pages mapping.
    }

    override fun onSearchPreviewsGenerated(searchString: String, previews: Map<String, Set<QueryPreviewResult>>) {
        // Previews contain UID → set of `QueryPreviewResult` mappings.
    }
})
```
```java
// Set up search result options.
final QueryOptions options = new QueryOptions.Builder()
    .generateTextPreviews(true)
    .previewRange(20, 120)
    .build();

// Run the search. The search will run on a background thread and the callbacks will be called
// from the background thread as well.
library.search("looking for this text", options, new QueryResultListener() {
    @Override
    public void onSearchCompleted(@NonNull String searchString, @NonNull Map<String, Set<Integer>> results) {
        // Results contain UID → set of pages mapping.
    }

    @Override
    public void onSearchPreviewsGenerated(@NonNull String searchString, @NonNull Map<String, Set<QueryPreviewResult>> previews) {
        // Previews contain UID → set of `QueryPreviewResult` mappings.
    }
});
```
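Inside `onSearchCompleted`, the UID-to-pages map usually needs to be turned into something displayable. The following is a small sketch in plain Java with no PSPDFKit dependency, showing one way to flatten such a map into sorted lines. `SearchResultFormatter` and `describe` are hypothetical names, and the UIDs in the usage note are made up.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

final class SearchResultFormatter {
    private SearchResultFormatter() {}

    /** Turns a UID → pages map into one line per document, sorted by UID and then by page. */
    static List<String> describe(Map<String, Set<Integer>> results) {
        // Copy into sorted collections so the output order does not depend on the input's iteration order.
        TreeMap<String, Set<Integer>> sortedByUid = new TreeMap<String, Set<Integer>>(results);
        List<String> lines = new ArrayList<String>();
        for (Map.Entry<String, Set<Integer>> entry : sortedByUid.entrySet()) {
            TreeSet<Integer> sortedPages = new TreeSet<Integer>(entry.getValue());
            lines.add(entry.getKey() + ": pages " + sortedPages);
        }
        return lines;
    }
}
```

For a map containing `doc-b` with pages 4 and 1 and `doc-a` with page 2, `describe` returns `doc-a: pages [2]` followed by `doc-b: pages [1, 4]`. Since the callbacks arrive on a background thread, any UI update based on these lines has to be posted to the main thread.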
Advanced Matching Options
`PdfLibrary` offers advanced matching options. You pass these options to the `search()` method in a `QueryOptions` object.
The following options are available:
Name | Type | Description |
---|---|---|
`generateTextPreviews` | `Boolean` | Retrieves preview snippets of text when searching and delivers them in the `onSearchPreviewsGenerated()` callback. |
`maximumResultsTotal` | `int` | The maximum number of search results across all documents. |
`maximumResultsPerDocument` | `int` | The maximum number of search results per document. |
`matchExactWords` | `Boolean` | Only matches exact words. For example, the query “some” would not match the word “something.” |
`matchExactPhrases` | `Boolean` | Only matches exact phrases. For example, “this is a test” would not match “this is a quick test.” |
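To make the two exact-matching modes concrete, here is a small standalone illustration in plain Java. It does not use or reproduce the PSPDFKit matcher: `MatchModes`, `containsWord`, and `containsPhrase` are hypothetical helpers, and splitting on whitespace with lowercase comparison is an assumption made only for this sketch.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

final class MatchModes {
    private MatchModes() {}

    /** Splits text into lowercase tokens on whitespace (a simplification for this sketch). */
    static List<String> tokenize(String text) {
        String trimmed = text.trim().toLowerCase(Locale.ROOT);
        if (trimmed.isEmpty()) {
            return Collections.<String>emptyList();
        }
        return Arrays.asList(trimmed.split("\\s+"));
    }

    /** Exact-word matching: the query must equal a whole token, so "some" does not match "something". */
    static boolean containsWord(String text, String word) {
        return tokenize(text).contains(word.trim().toLowerCase(Locale.ROOT));
    }

    /** Exact-phrase matching: the query tokens must appear consecutively and in the same order. */
    static boolean containsPhrase(String text, String phrase) {
        List<String> haystack = tokenize(text);
        List<String> needle = tokenize(phrase);
        return !needle.isEmpty() && Collections.indexOfSubList(haystack, needle) >= 0;
    }
}
```

With these helpers, `containsWord("there is something here", "some")` is `false`, and `containsPhrase("this is a quick test", "this is a test")` is also `false` because the extra word breaks the phrase.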
Advanced Configuration
You can configure `PdfLibrary` to match your needs. The following properties on `PdfLibrary` are available:
Property | Type | Default | Description |
---|---|---|---|
`saveReversedPageText` | `Boolean` | `true` | Indicates whether the reversed text of a PDF document should be saved. This roughly doubles the size of the cache, but it allows for ends-with searches. |
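Why reversed text helps with ends-with searches can be shown with a small standalone sketch. An index that is efficient at prefix lookups can answer a suffix query if both the stored words and the query are reversed first. This illustrates the general technique only and is not a description of how PSPDFKit implements it; `ReversedIndex` and its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

final class ReversedIndex {
    // Words are stored reversed, so a shared suffix becomes a shared prefix.
    private final TreeSet<String> reversedWords = new TreeSet<String>();

    void add(String word) {
        reversedWords.add(reverse(word));
    }

    /** Finds words ending with the suffix by running a prefix range scan over the reversed words. */
    List<String> endsWith(String suffix) {
        String prefix = reverse(suffix);
        List<String> matches = new ArrayList<String>();
        // Every string that starts with prefix sorts inside [prefix, prefix + '\uffff').
        for (String reversed : reversedWords.subSet(prefix, prefix + Character.MAX_VALUE)) {
            matches.add(reverse(reversed));
        }
        return matches;
    }

    private static String reverse(String s) {
        return new StringBuilder(s).reverse().toString();
    }
}
```

Storing a second, reversed copy of every word is also what makes the stored data roughly twice as large in this sketch, which mirrors the trade-off described for `saveReversedPageText`.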