public final class PdfLibrary
extends Object
java.lang.Object
   ↳ com.pspdfkit.document.library.PdfLibrary

Class Overview

PdfLibrary implements a SQLite-based full-text search engine. You can register documents to be indexed in the background and then search for keywords within that collection. You can create multiple libraries, although one is usually enough for common use cases.
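To make the idea of a full-text index concrete, here is a minimal in-memory sketch of an inverted index that maps each word to the documents and pages containing it. It is only an illustration: PdfLibrary keeps its index in SQLite, and the class ToyInvertedIndex and its methods are hypothetical, not part of the PSPDFKit API.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical in-memory inverted index; PdfLibrary itself keeps its index in SQLite.
class ToyInvertedIndex {
    // word -> (document UID -> indexes of the pages containing that word)
    private final Map<String, Map<String, Set<Integer>>> postings = new HashMap<>();

    void addPage(String uid, int pageIndex, String text) {
        // Split on anything that is not a letter or digit, lowercasing each word.
        for (String word : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (word.isEmpty()) continue; // split() yields "" when text starts with a separator
            postings.computeIfAbsent(word, w -> new TreeMap<>())
                    .computeIfAbsent(uid, u -> new TreeSet<>())
                    .add(pageIndex);
        }
    }

    // Exact-word lookup returning document UID -> matching pages (empty if none).
    Map<String, Set<Integer>> search(String word) {
        return postings.getOrDefault(word.toLowerCase(Locale.ROOT), Collections.emptyMap());
    }

    public static void main(String[] args) {
        ToyInvertedIndex index = new ToyInvertedIndex();
        index.addPage("doc-a", 0, "Invoice for March");
        index.addPage("doc-a", 3, "Payment of the invoice is due");
        index.addPage("doc-b", 1, "Nothing relevant here");
        System.out.println(index.search("Invoice")); // {doc-a=[0, 3]}
    }
}
```

Indexing work happens once per page, so a later query is a single map lookup rather than a scan over every document.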

Summary

Nested Classes
@interface PdfLibrary.Tokenizer  
Constants
String PORTER_TOKENIZER The name of PSPDFKit's custom Porter tokenizer that allows better CJK indexing.
String UNICODE_TOKENIZER The name of PSPDFKit's custom Unicode tokenizer.
Public Methods
void addLibraryIndexingListener(LibraryIndexingListener listener)
Adds a LibraryIndexingListener to monitor document indexing status.
void clearIndex()
Completely clears the index for this library.
void enqueueDocumentSources(List<DocumentSource> documentSources)
Queues a list of documents for indexing.
void enqueueDocumentSourcesWithMetadata(List<Pair<DocumentSource, byte[]>> documentSources)
Queues a list of documents for indexing together with free-form metadata.
void enqueueDocuments(List<PdfDocument> documents, IndexingOptions indexingOptions)
Queues a list of documents for indexing.
void enqueueDocuments(List<PdfDocument> documents)
Queues a list of documents for indexing.
void enqueueDocumentsWithMetadata(List<Pair<PdfDocument, byte[]>> documents)
Queues a list of documents for indexing together with free-form metadata.
void enqueueDocumentsWithMetadata(List<Pair<PdfDocument, byte[]>> documents, IndexingOptions indexingOptions)
Queues a list of documents for indexing together with free-form metadata.
static PdfLibrary get(String path, String tokenizer)
Returns a library for a given path.
static PdfLibrary get(String path)
Returns a library for a given path.
LibraryIndexStatus getIndexStatusForUID(String uid)
Returns indexing status for a document with passed UID.
List<String> getIndexedUIDs()
Returns a list of UIDs of documents that are currently indexed.
byte[] getMetadataForUID(String uid)
Returns the metadata stored for a document via enqueueDocumentsWithMetadata(List) or enqueueDocumentSourcesWithMetadata(List).
List<String> getQueuedUIDs()
Returns a list of UIDs of documents queued for indexing.
boolean getSaveReverseText()
Indicates whether saving the reverse text is enabled.
boolean isIndexing()
Indicates whether indexing is in progress.
void removeDocuments(List<String> documentUIDs)
Invalidates the index entries for the given documents.
void removeLibraryIndexingListener(LibraryIndexingListener listener)
Removes a previously added LibraryIndexingListener.
void search(String searchString, QueryOptions options, QueryResultListener resultListener)
Queries the database for matches of searchString.
void setSaveReverseText(boolean saveReverseText)
Enables or disables saving a reversed copy of the original page text.
int size()
Returns the number of indexed documents in this library.
void stopSearch()
Stops search and all in-progress preview text generator tasks.
Inherited Methods
From class java.lang.Object

Constants

public static final String PORTER_TOKENIZER

The name of PSPDFKit's custom Porter tokenizer that allows better CJK indexing. This tokenizer also has drawbacks, such as much looser word matching: searching for "Dependency" will also return "Dependencies". If this is a problem, use UNICODE_TOKENIZER instead.

This is the default tokenizer used when no other one is specified.

Constant Value: "PorterTokenizer"
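The looser matching comes from stemming: words are reduced to a common stem before they are compared. The sketch below is a deliberately simplified suffix stripper, not the full Porter algorithm and not PSPDFKit's tokenizer; the class SimpleStemmer is hypothetical and only shows why "Dependency" and "Dependencies" end up matching.

```java
import java.util.Locale;

// Deliberately simplified suffix stripping: NOT the full Porter algorithm and
// NOT PSPDFKit's tokenizer; it only shows why stemming loosens matching.
class SimpleStemmer {
    static String stem(String word) {
        String w = word.toLowerCase(Locale.ROOT);
        // "dependencies" -> "dependency"
        if (w.length() > 4 && w.endsWith("ies")) return w.substring(0, w.length() - 3) + "y";
        // "documents" -> "document", but leave "glass" alone
        if (w.length() > 3 && w.endsWith("s") && !w.endsWith("ss")) return w.substring(0, w.length() - 1);
        return w;
    }

    // Two words match when they reduce to the same stem.
    static boolean matches(String query, String indexedWord) {
        return stem(query).equals(stem(indexedWord));
    }

    public static void main(String[] args) {
        System.out.println(matches("Dependency", "Dependencies"));        // true
        System.out.println("Dependency".equalsIgnoreCase("Dependencies")); // false
    }
}
```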

public static final String UNICODE_TOKENIZER

The name of PSPDFKit's custom Unicode tokenizer. This tokenizer wraps around SQLite's unicode61 tokenizer to add full case folding to the indexed text.

Warning: This tokenizer is only available when the library supports FTS5. Otherwise, specifying it as the value of the tokenizer parameter results in an error when creating the library.

Constant Value: "UnicodeTokenizer"
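Full case folding goes beyond simple lowercasing. For example, the German sharp s folds to "ss", so "Straße" and "STRASSE" compare equal. Java has no built-in full case-folding method, so this sketch approximates it by uppercasing and then lowercasing with Locale.ROOT. The class CaseFoldingTokenizer is hypothetical and only illustrates the idea. It is not PSPDFKit's UnicodeTokenizer and does not reproduce unicode61's other rules.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Illustration only, not PSPDFKit's UnicodeTokenizer. Java has no full Unicode
// case-folding method, so upper- then lowercasing with Locale.ROOT is used as an
// approximation; it maps the sharp s (U+00DF) to "ss" like full case folding does.
class CaseFoldingTokenizer {
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String raw : text.split("[^\\p{L}\\p{N}]+")) {
            if (raw.isEmpty()) continue; // leading separators produce an empty token
            tokens.add(raw.toUpperCase(Locale.ROOT).toLowerCase(Locale.ROOT));
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("Stra\u00DFe, STRASSE!")); // [strasse, strasse]
        // Plain lowercasing keeps the sharp s, so the two spellings would not match.
        System.out.println("Stra\u00DFe".toLowerCase(Locale.ROOT).equals("strasse")); // false
    }
}
```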

Public Methods

public void addLibraryIndexingListener (LibraryIndexingListener listener)

Adds a LibraryIndexingListener to monitor document indexing status. If the listener has already been added, this method is a no-op. Passing null is not allowed and results in an exception.

Parameters
listener LibraryIndexingListener that should be notified. Must be non-null.
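The add/remove contract described for the listener methods (duplicate adds and unknown removals are no-ops, null is rejected) maps naturally onto a set. The sketch below is a hypothetical ListenerRegistry, not PSPDFKit's implementation. It uses a copy-on-write set on the assumption that listeners may be notified from a background indexing thread while other threads add or remove them.

```java
import java.util.Objects;
import java.util.Set;
import java.util.concurrent.CopyOnWriteArraySet;

// Hypothetical registry mirroring the documented add/remove contract:
// duplicate adds and unknown removals are no-ops, and null is rejected.
class ListenerRegistry<L> {
    // Copy-on-write so listeners can be iterated safely from a background
    // thread while other threads add or remove them.
    private final Set<L> listeners = new CopyOnWriteArraySet<>();

    void add(L listener) {
        Objects.requireNonNull(listener, "listener must be non-null");
        listeners.add(listener); // returns false (no-op) if already present
    }

    void remove(L listener) {
        Objects.requireNonNull(listener, "listener must be non-null");
        listeners.remove(listener); // no-op if it was never added
    }

    int size() {
        return listeners.size();
    }

    public static void main(String[] args) {
        ListenerRegistry<Runnable> registry = new ListenerRegistry<>();
        Runnable listener = () -> System.out.println("indexing changed");
        registry.add(listener);
        registry.add(listener);              // ignored
        System.out.println(registry.size()); // 1
        registry.remove(listener);
        System.out.println(registry.size()); // 0
    }
}
```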

public void clearIndex ()

Completely clears the index for this library.

public void enqueueDocumentSources (List<DocumentSource> documentSources)

Queues a list of documents for indexing. Any documents already queued or fully indexed are ignored. This call avoids opening documents until they are indexed, which makes it significantly more memory-friendly than enqueueDocuments(List).

Parameters
documentSources List of document sources to index.
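The two behaviours described here are skipping UIDs that are already queued or indexed and deferring document opening until indexing time. This hypothetical sketch models both with plain Java. The class LazyIndexQueue is illustrative only; a Supplier stands in for a document source that has not been opened yet.

```java
import java.util.HashSet;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.function.Supplier;

// Hypothetical queue, not PSPDFKit code. Each document is represented by a
// Supplier that "opens" it, and the Supplier runs only when indexing starts.
class LazyIndexQueue {
    private final Map<String, Supplier<String>> queued = new LinkedHashMap<>(); // FIFO order
    private final Set<String> indexed = new HashSet<>();
    int openedCount = 0; // how many documents have actually been opened

    // Returns false (and does nothing) if the UID is already queued or indexed.
    boolean enqueue(String uid, Supplier<String> opener) {
        if (indexed.contains(uid) || queued.containsKey(uid)) return false;
        queued.put(uid, opener);
        return true;
    }

    // Indexes the oldest queued document; returns its UID, or null if the queue is empty.
    String indexNext() {
        Iterator<Map.Entry<String, Supplier<String>>> it = queued.entrySet().iterator();
        if (!it.hasNext()) return null;
        Map.Entry<String, Supplier<String>> entry = it.next();
        String uid = entry.getKey();
        Supplier<String> opener = entry.getValue();
        it.remove();
        opener.get(); // the document is opened only now; a real indexer would tokenize its text
        openedCount++;
        indexed.add(uid);
        return uid;
    }

    public static void main(String[] args) {
        LazyIndexQueue queue = new LazyIndexQueue();
        queue.enqueue("a", () -> "text of a");
        queue.enqueue("a", () -> "text of a");                     // ignored: already queued
        System.out.println(queue.openedCount);                     // 0
        queue.indexNext();
        System.out.println(queue.openedCount);                     // 1
        System.out.println(queue.enqueue("a", () -> "text of a")); // false: already indexed
    }
}
```

Only one document is held open at a time in this model, which is the memory argument made above in favour of document sources over opened documents.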

public void enqueueDocumentSourcesWithMetadata (List<Pair<DocumentSource, byte[]>> documentSources)

Queues a list of documents for indexing together with free-form metadata. This call avoids opening documents until they are indexed, which makes it significantly more memory-friendly than enqueueDocumentsWithMetadata(List).

Metadata can be retrieved after indexing by calling getMetadataForUID(String).

Any documents already queued or fully indexed are ignored.

Parameters
documentSources List of document sources to index, each paired with the metadata to store.
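Because the metadata is free-form bytes, the application decides how to encode it. The sketch below assumes the metadata is a UTF-8 string, such as a display title. The helpers in MetadataCodec are hypothetical, and the decoder handles the null that getMetadataForUID(String) returns when no metadata was found.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helpers. The metadata is free-form byte[], so the app chooses
// the encoding; here it is assumed to be a UTF-8 string such as a title.
class MetadataCodec {
    static byte[] encode(String title) {
        return title.getBytes(StandardCharsets.UTF_8);
    }

    // getMetadataForUID(String) returns null when no metadata was found.
    static String decode(byte[] metadata) {
        return metadata == null ? null : new String(metadata, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] stored = encode("Quarterly report");
        System.out.println(decode(stored)); // Quarterly report
        System.out.println(decode(null));   // null
    }
}
```

Fixing one explicit charset on both sides keeps the round trip stable regardless of the platform's default encoding.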

public void enqueueDocuments (List<PdfDocument> documents, IndexingOptions indexingOptions)

Queues a list of documents for indexing. Any documents already queued or fully indexed are ignored.

Parameters
documents List of documents to index.
indexingOptions Options for indexing the given documents.

public void enqueueDocuments (List<PdfDocument> documents)

Queues a list of documents for indexing. Any documents already queued or fully indexed are ignored.

NOTE: This call requires all documents to be open during indexing and will likely lead to out-of-memory conditions if many documents are passed. Prefer enqueueDocumentSources(List) where possible.

Parameters
documents List of documents to index.

public void enqueueDocumentsWithMetadata (List<Pair<PdfDocument, byte[]>> documents)

Queues a list of documents for indexing together with free-form metadata. Metadata can be retrieved after indexing by calling getMetadataForUID(String).

NOTE: This call requires all documents to be open during indexing and will likely lead to out-of-memory conditions if many documents are passed. Prefer enqueueDocumentSourcesWithMetadata(List) where possible.

Any documents already queued or fully indexed are ignored.

Parameters
documents List of documents to index with metadata to be stored.

public void enqueueDocumentsWithMetadata (List<Pair<PdfDocument, byte[]>> documents, IndexingOptions indexingOptions)

Queues a list of documents for indexing together with free-form metadata. Metadata can be retrieved after indexing by calling getMetadataForUID(String).

Any documents already queued or fully indexed are ignored.

Parameters
documents List of documents to index with metadata to be stored.
indexingOptions Options for indexing the given documents.

public static PdfLibrary get (String path, String tokenizer)

Returns the library for the given path. If no library exists for this path yet, this method creates and returns one.

Parameters
path Writable path to the library database file.
tokenizer The tokenizer to use, either PORTER_TOKENIZER or UNICODE_TOKENIZER. This controls how the PdfLibrary matches queries to the content in the index.
Throws
IOException if the file could not be written.
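The "get or create" behaviour of get(String, String) can be pictured as a cache keyed by the database path. The sketch below is a hypothetical LibraryCache, not PSPDFKit's implementation. It normalizes paths so that equivalent spellings share one instance. It also assumes, as its own design choice, that the tokenizer only matters when an entry is first created. The documentation does not say what PdfLibrary does if an existing path is requested with a different tokenizer.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for "get or create": at most one instance per
// database path. Not PSPDFKit's implementation.
class LibraryCache {
    static final class Library {
        final Path path;
        final String tokenizer;

        Library(Path path, String tokenizer) {
            this.path = path;
            this.tokenizer = tokenizer;
        }
    }

    private static final Map<Path, Library> LIBRARIES = new ConcurrentHashMap<>();

    static Library get(String path, String tokenizer) {
        // Normalize so "data/../lib.db" and "lib.db" resolve to the same key.
        Path key = Paths.get(path).toAbsolutePath().normalize();
        // Assumption of this sketch: the tokenizer is only used when the entry is
        // first created; later calls with another tokenizer get the existing one.
        return LIBRARIES.computeIfAbsent(key, k -> new Library(k, tokenizer));
    }

    public static void main(String[] args) {
        Library first = get("lib.db", "PorterTokenizer");
        Library second = get("data/../lib.db", "PorterTokenizer");
        System.out.println(first == second); // true
    }
}
```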

public static PdfLibrary get (String path)

Returns the library for the given path. If no library exists for this path yet, this method creates and returns one using the default PORTER_TOKENIZER.

Parameters
path Writable path to the library database file.
Throws
IOException if the file could not be written.

public LibraryIndexStatus getIndexStatusForUID (String uid)

Returns the indexing status of the document with the given UID.

Parameters
uid UID of the document.
Returns
  • The indexing status of the document with the given UID.

public List<String> getIndexedUIDs ()

Returns a list of UIDs of documents that are currently indexed.

Returns
  • List of indexed UIDs.

public byte[] getMetadataForUID (String uid)

Returns the metadata stored for a document via enqueueDocumentsWithMetadata(List) or enqueueDocumentSourcesWithMetadata(List).

Parameters
uid UID of the document.
Returns
  • Metadata for the passed document or null if no metadata was found.

public List<String> getQueuedUIDs ()

Returns a list of UIDs of documents queued for indexing.

Returns
  • List of queued UIDs.

public boolean getSaveReverseText ()

Indicates whether saving the reverse text is enabled.

Returns
  • true if saving reverse text is enabled, false otherwise.

public boolean isIndexing ()

Indicates whether indexing is in progress.

Returns
  • true if indexing is in progress, false otherwise.

public void removeDocuments (List<String> documentUIDs)

Invalidates the index entries for the given documents.

Parameters
documentUIDs List of document UIDs to be invalidated.

public void removeLibraryIndexingListener (LibraryIndexingListener listener)

Removes a LibraryIndexingListener previously added with addLibraryIndexingListener(LibraryIndexingListener). After this call, the listener is no longer notified of any changes. If the listener has not been added, this method is a no-op. Passing null is not allowed and results in an exception.

Parameters
listener LibraryIndexingListener that should be removed. Must be non-null.

public void search (String searchString, QueryOptions options, QueryResultListener resultListener)

Queries the database for matches of searchString. Only exact, begins-with, and ends-with matches are supported; ends-with matches require saving reverse text (see setSaveReverseText(boolean)). Results are not returned directly: resultListener receives a map from document UIDs to the set of pages matching inside each document.

Parameters
searchString String to search for.
options Options object determining search behaviour. May be null for default behaviour.
resultListener Callback listener that is called with the search results. Note that the listener methods are called on a background thread.
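To show what the three supported match types mean and how results are grouped by document and page, here is a hypothetical linear-scan sketch. MatchModes, its Mode enum, and the input shape (UID to page index to words) are all illustrative assumptions. A real full-text index answers these queries without scanning every word, and this sketch is not how PdfLibrary evaluates queries.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical linear-scan sketch of the three supported match types;
// a real full-text index would not scan every word like this.
class MatchModes {
    enum Mode { EXACT, BEGINS_WITH, ENDS_WITH }

    static boolean matches(String word, String query, Mode mode) {
        switch (mode) {
            case EXACT:       return word.equals(query);
            case BEGINS_WITH: return word.startsWith(query);
            case ENDS_WITH:   return word.endsWith(query);
            default:          throw new IllegalArgumentException("unknown mode " + mode);
        }
    }

    // pages: document UID -> (page index -> lowercase words on that page).
    // Result: document UID -> set of page indexes with at least one match.
    static Map<String, Set<Integer>> search(Map<String, Map<Integer, List<String>>> pages,
                                            String query, Mode mode) {
        Map<String, Set<Integer>> results = new TreeMap<>();
        pages.forEach((uid, byPage) -> byPage.forEach((page, words) -> {
            for (String word : words) {
                if (matches(word, query, mode)) {
                    results.computeIfAbsent(uid, u -> new TreeSet<>()).add(page);
                    break; // one match is enough to report this page
                }
            }
        }));
        return results;
    }

    public static void main(String[] args) {
        Map<String, Map<Integer, List<String>>> pages = Map.of(
                "doc-a", Map.of(0, List.of("dependency", "graph"), 2, List.of("independent")),
                "doc-b", Map.of(1, List.of("pendant")));
        System.out.println(search(pages, "depend", Mode.BEGINS_WITH)); // {doc-a=[0]}
        System.out.println(search(pages, "ant", Mode.ENDS_WITH));      // {doc-b=[1]}
    }
}
```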

public void setSaveReverseText (boolean saveReverseText)

Enables or disables saving a reversed copy of the original page text. When enabled, the index database is about twice as large, but ends-with matches become possible.

Parameters
saveReverseText true to save reversed text to the index. Defaults to true.
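A plausible reason a reversed copy enables ends-with matching is that sorted indexes answer prefix queries efficiently with a range scan. Storing every word reversed turns "ends with X" into "starts with reverse(X)", at the cost of a second copy of the text. This rationale is an assumption; the documentation only states the size trade-off. The sketch below illustrates the general technique with a hypothetical ReversedIndex built on TreeSet, not PSPDFKit code.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.NavigableSet;
import java.util.SortedSet;
import java.util.TreeSet;

// Sketch of the general reversed-index technique (an assumed rationale, not
// PSPDFKit code): sorted indexes answer prefix queries with a range scan, so
// storing every word reversed turns "ends with X" into "starts with reverse(X)".
class ReversedIndex {
    private final NavigableSet<String> forward = new TreeSet<>();
    private final NavigableSet<String> reversed = new TreeSet<>(); // the extra copy

    void add(String word) {
        forward.add(word);
        reversed.add(reverse(word));
    }

    static String reverse(String s) {
        return new StringBuilder(s).reverse().toString();
    }

    // Range scan [prefix, prefix + U+FFFF): every string that starts with prefix
    // (ignoring words that themselves contain U+FFFF).
    static SortedSet<String> withPrefix(NavigableSet<String> set, String prefix) {
        return set.subSet(prefix, true, prefix + Character.MAX_VALUE, false);
    }

    SortedSet<String> beginsWith(String prefix) {
        return withPrefix(forward, prefix);
    }

    List<String> endsWith(String suffix) {
        List<String> out = new ArrayList<>();
        for (String r : withPrefix(reversed, reverse(suffix))) {
            out.add(reverse(r)); // undo the reversal for display
        }
        Collections.sort(out);
        return out;
    }

    public static void main(String[] args) {
        ReversedIndex index = new ReversedIndex();
        for (String w : new String[] {"dependency", "frequency", "depend", "graph"}) index.add(w);
        System.out.println(index.beginsWith("depend")); // [depend, dependency]
        System.out.println(index.endsWith("ency"));     // [dependency, frequency]
    }
}
```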

public int size ()

Returns the number of indexed documents in this library.

Returns
  • The number of documents that have finished indexing.

public void stopSearch ()

Stops search and all in-progress preview text generator tasks.
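One common way to stop a search together with its related background work is to track every submitted task so that a single call can cancel them all. The sketch below shows that pattern with java.util.concurrent. CancellableSearches is hypothetical and does not describe how stopSearch() is implemented.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch, not PSPDFKit code: remember every background task so a
// single stop call can cancel the search and any preview-text work at once.
class CancellableSearches {
    private final ExecutorService executor = Executors.newFixedThreadPool(2);
    // Completed futures are not pruned until stopAll() in this simple sketch.
    private final List<Future<?>> inFlight = new CopyOnWriteArrayList<>();

    Future<?> submit(Runnable task) {
        Future<?> future = executor.submit(task);
        inFlight.add(future);
        return future;
    }

    // Cancels every tracked task; tasks already running are interrupted, so they
    // should check Thread.interrupted() or handle InterruptedException.
    void stopAll() {
        for (Future<?> future : inFlight) {
            future.cancel(true);
        }
        inFlight.clear();
    }

    void shutdown() {
        executor.shutdownNow();
    }

    public static void main(String[] args) {
        CancellableSearches searches = new CancellableSearches();
        Future<?> slow = searches.submit(() -> {
            try {
                Thread.sleep(10_000); // stands in for a long-running search
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt flag
            }
        });
        searches.stopAll();
        System.out.println(slow.isCancelled()); // true
        searches.shutdown();
    }
}
```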