Using Custom Tokenizers
PSPDFKit uses SQLite to build the full-text index used in PSPDFDocumentPickerController, and also for various other data-saving operations (like the image cache metadata). PSPDFKit doesn’t ship with its own SQLite version; instead, it uses the one that is already in iOS. PSPDFKit also supports custom SQLite builds.
PSPDFLibrary uses its own tokenizer, which works well for many languages, including Chinese, Japanese, and Korean (CJK). It also enables searching for related words, like finding “dependencies” when searching for “depending.” This is implemented by the
When should you ship your own build of SQLite?
- When you want better indexing performance.
- When you need features only available in a newer version of SQLite.
- When you need better performance for exact word or phrase matches.
If you rely heavily on exact word or phrase matches, the default tokenizer set by PSPDFLibrary might not be optimal, and you should consider switching to a custom one.
By default, PSPDFKit builds the full-text search (FTS) index with a custom tokenizer that also handles CJK characters. Alternatively, we ship another custom tokenizer, referenced by the PSPDFLibraryUnicodeTokenizerName identifier. This tokenizer is a wrapper around SQLite’s unicode61 tokenizer, but it performs full case folding. This is useful in cases where the document being indexed has text like Straße, and you’d like it to match when searching for
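To make the difference between simple and full case folding concrete, here is a minimal sketch that uses only the Swift standard library. It is an illustration, not PSPDFKit’s tokenizer implementation: the helper names simpleFold and fullFold, and the query STRASSE, are assumptions made for this example. Full folding is approximated here by uppercasing and then lowercasing, which expands ß to ss.

```swift
// Illustration only: approximates case folding with the Swift standard
// library. This is not how PSPDFKit's tokenizer is implemented.

/// Simple folding: lowercasing leaves "ß" as it is.
func simpleFold(_ text: String) -> String {
    return text.lowercased()
}

/// Approximate full folding: uppercasing first expands "ß" to "SS",
/// and lowercasing the result then yields "ss".
func fullFold(_ text: String) -> String {
    return text.uppercased().lowercased()
}

let indexed = "Straße"
let query = "STRASSE" // hypothetical search term chosen for this example

print(simpleFold(indexed) == simpleFold(query)) // false: "straße" vs. "strasse"
print(fullFold(indexed) == fullFold(query))     // true: both become "strasse"
```

With simple folding, the two spellings produce different index terms and the search misses; with full folding, they collapse to the same term.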
You can also use the custom tokenizers shipped with SQLite itself, like the
|Tokenizer||Minimum FTS Version||Minimum SQLite Version|
Note that simply linking the correct SQLite version with your application is not enough: You must ensure that the linked SQLite is built with the correct flags to enable FTS4 or FTS5. Trying to enable a tokenizer on an unsupported FTS version will result in the initialization of
// Swift
let library = try! PSPDFLibrary(path: PSPDFLibrary.defaultLibraryPath(), tokenizer: "unicode61")
let documentPicker = PSPDFDocumentPickerController(directory: "/path/to/files", includeSubdirectories: true, library: library)

// Objective-C
PSPDFLibrary *library = [PSPDFLibrary libraryWithPath:PSPDFLibrary.defaultLibraryPath tokenizer:@"unicode61" error:NULL];
PSPDFDocumentPickerController *documentPicker = [[PSPDFDocumentPickerController alloc] initWithDirectory:@"/path/to/files" includeSubdirectories:YES library:library];
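Because a tokenizer only works when the linked SQLite was built with the matching FTS module, it can help to check the build flags at runtime before choosing one. The following is a minimal sketch that calls the SQLite C API directly (sqlite3_libversion and sqlite3_compileoption_used). The helper name sqliteWasBuilt(with:) is an assumption made for this example and is not part of PSPDFKit.

```swift
import SQLite3

/// Returns true if the SQLite library linked into this process was
/// compiled with the given option. The "SQLITE_" prefix may be omitted,
/// so "ENABLE_FTS5" checks for SQLITE_ENABLE_FTS5.
func sqliteWasBuilt(with option: String) -> Bool {
    return sqlite3_compileoption_used(option) != 0
}

// Version string of the SQLite library that is actually linked.
let version = String(cString: sqlite3_libversion())

// Enabling FTS3 also makes FTS4 available, so either flag is sufficient.
let hasFTS4 = sqliteWasBuilt(with: "ENABLE_FTS3") || sqliteWasBuilt(with: "ENABLE_FTS4")
let hasFTS5 = sqliteWasBuilt(with: "ENABLE_FTS5")

print("SQLite \(version): FTS4 \(hasFTS4 ? "available" : "missing"), FTS5 \(hasFTS5 ? "available" : "missing")")
```

Running a check like this once at startup shows whether a custom SQLite build was actually linked and whether it has the FTS version your tokenizer needs.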
Optionally, you can also ship your own version of SQLite. In the PSPDFKit.dmg you downloaded, you will find a current version of SQLite in the Extras folder, already prepared to be linked. Add SQLite.xcodeproj to your Xcode project, and then add libSQLite.a as a Target Dependency and under Link Binary with Libraries. Make sure that you don’t link the
After setting a different tokenizer, you must delete your app, or at least the library file, so that the index is fully rebuilt.
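Removing the library file can be sketched with Foundation alone. The helper removeLibraryFile(atPath:) and the path below are hypothetical; in a real app you would pass the same path that was given to PSPDFLibrary, and do this before the library is created. If additional files are stored next to the library file, they may need the same treatment.

```swift
import Foundation

/// Deletes the file at `path` if it exists.
/// Returns true when a file was removed and false when nothing was there.
@discardableResult
func removeLibraryFile(atPath path: String) throws -> Bool {
    let fileManager = FileManager.default
    guard fileManager.fileExists(atPath: path) else { return false }
    try fileManager.removeItem(atPath: path)
    return true
}

// Hypothetical location used for illustration only.
let libraryPath = NSTemporaryDirectory() + "example-library.db"

do {
    let removed = try removeLibraryFile(atPath: libraryPath)
    print(removed ? "Removed old index" : "No index file found")
} catch {
    print("Could not remove index: \(error)")
}
```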