Save PDFs to a Custom Data Provider on Android

PSPDFKit supports loading data from many different sources. Any object that implements the DataProvider interface can serve as a source; such an object is known as a data provider.

A data provider defines a common interface for PSPDFKit to load PDF documents from arbitrary sources like cloud hosts, device RAM, content providers, and others.

Existing Data Provider Classes

PSPDFKit comes with a range of predefined data providers that all implement DataProvider:

  • AssetDataProvider allows loading of documents directly from the app’s assets/ directory. This is useful if you ship PDF documents as part of your APK file. Note that copying assets to the internal device storage may perform better than reading them directly from the assets using this provider.

  • ContentResolverDataProvider uses Android’s content resolver framework for reading documents directly from a ContentProvider specified by a URI using the content:// scheme.

  • InputStreamDataProvider is an abstract base class that simplifies reading documents from an InputStream. Subclasses have to override the openInputStream() method to provide the ready-to-read stream. Be aware that while it’s convenient to use an InputStream, it can pose performance issues. This is caused by the fact that PDF documents are read using random access, whereas InputStream only offers stream access. Therefore, InputStreamDataProvider will reopen the underlying input stream every time it needs to “seek backward.”

  • AesDataProvider is shipped with the Catalog app and allows you to open AES256-CTR-encrypted files without storing the decrypted blocks anywhere. It supports random seeking and can handle large PDF files without causing an OutOfMemoryError.
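The cost of seeking backward on an InputStream, mentioned for InputStreamDataProvider above, can be illustrated with a minimal sketch. ReopeningStreamReader and StreamOpener are hypothetical names invented for this illustration and aren't part of PSPDFKit; the sketch only assumes that the source can hand out a fresh stream positioned at byte 0.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical helper (not a PSPDFKit class): random-access reads on a forward-only stream. */
class ReopeningStreamReader {
    /** Hypothetical callback that opens a fresh stream positioned at byte 0. */
    interface StreamOpener {
        InputStream open() throws IOException;
    }

    private final StreamOpener opener;
    private InputStream stream;
    private long position;   // Offset of the next byte `stream` will return.
    private int reopenCount; // How often the stream had to be (re)opened.

    ReopeningStreamReader(StreamOpener opener) {
        this.opener = opener;
    }

    /** Reads up to `size` bytes starting at `offset`; the result is shorter at the end of the data. */
    byte[] read(long offset, int size) throws IOException {
        if (stream == null || offset < position) {
            // A stream can't move backward, so start over from byte 0.
            if (stream != null) stream.close();
            stream = opener.open();
            position = 0;
            reopenCount++;
        }
        // Moving forward only requires discarding the bytes in between.
        while (position < offset) {
            long skipped = stream.skip(offset - position);
            if (skipped <= 0) {
                if (stream.read() < 0) break; // Reached the end of the data.
                skipped = 1;
            }
            position += skipped;
        }
        byte[] buffer = new byte[size];
        int total = 0;
        while (total < size) {
            int count = stream.read(buffer, total, size - total);
            if (count < 0) break;
            total += count;
        }
        position += total;
        return Arrays.copyOf(buffer, total);
    }

    int getReopenCount() {
        return reopenCount;
    }
}

class ReopeningStreamReaderDemo {
    public static void main(String[] args) throws IOException {
        byte[] data = "0123456789".getBytes(StandardCharsets.US_ASCII);
        ReopeningStreamReader reader =
            new ReopeningStreamReader(() -> new ByteArrayInputStream(data));
        reader.read(6, 2); // First read: opens the stream and skips ahead.
        reader.read(8, 2); // Forward read: reuses the open stream.
        reader.read(0, 3); // Backward read: forces a reopen.
        System.out.println(reader.getReopenCount()); // prints 2
    }
}
```

Every backward read pays for reopening the source and skipping to the requested offset again, which is why a source with true random access is usually preferable for large documents.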

Custom Data Provider

To create a custom data provider for your application, create a class that implements the DataProvider interface and all of its methods. To use your data provider with PdfActivity, the class also needs to implement Android’s Parcelable interface. If you only use the data provider directly with PdfFragment, it doesn’t need to be Parcelable.

Take a look at CustomDataProviderExample inside the Catalog app. This shows how to create a data provider that can read a PDF document from the app’s res/raw/ directory using an InputStream.

Writable Data Providers

If you want your custom DataProvider to also be writable, you need to implement the WritableDataProvider interface. This tells the framework that your DataProvider also supports writing changes made to the data. Here’s an outline of such a provider, first in Kotlin and then in Java:

class ExampleDataProvider : InputStreamDataProvider(), WritableDataProvider {

    ...

    // Tells the system we can write to this data provider.
    override fun canWrite(): Boolean = true

    override fun startWrite(writeMode: WritableDataProvider.WriteMode): Boolean {
        // Using `when` as an expression guarantees every mode yields a result.
        return when (writeMode) {
            WritableDataProvider.WriteMode.REWRITE_FILE -> {
                // Prepare for writing, e.g. creating a new
                // temporary file to write to.

                ...

                // Evaluate to `true` to indicate we can proceed with writing.
                true
            }
            WritableDataProvider.WriteMode.APPEND_TO_FILE -> {
                // This won't occur when returning `false` in
                // `supportsAppending`.
                false
            }
        }
    }

    // This gets called repeatedly with the data we need to write.
    // Depending on the current write mode, either append
    // it to the existing data or write to a new file.
    override fun write(data: ByteArray): Boolean {

        ...

        // Return `true` to indicate we can proceed with writing.
        return true
    }

    // This is called once all data is written to give you an
    // opportunity to finish your writing process.
    override fun finishWrite(): Boolean {

        ...

        // Return `true` to indicate writing was successful.
        return true
    }

    // If you support appending data, you can return `true`. For this simple
    // example, we just return `false`.
    // Returning `true` doesn't mean it will always append;
    // you still need to support both write modes.
    override fun supportsAppending(): Boolean = false
}

class ExampleDataProvider extends InputStreamDataProvider implements WritableDataProvider {

    ...

    @Override
    public boolean canWrite() {
        // Tells the system we can write to this data provider.
        return true;
    }

    @Override
    public boolean startWrite(WriteMode writeMode) {
        switch (writeMode) {
            case REWRITE_FILE:
                // Prepare for writing, e.g. creating a new
                // temporary file to write to.

                ...

                // Return `true` to indicate we can proceed with writing.
                return true;
            case APPEND_TO_FILE:
                // This won't occur when returning `false` in
                // `supportsAppending`.
                return false;
        }

        return false;
    }

    // This gets called repeatedly with the data we need to write.
    // Depending on the current write mode, either append
    // it to the existing data or write to a new file.
    @Override
    public boolean write(byte[] data) {

        ...

        // Return `true` to indicate we can proceed with writing.
        return true;
    }

    // This is called once all data is written to give you an
    // opportunity to finish your writing process.
    @Override
    public boolean finishWrite() {

        ...

        // Return `true` to indicate writing was successful.
        return true;
    }

    // If you support appending data you can return `true`. For this
    // simple example, we just return `false`.
    // Returning `true` doesn't mean it will always append;
    // you still need to support both write modes.
    @Override
    public boolean supportsAppending() {
        return false;
    }
}

For a complete example, check out AesDataProvider, which is part of our Catalog app.
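To make the write lifecycle more concrete, here is a minimal sketch of how the elided parts could be filled in for a file-backed provider. It doesn't use PSPDFKit types: SketchWritableProvider is a hypothetical stand-in that only mirrors the method names from the outline above, and TempFileBackedProvider is likewise invented for this illustration. It assumes the target is a regular file in a writable directory.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

/** Hypothetical stand-in mirroring the method names above; not the PSPDFKit interface. */
interface SketchWritableProvider {
    enum WriteMode { REWRITE_FILE, APPEND_TO_FILE }

    boolean canWrite();
    boolean startWrite(WriteMode writeMode);
    boolean write(byte[] data);
    boolean finishWrite();
    boolean supportsAppending();
}

/** Writes to a temporary file and replaces the target only once writing has finished. */
class TempFileBackedProvider implements SketchWritableProvider {
    private final Path target;
    private Path tempFile;
    private OutputStream out;

    TempFileBackedProvider(Path target) {
        this.target = target;
    }

    @Override
    public boolean canWrite() {
        return true;
    }

    @Override
    public boolean supportsAppending() {
        // This sketch only supports full rewrites.
        return false;
    }

    @Override
    public boolean startWrite(WriteMode writeMode) {
        if (writeMode != WriteMode.REWRITE_FILE) return false;
        try {
            // Create the temporary file next to the target so the final
            // move stays on the same file system.
            tempFile = Files.createTempFile(target.toAbsolutePath().getParent(), "write", ".tmp");
            out = Files.newOutputStream(tempFile);
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    @Override
    public boolean write(byte[] data) {
        if (out == null) return false; // `startWrite` wasn't called or failed.
        try {
            out.write(data);
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    @Override
    public boolean finishWrite() {
        if (out == null) return false;
        try {
            out.close();
            // Only now does the new content become visible at the target path.
            Files.move(tempFile, target, StandardCopyOption.REPLACE_EXISTING);
            return true;
        } catch (IOException e) {
            return false;
        } finally {
            out = null;
        }
    }
}
```

Because the target is replaced only in finishWrite(), a write that fails halfway leaves the previous file content untouched. A real provider would also have to clean up the temporary file after a failure, which this sketch omits.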

More Uses for Data Providers

In certain cases, it can be beneficial to use a DataProvider not just for displaying documents, but also for writing data. Certain APIs, such as XfdfFormatter and DocumentJsonFormatter, already allow you to use a DataProvider for the input. With the OutputStreamAdapter, you can also use the same DataProvider for the output, as long as it implements WritableDataProvider. The following example, first in Kotlin and then in Java, shows how to store your XFDF data encrypted:

val file = ...
val annotations = ...
val formFields = ...
// `AesDataProvider` is a sample data provider found in our Catalog app.
val dataProvider = AesDataProvider(file.canonicalPath, BASE64_ENCRYPTION_KEY)

// Export all annotations in the document to our data provider.
XfdfFormatter.writeXfdf(document,
    annotations,
    formFields,
    OutputStreamAdapter.Builder.fromDataProvider(dataProvider).build())

// You can use the same data provider for reimporting.
val parsedAnnotations = XfdfFormatter.parseXfdf(document, dataProvider)

File file = ...
List<Annotation> annotations = ...
List<FormField> formFields = ...
// `AesDataProvider` is a sample data provider found in our Catalog app.
AesDataProvider dataProvider = new AesDataProvider(file.getCanonicalPath(),
    BASE64_ENCRYPTION_KEY);

// Export all annotations in the document to our data provider.
XfdfFormatter.writeXfdf(getDocument(),
    annotations,
    formFields,
    OutputStreamAdapter.Builder.fromDataProvider(dataProvider).build());

// You can use the same data provider for reimporting.
List<Annotation> parsedAnnotations = XfdfFormatter.parseXfdf(getDocument(),
    dataProvider);

Writing Strategies

The OutputStreamAdapter can use a different WritingStrategy depending on the requirements of the given DataProvider. By default, we provide two WritingStrategy implementations:

  1. DirectWritingStrategy — This writes immediately to the DataProvider and is used by default.

  2. TempFileWritingStrategy — This writes to a temporary file and only commits it to the DataProvider once all data is ready. This is useful if the operation writing to your DataProvider is simultaneously reading from it.
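The difference between the two strategies can be sketched without any PSPDFKit types. In the following illustration, MemoryStore, DirectStoreStream, and BufferedStoreStream are hypothetical names, and the sketch assumes that a direct write starts by discarding the old content, as a full rewrite would. The buffering variant collects bytes in memory rather than in a temporary file to keep the example short.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Locale;

/** Hypothetical in-memory store standing in for a data provider. */
class MemoryStore {
    private byte[] content;

    MemoryStore(byte[] initial) { content = initial.clone(); }

    byte[] snapshot() { return content.clone(); }

    void replace(byte[] bytes) { content = bytes.clone(); }

    void append(byte[] bytes, int offset, int length) {
        byte[] merged = Arrays.copyOf(content, content.length + length);
        System.arraycopy(bytes, offset, merged, content.length, length);
        content = merged;
    }
}

/** Direct approach: clears the store when opened and forwards every write immediately. */
class DirectStoreStream extends OutputStream {
    private final MemoryStore store;

    DirectStoreStream(MemoryStore store) {
        this.store = store;
        store.replace(new byte[0]); // Assumption: a rewrite discards the old content up front.
    }

    @Override
    public void write(int b) { store.append(new byte[] {(byte) b}, 0, 1); }

    @Override
    public void write(byte[] b, int off, int len) { store.append(b, off, len); }
}

/** Buffering approach: collects all bytes and commits them to the store on close. */
class BufferedStoreStream extends OutputStream {
    private final MemoryStore store;
    private final ByteArrayOutputStream buffer = new ByteArrayOutputStream();

    BufferedStoreStream(MemoryStore store) { this.store = store; }

    @Override
    public void write(int b) { buffer.write(b); }

    @Override
    public void write(byte[] b, int off, int len) { buffer.write(b, off, len); }

    @Override
    public void close() { store.replace(buffer.toByteArray()); }
}

class WritingStrategyDemo {
    /** Reads the store's current content and writes an upper-cased copy to `out`. */
    static void writeUppercased(MemoryStore store, OutputStream out) throws IOException {
        byte[] source = store.snapshot(); // The read happens after the stream was opened.
        String text = new String(source, StandardCharsets.US_ASCII);
        out.write(text.toUpperCase(Locale.ROOT).getBytes(StandardCharsets.US_ASCII));
        out.close();
    }

    public static void main(String[] args) throws IOException {
        MemoryStore direct = new MemoryStore("pdf".getBytes(StandardCharsets.US_ASCII));
        writeUppercased(direct, new DirectStoreStream(direct));
        System.out.println(direct.snapshot().length); // prints 0: the source was cleared before it was read

        MemoryStore buffered = new MemoryStore("pdf".getBytes(StandardCharsets.US_ASCII));
        writeUppercased(buffered, new BufferedStoreStream(buffered));
        System.out.println(new String(buffered.snapshot(), StandardCharsets.US_ASCII)); // prints PDF
    }
}
```

The buffering variant costs extra memory or disk space, but the source stays readable until the new content is complete, which is the situation the temporary-file strategy is meant for.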