Data Providers

In addition to loading documents using a URI, PSPDFKit can load documents using a DataProvider. A data provider defines a common interface for PSPDFKit to load PDF documents from arbitrary sources like cloud hosts, device RAM, content providers, and others.

Launching a PdfActivity from a data provider works just like starting it using a Uri. Here is an example using the AssetDataProvider for loading a PDF document from the assets/ directory of your app:

// Create a data provider that reads directly from the app's assets.
val dataProvider = AssetDataProvider("document.pdf")

// Launch the activity using the data provider.
val intent = PdfActivityIntentBuilder.fromDataProvider(context, dataProvider)
        .configuration(configuration.build())
        .build()
context.startActivity(intent)
// Create a data provider that reads directly from the app's assets.
DataProvider dataProvider = new AssetDataProvider("document.pdf");

// Launch the activity using the data provider.
final Intent intent = PdfActivityIntentBuilder.fromDataProvider(context, dataProvider)
        .configuration(configuration.build())
        .build();
context.startActivity(intent);

Existing Data Provider Classes

PSPDFKit comes with a range of predefined data providers that all implement DataProvider:

  • AssetDataProvider allows loading of documents directly from the app’s assets/ directory. This is useful if you ship PDF documents as part of your APK file. Note that copying assets to the internal device storage may perform better than reading them directly from the assets using this provider.

  • ContentResolverDataProvider uses Android’s content resolver framework for reading documents directly from a ContentProvider specified by a URI using the content:// scheme.

  • InputStreamDataProvider is an abstract base class that simplifies reading documents from an InputStream. Subclasses have to override the openInputStream() method to provide the ready-to-read stream. While an InputStream is convenient, it can cause performance issues: PDF documents are read using random access, whereas an InputStream only offers sequential access. InputStreamDataProvider therefore reopens the underlying input stream every time it needs to “seek backward.”

  • AesDataProvider is shipped with the Catalog app and allows you to open AES256-CTR-encrypted files without storing the decrypted blocks anywhere. It supports random seeking and can handle large PDF files without causing an OutOfMemoryError.
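
The backward-seek cost mentioned for InputStreamDataProvider can be illustrated with a minimal plain-Java sketch. ReopeningReader, its read(offset, size) method, and the reopen counter are hypothetical names used only for illustration; this is not PSPDFKit code and does not claim to match its internals. The sketch assumes only that a fresh stream can be opened from the start at any time.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.concurrent.Callable;

/** Hypothetical helper (not PSPDFKit API): random-access reads on top of a forward-only stream. */
final class ReopeningReader {
    private final Callable<InputStream> opener; // opens a fresh stream positioned at byte 0
    private InputStream stream;
    private long position;   // offset of the next byte the current stream will deliver
    private int reopenCount; // how many times a stream has been opened so far

    ReopeningReader(Callable<InputStream> opener) {
        this.opener = opener;
    }

    /** Reads up to size bytes starting at offset; returns fewer bytes at the end of the data. */
    byte[] read(long offset, int size) throws Exception {
        if (stream == null || offset < position) {
            // A stream cannot move backward, so start over from the beginning.
            if (stream != null) stream.close();
            stream = opener.call();
            position = 0;
            reopenCount++;
        }
        // Skip forward until the requested offset is reached.
        while (position < offset) {
            long skipped = stream.skip(offset - position);
            if (skipped <= 0) {
                if (stream.read() < 0) break; // end of stream reached
                skipped = 1;
            }
            position += skipped;
        }
        byte[] buffer = new byte[size];
        int filled = 0;
        while (filled < size) {
            int n = stream.read(buffer, filled, size - filled);
            if (n < 0) break;
            filled += n;
        }
        position += filled;
        return Arrays.copyOf(buffer, filled);
    }

    int reopenCount() {
        return reopenCount;
    }

    public static void main(String[] args) throws Exception {
        byte[] data = "0123456789".getBytes(StandardCharsets.US_ASCII);
        ReopeningReader reader = new ReopeningReader(() -> new ByteArrayInputStream(data));
        reader.read(0, 4); // first read opens the stream
        reader.read(6, 2); // forward seek: skips within the same stream
        reader.read(2, 3); // backward seek: forces a reopen
        System.out.println("streams opened: " + reader.reopenCount());
    }
}
```

Every backward seek pays for reopening the source and skipping from byte 0 again, which is why a seekable source such as a file tends to perform better for large documents.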

Custom Data Provider

To create a custom data provider for your application, you will have to create a class that implements the DataProvider interface and all of its methods. If you would like to use your data provider with PdfActivity, your class also needs to implement Android’s Parcelable interface. If you plan to use the data provider directly with the PdfFragment, you don’t need to make it into a Parcelable.

Take a look at CustomDataProviderExample inside the Catalog app. This shows how to create a data provider that can read a PDF document from the app’s res/raw/ directory using an InputStream.

Data Provider Progress

If your custom DataProvider needs to prepare a document before it can be shown using PSPDFKit (for example, by downloading the PDF file from the web or by decrypting it up front), you can implement the ProgressDataProvider interface, and PSPDFKit will take care of showing a progress UI to your users. The interface defines a single method, observeProgress(), which has to return an RxJava Flowable that emits the current progress as a Double in the range [0, 1].

💡 Tip: If your ProgressDataProvider has no progress to report (e.g. if a file is already available), it can simply return ProgressDataProvider.COMPLETE from observeProgress().

Here’s a small example of a data provider that reports the progress of a file download. The full example can be found in our Catalog app as ProgressProviderExample:

RemoteDataProvider.kt
class RemoteDataProvider : InputStreamDataProvider(), ProgressDataProvider {
    /** Internally, the data provider uses a subject to publish any download progress. */
    private val progressSubject = PublishSubject.create<Double>()
    /** Responsible for downloading our PDF.  */
    private var downloadJob: DownloadJob? = null
    /** Used to block document opening until the download is done. */
    private val downloadLatch = CountDownLatch(1)

    override fun observeProgress(): Flowable<Double> {
        // We can just return our `PublishSubject`.
        return progressSubject.toFlowable(BackpressureStrategy.LATEST)
    }

    override fun openInputStream(): InputStream {
        val job = startDownloadIfNotRunning()
        // Block document opening until the download is done.
        downloadLatch.await()
        // Once the download is complete, continue with the document opening.
        return FileInputStream(job.outputFile)
    }

    private fun startDownloadIfNotRunning(): DownloadJob {
        var job = downloadJob
        if (job != null) return job

        job = DownloadJob.startDownload(...)
        job.setProgressListener(object : DownloadJob.ProgressListener {
            override fun onProgress(progress: Progress) {
                // Notify our listeners about the download progress.
                progressSubject.onNext(progress.bytesReceived.toDouble() / progress.totalBytes.toDouble())
            }

            override fun onComplete(output: File) {
                progressSubject.onComplete()
                // Unblock the actual document loading.
                downloadLatch.countDown()
            }

            ...
        })

        downloadJob = job
        return job
    }
}

RemoteDataProvider.java
public static class RemoteDataProvider
    extends InputStreamDataProvider implements ProgressDataProvider {

    /** Internally, the data provider uses a subject to publish any download progress. */
    private PublishSubject<Double> progressSubject = PublishSubject.create();
    /** Responsible for downloading our PDF. */
    private DownloadJob downloadJob;
    /** Used to block document opening until the download is done. */
    private CountDownLatch downloadLatch = new CountDownLatch(1);

    @NonNull @Override protected InputStream openInputStream() throws Exception {
        startDownloadIfNotRunning();
        // Block document opening until the download is done.
        downloadLatch.await();
        // Once the download is complete, continue with the document opening.
        return new FileInputStream(downloadJob.getOutputFile());
    }

    @NonNull @Override public Flowable<Double> observeProgress() {
        // We can just return our `PublishSubject`.
        return progressSubject.toFlowable(BackpressureStrategy.LATEST);
    }

    private void startDownloadIfNotRunning() {
        if (downloadJob != null) return;

        // In this short example, we use the PSPDFKit `DownloadJob` to download a PDF from the web.
        downloadJob = DownloadJob.startDownload(...);
        downloadJob.setProgressListener(new DownloadJob.ProgressListener() {
            @Override public void onProgress(@NonNull Progress progress) {
                // Notify our listeners about the download progress.
                progressSubject.onNext((double) progress.bytesReceived / (double) progress.totalBytes);
            }

            @Override public void onComplete(@NonNull File output) {
                progressSubject.onComplete();

                // Unblock the actual document loading.
                downloadLatch.countDown();
            }
            ...
        });
    }
}
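
The blocking pattern used in both listings (a background job reports progress while the opening thread waits on a CountDownLatch) can be tried out without PSPDFKit or RxJava. The following is a minimal plain-Java sketch under stated assumptions: SimulatedDownload and its methods are hypothetical stand-ins, the "download" is just a loop over byte counts, and a simple callback takes the place of the PublishSubject.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.function.DoubleConsumer;

/** Hypothetical stand-in for a download job; not PSPDFKit's DownloadJob. */
final class SimulatedDownload {
    /** Released once the simulated download has finished. */
    private final CountDownLatch done = new CountDownLatch(1);

    /** Starts a background thread that "receives" totalBytes in chunks of chunkSize. */
    void start(long totalBytes, long chunkSize, DoubleConsumer onProgress) {
        Thread worker = new Thread(() -> {
            try {
                long received = 0;
                while (received < totalBytes) {
                    received = Math.min(totalBytes, received + chunkSize);
                    // Report progress as a fraction in the range [0, 1].
                    onProgress.accept((double) received / (double) totalBytes);
                }
            } finally {
                // Unblock anyone waiting for the data, even if reporting failed.
                done.countDown();
            }
        });
        worker.start();
    }

    /** Blocks the calling thread until the download has completed. */
    void awaitCompletion() throws InterruptedException {
        done.await();
    }

    /** Runs one simulated download and returns every progress value that was reported. */
    static List<Double> run(long totalBytes, long chunkSize) throws InterruptedException {
        List<Double> progress = Collections.synchronizedList(new ArrayList<>());
        SimulatedDownload download = new SimulatedDownload();
        download.start(totalBytes, chunkSize, value -> progress.add(value));
        download.awaitCompletion(); // comparable to the wait in openInputStream()
        return progress;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(run(100, 25)); // prints [0.25, 0.5, 0.75, 1.0]
    }
}
```

Counting down in a finally block is a design choice of this sketch: it keeps the waiting thread from blocking forever if the worker fails partway through.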

Writable Data Providers

If you want your custom DataProvider to also be writable, implement the WritableDataProvider interface. This tells the framework that your DataProvider also supports writing changes made to the data. Here is an outline of such an implementation:

class ExampleDataProvider : InputStreamDataProvider(), WritableDataProvider {

    ...

    // Tells the system we can write to this data provider.
    override fun canWrite(): Boolean = true

    override fun startWrite(writeMode: WritableDataProvider.WriteMode): Boolean {
        when (writeMode) {
            WritableDataProvider.WriteMode.REWRITE_FILE -> {
                // Prepare for writing, e.g. creating a new
                // temporary file to write to.

                ...

                // Return `true` to indicate we can proceed with writing.
                return true
            }
            WritableDataProvider.WriteMode.APPEND_TO_FILE -> {
                // This won't occur when returning `false` in
                // `supportsAppending`.
                return false
            }
        }
    }

    // This gets called repeatedly with the data we need to write.
    // Depending on the current write mode, either append
    // it to the existing data or write to a new file.
    override fun write(data: ByteArray): Boolean {

        ...

        // Return `true` to indicate we can proceed with writing.
        return true
    }

    // This is called once all data is written to give you an
    // opportunity to finish your writing process.
    override fun finishWrite(): Boolean {

        ...

        // Return `true` to indicate writing was successful.
        return true
    }

    // If you support appending data, you can return `true`. For this simple
    // example, we just return `false`.
    // Returning `true` doesn't mean it will always append;
    // you still need to support both write modes.
    override fun supportsAppending(): Boolean = false
}
class ExampleDataProvider extends InputStreamDataProvider implements WritableDataProvider {

    ...

    @Override
    public boolean canWrite() {
        // Tells the system we can write to this data provider.
        return true;
    }

    @Override
    public boolean startWrite(WriteMode writeMode) {
        switch (writeMode) {
            case REWRITE_FILE:
                // Prepare for writing, e.g. creating a new
                // temporary file to write to.

                ...

                // Return `true` to indicate we can proceed with writing.
                return true;
            case APPEND_TO_FILE:
                // This won't occur when returning `false` in
                // `supportsAppending`.
                return false;
        }

        return false;
    }

    // This gets called repeatedly with the data we need to write.
    // Depending on the current write mode, either append
    // it to the existing data or write to a new file.
    @Override
    public boolean write(byte[] data) {

        ...

        // Return `true` to indicate we can proceed with writing.
        return true;
    }

    // This is called once all data is written to give you an
    // opportunity to finish your writing process.
    @Override
    public boolean finishWrite() {

        ...

        // Return `true` to indicate writing was successful.
        return true;
    }

    // If you support appending data, you can return `true`. For this
    // simple example, we just return `false`.
    // Returning `true` doesn't mean it will always append;
    // you still need to support both write modes.
    @Override
    public boolean supportsAppending() {
        return false;
    }
}

For a complete example, check out AesDataProvider, which is part of our Catalog app.
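
To make the difference between the two write modes concrete, here is a minimal plain-Java sketch of the start, write, finish sequence over a regular file. FileWriteSession and its Mode enum are local stand-ins that borrow the names from the outline above; they are assumptions for illustration, not PSPDFKit types, and the sketch leaves out everything a real data provider needs for reading.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;

/** Hypothetical file-backed writer; a local stand-in, not a PSPDFKit class. */
final class FileWriteSession {
    /** Local enum mirroring the two write modes named in the outline above. */
    enum Mode { REWRITE_FILE, APPEND_TO_FILE }

    private final File file;
    private OutputStream out;

    FileWriteSession(File file) {
        this.file = file;
    }

    /** Opens the file, truncating it for a rewrite or positioning at its end for an append. */
    boolean startWrite(Mode mode) {
        try {
            out = new FileOutputStream(file, mode == Mode.APPEND_TO_FILE);
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    /** Called repeatedly with chunks of data; returns false if writing is not possible. */
    boolean write(byte[] data) {
        if (out == null) return false;
        try {
            out.write(data);
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    /** Closes the file so that the written data is complete on disk. */
    boolean finishWrite() {
        if (out == null) return false;
        try {
            out.close();
            return true;
        } catch (IOException e) {
            return false;
        } finally {
            out = null;
        }
    }

    public static void main(String[] args) throws IOException {
        File file = File.createTempFile("write-session", ".bin");
        FileWriteSession session = new FileWriteSession(file);
        session.startWrite(Mode.REWRITE_FILE);
        session.write("abc".getBytes(StandardCharsets.UTF_8));
        session.finishWrite();
        session.startWrite(Mode.APPEND_TO_FILE);
        session.write("de".getBytes(StandardCharsets.UTF_8));
        session.finishWrite();
        System.out.println(new String(Files.readAllBytes(file.toPath()), StandardCharsets.UTF_8)); // prints abcde
    }
}
```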

More Uses for Data Providers

In certain cases, it can be beneficial to use a DataProvider not just for displaying documents, but also to write data. Certain APIs, such as XfdfFormatter and DocumentJsonFormatter, already allow you to use a DataProvider for the input. When using the OutputStreamAdapter, you can also use the same DataProvider for the output, so long as it implements WritableDataProvider. Let’s look at an example of how to store your XFDF data encrypted:

val file = ...
val annotations = ...
val formFields = ...
// `AesDataProvider` is a sample data provider found in our Catalog app.
val dataProvider = AesDataProvider(file.canonicalPath, BASE64_ENCRYPTION_KEY)

// Export all annotations in the document to our data provider.
XfdfFormatter.writeXfdf(document,
    annotations,
    formFields,
    OutputStreamAdapter.Builder.fromDataProvider(dataProvider).build())

// You can use the same data provider for reimporting.
val parsedAnnotations = XfdfFormatter.parseXfdf(document, dataProvider)
File file = ...
List<Annotation> annotations = ...
List<FormField> formFields = ...
// `AesDataProvider` is a sample data provider found in our Catalog app.
AesDataProvider dataProvider = new AesDataProvider(file.getCanonicalPath(),
    BASE64_ENCRYPTION_KEY);

// Export all annotations in the document to our data provider.
XfdfFormatter.writeXfdf(getDocument(),
    annotations,
    formFields,
    OutputStreamAdapter.Builder.fromDataProvider(dataProvider).build());

// You can use the same data provider for reimporting.
List<Annotation> parsedAnnotations = XfdfFormatter.parseXfdf(getDocument(),
    dataProvider);

Writing Strategies

The OutputStreamAdapter can use a different WritingStrategy depending on the requirements of the given DataProvider. By default, we provide two WritingStrategy implementations:

  1. DirectWritingStrategy — This writes immediately to the DataProvider and is used by default.
  2. TempFileWritingStrategy — This writes to a temporary file and only commits it to the DataProvider once all data is ready. This is useful if the operation writing to your DataProvider is simultaneously reading from it.
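
The general idea behind writing to a temporary file and committing at the end can be sketched in plain Java. This is an illustration of the technique only, under the assumption that the destination is a regular file; TempFileCommitWriter is a hypothetical class and does not show how PSPDFKit's TempFileWritingStrategy is implemented.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

/** Hypothetical sketch of "write to a temp file, commit at the end"; not PSPDFKit code. */
final class TempFileCommitWriter implements AutoCloseable {
    private final Path target;
    private final Path temp;
    private final OutputStream out;
    private boolean committed;

    TempFileCommitWriter(Path target) throws IOException {
        this.target = target;
        // Create the temp file next to the target so the final move stays on one file system.
        this.temp = Files.createTempFile(target.toAbsolutePath().getParent(), "write-", ".tmp");
        this.out = Files.newOutputStream(temp);
    }

    /** Buffers data in the temp file; the target stays untouched and readable. */
    void write(byte[] data) throws IOException {
        out.write(data);
    }

    /** Makes the new content visible at the target path in a single step. */
    void commit() throws IOException {
        out.close();
        Files.move(temp, target, StandardCopyOption.REPLACE_EXISTING);
        committed = true;
    }

    /** Discards the temp file if commit() was never reached. */
    @Override
    public void close() throws IOException {
        out.close();
        if (!committed) Files.deleteIfExists(temp);
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("commit-demo");
        Path target = dir.resolve("data.bin");
        Files.write(target, "old".getBytes(StandardCharsets.UTF_8));
        try (TempFileCommitWriter writer = new TempFileCommitWriter(target)) {
            writer.write("new".getBytes(StandardCharsets.UTF_8));
            // The target can still be read here and still holds the old content.
            writer.commit();
        }
        System.out.println(new String(Files.readAllBytes(target), StandardCharsets.UTF_8)); // prints new
    }
}
```

Because the target is only replaced in commit(), code that reads from the same location while the write is in progress keeps seeing the complete old content.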

Avoiding In-Memory Data Providers

In most cases, it is best to serve your PDF data from some kind of random access storage — usually a file. Keeping your PDF data solely in memory has two major disadvantages:

  • It can quickly lead to an OutOfMemoryError. This can happen when you don’t have full control over the size of your PDFs (e.g. when a user opens a big PDF) or when running your app on a device with less memory than anticipated.

  • An in-memory data provider can’t be retained across process recreations. By design, Android can kill your app’s process as soon as all activities of your app are in the background. Once the app comes to the foreground, your process is recreated, leaving your in-memory data provider without data.

ℹ️ Note: For these reasons, the previously available MemoryDataProvider was removed in PSPDFKit 3.1.1 for Android. Depending on your original usage scenario, you are advised to use one of the other available data provider classes.
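
As an illustration of why file-backed random access keeps memory usage low, the following plain-Java sketch reads only a requested byte range from a file. FileRangeReader and its read(file, offset, size) method are hypothetical names for this example and are not part of PSPDFKit; a real data provider would wrap similar logic behind the DataProvider interface.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.Arrays;

/** Hypothetical helper (not PSPDFKit API): reads a byte range without loading the whole file. */
final class FileRangeReader {
    private FileRangeReader() {}

    /** Returns up to size bytes starting at offset, or an empty array past the end of the file. */
    static byte[] read(File file, long offset, int size) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file, "r")) {
            if (offset >= raf.length()) return new byte[0];
            raf.seek(offset); // jump directly to the offset, forward or backward
            byte[] buffer = new byte[size];
            int filled = 0;
            while (filled < size) {
                int n = raf.read(buffer, filled, size - filled);
                if (n < 0) break; // end of file
                filled += n;
            }
            return Arrays.copyOf(buffer, filled);
        }
    }

    public static void main(String[] args) throws IOException {
        File file = File.createTempFile("range-demo", ".bin");
        Files.write(file.toPath(), "0123456789".getBytes(StandardCharsets.US_ASCII));
        byte[] range = read(file, 3, 4);
        System.out.println(new String(range, StandardCharsets.US_ASCII)); // prints 3456
    }
}
```

Only the requested range is held in memory at any time, and the file itself survives process recreation, which addresses both disadvantages listed above.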