PSPDFKit Instant and the Document State

Over the course of its lifetime, any InstantDocumentDescriptor will go through several states. This article covers those states and their transitions in more detail than the general documentation for this protocol.

To understand the different states of a document descriptor, it makes sense to revisit what these objects are and do.

A document descriptor acts as the receptionist for actual Document instances that are backed by the same data. In this role, the descriptor hands out document objects and is responsible for managing the authentication, download, and sync of this data. Document descriptors are managed objects that you cannot instantiate directly. Instead, you obtain them from an InstantClient, which returns one immediately if the identifier you passed isn’t obviously garbage.

As such, a document descriptor is not a promise that a usable document will exist in the future. Rather, it only gives you a lightweight handle to a document and an API to control and inspect its syncing behavior.

Lifecycle of a Document Descriptor

Each document descriptor is initially created with a documentState of .unknown, meaning the data that could back an actual document hasn’t necessarily been downloaded. In fact, it may not even exist at all.

In most cases, the state of a document descriptor will be established as either “clean” or “dirty” after you ask it for a document. However, this is only possible if the data for the document has already been downloaded. If you ask for an editable document, you can create, read, update, and delete (CRUD) annotations in this document. Making changes to annotations marks the document descriptor as “dirty,” which, by default, triggers a sync.

As noted elsewhere, all syncing goes both ways: You cannot decide to just fetch or just push data! Instead, syncing always means that any local changes are sent to the server, which then determines a new truth. The server replies with the changes your local copy needs to reflect this new truth, and these changes are then applied locally. For details about the state transitions this entails, refer to the section on the sync cycle below.

When you’re no longer interested in the local data backing a document descriptor, you can tell the object to removeLocalStorage(). Doing so also cancels any ongoing network activity immediately.

Calling removeLocalStorage() while the document descriptor is pushing changes to the server is possible, but it’s generally not recommended. When removeLocalStorage() returns, the document descriptor will be in an “unknown” state again, as if it had never been downloaded before.

ℹ️ Note: There can be multiple document descriptors with the same identifier — one for each layer. Because all layers with the same document identifier share the same PDF file, removeLocalStorage() doesn’t delete the backing file automatically.

To reclaim disk space for all PDF files that are no longer needed, you can call InstantClient.removeUnreferencedCacheEntries(). If you want to unconditionally reclaim the disk space for a certain PDF, you can call InstantClient.removeLocalStorage(forDocumentIdentifier:) instead. Doing so will also invalidate all existing document descriptors with a given identifier and remove their annotation data.

Because every document descriptor is managed by the InstantClient that created it, it cannot be used without a fully intact client. So when a client is deallocated or invalidated, all document descriptors managed by this client become “invalid” too.

Additionally, document descriptors become invalid when the client removes their local storage (either globally or selectively for a certain document identifier), and in the (unlikely) case when Instant detects that the backing data for a document descriptor has become corrupted.

In the dire situation of data corruption, there’s little you can do apart from calling removeLocalStorage() on the newly invalid document descriptor.
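The transitions described in this section can be summarized in a small, self-contained model. This is only a sketch: DescriptorState and DescriptorModel are hypothetical names invented for illustration, they are not part of the Instant API, and the model ignores the sync states covered later in this article.

```swift
// Hypothetical model of the lifecycle described above; not the Instant API.
enum DescriptorState: Equatable {
    case unknown, clean, dirty, invalid
}

struct DescriptorModel {
    // Every descriptor starts out in the unknown state.
    private(set) var state: DescriptorState = .unknown

    // A successful download establishes the state as clean.
    mutating func downloadSucceeded() {
        if state == .unknown { state = .clean }
    }

    // Changing annotations requires downloaded data and makes the descriptor dirty.
    mutating func annotationsChanged() {
        if state == .clean { state = .dirty }
    }

    // Simplification: removing local storage always returns to unknown.
    mutating func removeLocalStorage() {
        state = .unknown
    }

    // Stands in for client invalidation or detected data corruption.
    mutating func invalidate() {
        state = .invalid
    }
}

var descriptor = DescriptorModel()
descriptor.downloadSucceeded()   // unknown -> clean
descriptor.annotationsChanged()  // clean -> dirty
descriptor.removeLocalStorage()  // dirty -> unknown
print(descriptor.state)
```

The guards in downloadSucceeded() and annotationsChanged() are a modeling choice: they simply ignore calls that make no sense in the current state, whereas the real framework reports such problems through its own error handling.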

Downloading Document Data

If the data has not yet been downloaded, you can start a download operation by calling the descriptor’s download(usingJWT:) method. Should that operation fail, the descriptor’s documentState will remain .unknown; it’s still possible that a layer for the document descriptor exists, but we just don’t know yet. Once the download succeeds, the descriptor’s state will be set to .clean, and you can obtain fully usable document objects from the descriptor.

ℹ️ Note: Although you can already obtain Document instances from a document descriptor before the download operation has finished or even started, these documents are not fully usable. A Document obtained this way can be set as the document of any InstantViewController, which then displays a progress indicator until the download finishes. However, any attempt to create, read, update, or delete annotations is bound to fail while the download is in progress. The document becomes fully usable if the download succeeds; if the download fails, it should be disposed of.
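To illustrate why a document obtained before the download finishes is only partially usable, here is a minimal sketch. DocumentHandle, DownloadStatus, and AnnotationError are hypothetical stand-ins made up for this example; the real Document class behaves according to its own documentation, and the exact errors it reports are not modeled here.

```swift
// Hypothetical stand-ins; none of these types are part of PSPDFKit.
enum DownloadStatus {
    case notStarted, inProgress, succeeded, failed
}

enum AnnotationError: Error {
    case documentNotReady
}

final class DocumentHandle {
    var downloadStatus: DownloadStatus = .notStarted
    private(set) var annotations: [String] = []

    // Annotation changes are rejected until the backing data has been downloaded.
    func addAnnotation(_ name: String) throws {
        guard downloadStatus == .succeeded else {
            throw AnnotationError.documentNotReady
        }
        annotations.append(name)
    }
}

let document = DocumentHandle()
document.downloadStatus = .inProgress
do {
    try document.addAnnotation("early note")
} catch {
    print("Rejected while the download is in progress")
}

document.downloadStatus = .succeeded
do {
    try document.addAnnotation("late note")
    print("Stored annotations: \(document.annotations.count)")
} catch {
    print("Unexpected failure: \(error)")
}
```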

As of PSPDFKit 7.6 for iOS, you can obtain all previously downloaded document descriptors by calling InstantClient.localDocumentDescriptors().

Use of Downloaded Document Descriptors

If the data backing a document descriptor has already been downloaded, the documentState will be determined when you try to obtain an editable or read-only document from the descriptor. A descriptor whose local data contains unsynced changes will report .dirty, while a descriptor whose local data contains no unsynced changes will report .clean.

If the data for a document descriptor has been downloaded before but has somehow been corrupted, obtaining the document will fail and the descriptor will report a documentState of .invalid. Should this ever happen, you can remove the local storage for that descriptor, which will then be in .unknown again. This allows you to attempt a fresh download of the annotation data — in this case, the PDF file doesn’t need to be downloaded again.
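The rules for previously downloaded data can be expressed as a single decision function. The following is a sketch under stated assumptions: LocalData, ResolvedState, and resolveState(for:) are hypothetical names, and how Instant actually detects corruption or counts unsynced changes is not described in this article.

```swift
// Hypothetical model of how the state is resolved for downloaded data.
enum ResolvedState: Equatable {
    case clean, dirty, invalid
}

// Assumed summary of what is stored locally for one descriptor.
struct LocalData {
    var isCorrupted: Bool
    var unsyncedChangeCount: Int
}

func resolveState(for data: LocalData) -> ResolvedState {
    // Corruption takes precedence: no document can be obtained at all.
    if data.isCorrupted { return .invalid }
    // Otherwise the state depends only on whether unsynced changes exist.
    return data.unsyncedChangeCount > 0 ? .dirty : .clean
}

print(resolveState(for: LocalData(isCorrupted: false, unsyncedChangeCount: 2)))
print(resolveState(for: LocalData(isCorrupted: false, unsyncedChangeCount: 0)))
print(resolveState(for: LocalData(isCorrupted: true, unsyncedChangeCount: 5)))
```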

Once you’ve obtained an editable document from a downloaded document descriptor, you can start performing CRUD operations on its annotations and then sync them. By default, a document descriptor is configured to sync automatically a short while after you’ve made changes to its editable document. If you’ve disabled automatic syncing of local changes or you just don’t have local changes and want to fetch the newest server data, you can manually start a one-shot sync by calling InstantDocumentDescriptor.sync().

The Sync Cycle

When a document descriptor starts syncing, it posts a PSPDFInstantDidBeginSyncing notification and begins to cycle through several states until all local changes have been synced and the newest server truth has been applied, or until an error occurs. If the descriptor had local changes when the sync started, this notification is accompanied by a change of the documentState to .sendingChanges. When the descriptor begins receiving the new server truth, a PSPDFInstantSyncCycleDidChangeState notification is posted and the documentState switches to .receivingChanges until this truth has been applied to the local database.

Applying the server truth can take any amount of time, depending on various factors. It is therefore possible that new local changes will have been made by the time Instant has applied the newest server truth. If there are no local changes after applying the new server truth, the sync cycle completes by posting a PSPDFInstantDidFinishSyncing notification and updating the documentState to .clean. If, on the other hand, there are unsynced local changes, the sync cycle will continue, a PSPDFInstantSyncCycleDidChangeState notification will be posted, and the documentState will be updated to .sendingChanges.
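The sequence of notifications and states in an error-free sync cycle can be traced with a small simulation. This is a sketch, not the framework's implementation: SyncState, SyncEvent, and simulateSyncCycle are hypothetical, the enum cases merely echo the notification names used above, and the state reported when syncing begins without local changes (.clean here) is an assumption.

```swift
// Hypothetical model of the sync cycle; the real notifications are only
// represented by enum cases named after them.
enum SyncState: Equatable {
    case clean, sendingChanges, receivingChanges
}

enum SyncEvent: Equatable {
    case didBeginSyncing(SyncState)
    case syncCycleDidChangeState(SyncState)
    case didFinishSyncing(SyncState)
}

// editsDuringReceive[i] says whether new local changes were made while the
// i-th server response was being applied. Missing entries count as false.
func simulateSyncCycle(hasLocalChanges: Bool, editsDuringReceive: [Bool]) -> [SyncEvent] {
    // Assumption: without local changes the state is still clean at this point.
    var events: [SyncEvent] = [.didBeginSyncing(hasLocalChanges ? .sendingChanges : .clean)]
    var round = 0
    var editedMeanwhile = false
    repeat {
        // The server truth arrives and is applied to the local database.
        events.append(.syncCycleDidChangeState(.receivingChanges))
        editedMeanwhile = round < editsDuringReceive.count && editsDuringReceive[round]
        round += 1
        // New local changes keep the cycle going with another push.
        if editedMeanwhile {
            events.append(.syncCycleDidChangeState(.sendingChanges))
        }
    } while editedMeanwhile
    events.append(.didFinishSyncing(.clean))
    return events
}

for event in simulateSyncCycle(hasLocalChanges: true, editsDuringReceive: [true, false]) {
    print(event)
}
```

In this run, the user edits an annotation while the first server response is being applied, so the trace contains a second send and receive round before the cycle finishes in the clean state.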

Errors during Sync

If an error occurs while a sync cycle is running, the cycle terminates immediately. In the case of an authentication failure, such as when the JWT expires, a PSPDFInstantDidFailAuthentication notification is posted, and the instantClient(_:didFailAuthenticationFor:) method is called on your client’s delegate. A new sync will be started after you make a successful call to reauthenticate(withJWT:).

In any other case, a PSPDFInstantDidFailSyncing notification is posted, and the instantClient(_:documentDescriptor:didFailSyncWithError:) method is called if your client’s delegate implements it. If the local database contains changes that have not yet been confirmed by the server, the documentState of the descriptor is updated to .dirty. Otherwise, the state is updated to .clean.

In the case of network errors, Instant will retry the sync operation using a jittered exponential backoff strategy. In the case of authentication failures, no such retries will be made.
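For readers unfamiliar with the term, the following sketch shows one common form of jittered exponential backoff, often called full jitter. It is not Instant's implementation: BackoffPolicy and SyncError are hypothetical, and the base delay, the cap, and the jitter scheme are all assumptions, since the actual parameters are not documented here.

```swift
import Foundation

// Hypothetical error classification matching the two cases described above.
enum SyncError {
    case authentication, network
}

struct BackoffPolicy {
    var baseDelay: TimeInterval = 1   // assumed value, in seconds
    var maxDelay: TimeInterval = 60   // assumed cap, in seconds

    // Exponential ceiling for the given zero-based attempt, limited by the cap.
    func ceiling(forAttempt attempt: Int) -> TimeInterval {
        min(maxDelay, baseDelay * pow(2, Double(attempt)))
    }

    // Full jitter: pick uniformly between zero and the exponential ceiling.
    func delay(forAttempt attempt: Int) -> TimeInterval {
        TimeInterval.random(in: 0...ceiling(forAttempt: attempt))
    }

    // Only network errors are retried; authentication failures return nil.
    func retryDelay(after error: SyncError, attempt: Int) -> TimeInterval? {
        switch error {
        case .authentication: return nil
        case .network: return delay(forAttempt: attempt)
        }
    }
}

let policy = BackoffPolicy()
for attempt in 0..<4 {
    if let delay = policy.retryDelay(after: .network, attempt: attempt) {
        print("Retry \(attempt + 1): ceiling \(policy.ceiling(forAttempt: attempt)) s, picked \(delay) s")
    }
}
```

The random component spreads out retries from many clients that lost their connection at the same moment, so they don't all hit the server again at once.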