Understanding the Document State

Over the course of its lifetime, any PSPDFInstantDocumentDescriptor will go through several states. This article covers those states and their transitions in more detail than the general documentation for this protocol.

In order to understand the different states of a document descriptor, it makes sense to revisit what these objects are and do.

A document descriptor acts as the receptionist for actual PSPDFDocument instances that are backed by the same data. In this role, the descriptor hands out document objects and manages the authentication, download, and syncing of this data. Document descriptors are managed objects that you cannot instantiate directly. Instead, you obtain them from a PSPDFInstantClient, which returns one immediately as long as the identifier you pass is not obviously invalid.

Consequently, a document descriptor is not a promise that a usable document will exist in the future. It only gives you a lightweight handle to a document and an API to control and inspect its syncing behavior.

Lifecycle of a Document Descriptor

Each document descriptor is initially created with a documentState of PSPDFInstantDocumentStateUnknown, meaning the data that could back an actual document has not necessarily been downloaded. In fact, it may not even exist at all.

In most cases, the state of a document descriptor is established as either “clean” or “dirty” once you ask it for a document. However, this is only possible if the data for the document has already been downloaded. If you ask for an editable document, you can create, read, update, and delete (CRUD) annotations in that document. Changing annotations makes the document descriptor “dirty,” which means those changes need to be synced.
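The following Swift sketch models only the transitions described so far. It is a hypothetical illustration, not the PSPDFKit API: `DescriptorModel`, `openDocument()`, `editAnnotation()`, and the `isDownloaded` and `hasUnsyncedChanges` flags are invented names, and the `DocumentState` cases merely mirror the `PSPDFInstantDocumentState…` constants. It shows that asking for a document only establishes a clean or dirty state once data has been downloaded, and that editing annotations makes the descriptor dirty.

```swift
// Hypothetical model, not PSPDFKit API. Case names mirror PSPDFInstantDocumentState….
enum DocumentState {
    case unknown, clean, dirty, pushingChanges, fetchingChanges, receivingChanges, invalid
}

final class DescriptorModel {
    // Every descriptor starts out in the unknown state.
    private(set) var documentState: DocumentState = .unknown
    var isDownloaded = false        // Has the backing data been downloaded?
    var hasUnsyncedChanges = false  // Does the local data contain unsynced changes?

    /// Asking for a document establishes clean or dirty, but only for downloaded data.
    func openDocument() {
        guard isDownloaded else { return }  // Nothing to inspect yet: state stays unknown.
        documentState = hasUnsyncedChanges ? .dirty : .clean
    }

    /// Creating, updating, or deleting an annotation leaves unsynced changes behind.
    func editAnnotation() {
        guard isDownloaded else { return }  // CRUD is not possible without downloaded data.
        hasUnsyncedChanges = true
        documentState = .dirty
    }
}

let descriptor = DescriptorModel()
descriptor.openDocument()
print(descriptor.documentState)  // unknown

descriptor.isDownloaded = true
descriptor.openDocument()
print(descriptor.documentState)  // clean

descriptor.editAnnotation()
print(descriptor.documentState)  // dirty
```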

As mentioned elsewhere, all syncing is two-way: You cannot choose to only fetch or only push data. Instead, syncing always means that any local changes are sent to the server, which then determines the new truth. The server replies with the changes your local copy needs to reflect this new truth, and these changes are then applied locally. For details about the state transitions this entails, refer to the sync cycle section below.
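To illustrate why there is no push-only or fetch-only sync, here is a deliberately simplified Swift model. `ServerModel`, `LocalCopy`, and `Change` are hypothetical names, not PSPDFKit API. For brevity, the server replies with its complete truth rather than only the changes the local copy needs, which simplifies what the text describes.

```swift
// Hypothetical, simplified model of two-way syncing; not PSPDFKit API.
struct Change {
    let annotationID: String
    let contents: String?  // nil means the annotation was deleted
}

final class ServerModel {
    private(set) var truth: [String: String] = [:]  // annotation ID -> contents

    /// Applies the incoming changes, decides the new truth, and replies with it.
    func sync(_ incoming: [Change]) -> [String: String] {
        for change in incoming { truth[change.annotationID] = change.contents }
        return truth
    }
}

final class LocalCopy {
    private(set) var annotations: [String: String] = [:]
    private(set) var pending: [Change] = []  // local changes not yet sent to the server

    func edit(_ annotationID: String, contents: String?) {
        annotations[annotationID] = contents
        pending.append(Change(annotationID: annotationID, contents: contents))
    }

    /// One operation, both directions: push everything pending, then adopt the reply.
    func sync(with server: ServerModel) {
        let newTruth = server.sync(pending)
        pending.removeAll()
        annotations = newTruth
    }
}

let server = ServerModel()
let alice = LocalCopy()
let bob = LocalCopy()
alice.edit("a1", contents: "Hello")
alice.sync(with: server)
bob.sync(with: server)  // Bob has nothing to push but still receives Alice's change.
print(bob.annotations)  // ["a1": "Hello"]
```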

When you are no longer interested in the local data backing a document descriptor, you can call its removeLocalStorageWithError: method. This also cancels any ongoing network activity immediately.

Calling removeLocalStorageWithError: while the document descriptor is pushing changes to the server is possible, but it’s generally not recommended. When removeLocalStorageWithError: returns, the document descriptor will be in an “unknown” state again, as if it had never been downloaded before.

Note: As of PSPDFKit 7.6 for iOS, there can be multiple document descriptors with the same identifier — one for each layer. Because all layers with the same document identifier share the same PDF file, removeLocalStorageWithError: no longer deletes the backing file automatically.

In order to reclaim disk space for all PDF files that are no longer needed, you can call -[PSPDFInstantClient removeUnreferencedCacheEntries:]. If you want to unconditionally reclaim the disk space for a certain PDF, you can call -[PSPDFInstantClient removeLocalStorageForDocumentIdentifier:error:] instead. Doing so also invalidates all existing document descriptors with that identifier and removes their annotation data.
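The storage rules above can be summarized with a small Swift model in which each document identifier owns one shared PDF and each layer owns its own annotation data. It is a hypothetical sketch: `CacheModel`, `LayerKey`, and their methods are invented names that only loosely mirror removeLocalStorageWithError:, -[PSPDFInstantClient removeUnreferencedCacheEntries:], and -[PSPDFInstantClient removeLocalStorageForDocumentIdentifier:error:]. Nothing here describes how Instant actually stores files on disk.

```swift
// Hypothetical model of the cache rules above; not PSPDFKit API.
struct LayerKey: Hashable {
    let documentID: String
    let layer: String
}

final class CacheModel {
    private(set) var pdfFiles: Set<String> = []       // one shared PDF per document identifier
    private(set) var layerData: Set<LayerKey> = []    // annotation data, one entry per layer
    private(set) var descriptors: Set<LayerKey> = []  // descriptors that have been handed out
    private(set) var invalidated: Set<LayerKey> = []

    func download(_ key: LayerKey) {
        descriptors.insert(key)
        pdfFiles.insert(key.documentID)  // Shared by every layer of this document.
        layerData.insert(key)
    }

    /// Like removeLocalStorageWithError:, drops one layer's data but keeps the shared PDF.
    func removeLocalStorage(for key: LayerKey) {
        layerData.remove(key)
    }

    /// Like removeUnreferencedCacheEntries:, deletes PDFs that no layer's data references.
    func removeUnreferencedCacheEntries() {
        let referenced = Set(layerData.map { $0.documentID })
        pdfFiles.formIntersection(referenced)
    }

    /// Like removeLocalStorageForDocumentIdentifier:error:, removes the PDF unconditionally
    /// and invalidates every descriptor with that identifier, along with its annotation data.
    func removeLocalStorage(forDocumentIdentifier documentID: String) {
        pdfFiles.remove(documentID)
        layerData = layerData.filter { $0.documentID != documentID }
        invalidated.formUnion(descriptors.filter { $0.documentID == documentID })
    }
}

let cache = CacheModel()
let aliceLayer = LayerKey(documentID: "doc", layer: "alice")
let bobLayer = LayerKey(documentID: "doc", layer: "bob")
cache.download(aliceLayer)
cache.download(bobLayer)

cache.removeLocalStorage(for: aliceLayer)
cache.removeUnreferencedCacheEntries()
print(cache.pdfFiles)  // ["doc"]: Bob's layer still references the PDF

cache.removeLocalStorage(forDocumentIdentifier: "doc")
print(cache.pdfFiles, cache.invalidated.count)  // [] 2
```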

Because every document descriptor is managed by the PSPDFInstantClient that created it, it cannot be used without a fully intact client. So when a client deallocates or is invalidated, all document descriptors managed by this client become “invalid” too.

Additionally, document descriptors become invalid when the client removes their local storage (either globally or selectively for a certain document identifier), and in the (unlikely) case when Instant detects that the backing data for a document descriptor has become corrupted.

In the dire situation of data corruption, there is little you can do apart from calling removeLocalStorageWithError: on the newly invalid document descriptor.
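The rule that descriptors cannot outlive an intact client can be pictured as a weak back-reference. The Swift sketch below is hypothetical: `ClientModel`, `DescriptorModel`, `markCorrupted()`, and `isInvalid` are invented names, and nothing here claims that Instant is implemented with weak references. It encodes three of the conditions listed above: a deallocated client, an invalidated client, and corrupted backing data.

```swift
// Hypothetical sketch, not PSPDFKit API: a descriptor is only valid while its client is.
final class ClientModel {
    private(set) var isInvalidated = false
    func invalidate() { isInvalidated = true }
}

final class DescriptorModel {
    // Weak: the descriptor does not keep its managing client alive.
    private weak var client: ClientModel?
    private var dataIsCorrupted = false

    init(client: ClientModel) { self.client = client }

    func markCorrupted() { dataIsCorrupted = true }

    var isInvalid: Bool {
        // A deallocated or invalidated client takes all of its descriptors down with it.
        guard let client = client, !client.isInvalidated else { return true }
        // Otherwise, only corrupted backing data makes the descriptor invalid.
        return dataIsCorrupted
    }
}

let client = ClientModel()
let healthy = DescriptorModel(client: client)
let damaged = DescriptorModel(client: client)
damaged.markCorrupted()
print(healthy.isInvalid, damaged.isInvalid)  // false true

client.invalidate()
print(healthy.isInvalid)  // true
```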

Downloading Document Data

If the data has not yet been downloaded, you can start a download operation by calling the descriptor’s downloadUsingJWT:error: method. Should that operation fail, the descriptor’s documentState remains unknown: It is still possible that a layer for the document descriptor exists, but we just don’t know yet. Once the download succeeds, the descriptor’s state becomes PSPDFInstantDocumentStateClean, and you can obtain fully usable document objects from the descriptor.

Note: Although you can already obtain PSPDFDocument instances from a document descriptor before the download operation has finished or even started, these documents are not fully usable. A PSPDFDocument obtained this way can be set as the document of any PSPDFInstantViewController, which then displays a progress indicator until the download finishes. However, any attempt to create, read, update, or delete annotations is bound to fail while the download is in progress. The document becomes fully usable if the download succeeds, but it should be discarded if the download fails.
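A hypothetical Swift sketch of the download rules in this section: A failed download leaves the state unknown, a successful one makes it clean, and annotations can only be edited afterwards. `DescriptorModel`, `download(usingJWT:fetch:)`, and `canEditAnnotations` are invented names, not PSPDFKit API; `fetch` is an injected closure that stands in for the real network transfer so the example runs on its own.

```swift
// Hypothetical sketch, not PSPDFKit API.
enum DocumentState { case unknown, clean }
enum DownloadError: Error { case networkUnavailable }

final class DescriptorModel {
    private(set) var documentState: DocumentState = .unknown

    /// `fetch` stands in for the network transfer so the sketch runs on its own.
    func download(usingJWT jwt: String, fetch: (String) throws -> Void) throws {
        try fetch(jwt)          // If this throws, we exit early and the state stays unknown.
        documentState = .clean  // Freshly downloaded data contains no local changes.
    }

    /// Documents can be handed out at any time, but annotation CRUD
    /// only works once the download has succeeded.
    var canEditAnnotations: Bool { documentState != .unknown }
}

let descriptor = DescriptorModel()
do {
    try descriptor.download(usingJWT: "jwt-1") { _ in throw DownloadError.networkUnavailable }
} catch {
    print("Download failed, state:", descriptor.documentState)  // Download failed, state: unknown
}
try? descriptor.download(usingJWT: "jwt-2") { _ in }  // Pretend the transfer succeeded.
print(descriptor.documentState, descriptor.canEditAnnotations)  // clean true
```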

As of PSPDFKit 7.6 for iOS, you can obtain all previously downloaded document descriptors by calling -[PSPDFInstantClient localDocumentDescriptors:].

Use of Downloaded Document Descriptors

If the data backing a document descriptor has already been downloaded, the documentState will be determined when you try to obtain an editable or read-only document from the descriptor. A descriptor whose local data contains unsynced changes will report that it is in PSPDFInstantDocumentStateDirty, while a descriptor with local data that does not contain changes will report that it is in PSPDFInstantDocumentStateClean.

If the data for a document descriptor has been downloaded before but has somehow been corrupted, obtaining the document will fail and the descriptor will report a documentState of PSPDFInstantDocumentStateInvalid. Should this ever happen, you can remove the local storage for that descriptor, which then returns to PSPDFInstantDocumentStateUnknown. This allows you to attempt a fresh download of the annotation data. In this case, the PDF file does not need to be downloaded again, because removing a descriptor’s local storage does not delete the shared PDF file.
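The following hypothetical Swift sketch follows a descriptor through both outcomes of asking for a document: a clean or dirty state for intact data, and the invalid state and recovery path for corrupted data. `DescriptorModel`, `LocalData`, `openDocument()`, `removeLocalStorage()`, and `needsPDFDownload` are illustrative names rather than PSPDFKit API. The model assumes, as described above, that removing a descriptor’s local storage keeps the shared PDF file.

```swift
// Hypothetical sketch, not PSPDFKit API.
enum DocumentState { case unknown, clean, dirty, invalid }
enum OpenError: Error { case corruptedData }

struct LocalData {
    var isCorrupted: Bool
    var hasUnsyncedChanges: Bool
}

final class DescriptorModel {
    private(set) var documentState: DocumentState = .unknown
    var localData: LocalData?     // nil: annotation data was never downloaded or was removed
    var sharedPDFExists = false   // the PDF file is shared by all layers of the document

    /// Asking for a document inspects the downloaded data and determines the state.
    func openDocument() throws {
        guard let data = localData else { return }  // Not downloaded: stays unknown.
        guard !data.isCorrupted else {
            documentState = .invalid
            throw OpenError.corruptedData
        }
        documentState = data.hasUnsyncedChanges ? .dirty : .clean
    }

    /// Drops this descriptor's annotation data but leaves the shared PDF in place.
    func removeLocalStorage() {
        localData = nil
        documentState = .unknown
    }

    /// A fresh download only needs to fetch the PDF if it is missing.
    var needsPDFDownload: Bool { !sharedPDFExists }
}

let descriptor = DescriptorModel()
descriptor.sharedPDFExists = true
descriptor.localData = LocalData(isCorrupted: true, hasUnsyncedChanges: false)
do {
    try descriptor.openDocument()
} catch {
    print(descriptor.documentState)  // invalid
}
descriptor.removeLocalStorage()
print(descriptor.documentState, descriptor.needsPDFDownload)  // unknown false
```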

Once you have obtained an editable document from a downloaded document descriptor, you can start performing CRUD operations on its annotations and then sync them. By default, a document descriptor is configured to sync automatically a short while after you’ve made changes to its editable document. If you have disabled automatic syncing of local changes, or if you have no local changes and simply want to fetch the newest server data, you can manually start a one-shot sync by calling -[PSPDFInstantDocumentDescriptor sync].
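The text only says that automatic syncing starts “a short while” after changes. One plausible way to get that behavior is a debounce, sketched below in Swift. This is an assumption, not a description of Instant’s internals: `AutoSyncDebouncer`, its 1-second default, and the explicit timestamps (used instead of real timers so the sketch runs deterministically) are all hypothetical.

```swift
// Hypothetical debounce, not PSPDFKit API. Times are plain seconds for determinism.
final class AutoSyncDebouncer {
    let delay: Double                 // quiet period before an automatic sync
    var isEnabled = true              // automatic syncing can be switched off
    private var deadline: Double?     // when the pending automatic sync is due

    init(delay: Double = 1.0) { self.delay = delay }

    /// Every local change postpones the pending sync, so bursts of edits coalesce.
    func noteChange(at time: Double) {
        guard isEnabled else { return }
        deadline = time + delay
    }

    /// Polled by a timer in a real app; returns true once when the sync is due.
    func shouldSync(at time: Double) -> Bool {
        guard let due = deadline, time >= due else { return false }
        deadline = nil
        return true
    }
}

let debouncer = AutoSyncDebouncer(delay: 1.0)
debouncer.noteChange(at: 0.0)
debouncer.noteChange(at: 0.5)           // a second edit shortly after the first
print(debouncer.shouldSync(at: 1.2))    // false: only 0.7 s since the last edit
print(debouncer.shouldSync(at: 1.5))    // true: one sync covers both edits
```

With `isEnabled` set to `false`, no automatic sync is ever scheduled in this model, which corresponds to the situation in which you would start a one-shot sync yourself.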

The Sync Cycle

When a document descriptor starts syncing, it posts a PSPDFInstantDidBeginSyncingNotification notification and begins to cycle through several states until all local changes have been synced and the newest server truth has been applied, or until an error occurs. Depending upon the initial documentState of the descriptor, this notification is accompanied by a change to either PSPDFInstantDocumentStatePushingChanges if there were local changes, or PSPDFInstantDocumentStateFetchingChanges if there were not. When it begins receiving the new server truth, a PSPDFInstantSyncCycleDidChangeStateNotification notification is posted and the documentState switches to PSPDFInstantDocumentStateReceivingChanges until this truth has been applied to the local database.

This can take an arbitrary amount of time, depending on factors such as network conditions. It is therefore possible that new local changes have been made by the time Instant applies the newest server truth. If there are no local changes after applying the new server truth, the sync cycle completes by posting a PSPDFInstantDidFinishSyncingNotification notification and updating the documentState to PSPDFInstantDocumentStateClean. If, on the other hand, there are unsynced local changes, the sync cycle continues: A PSPDFInstantSyncCycleDidChangeStateNotification notification is posted, and the documentState is updated to PSPDFInstantDocumentStatePushingChanges.
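The sync cycle can be replayed with a small simulation. The Swift function below is hypothetical: `runSyncCycle(startingFrom:changesDuringRound:)` and `Event` are invented, the notification names are used as plain strings, and the input `changesDuringRound` simply states whether new local edits arrived while each round of server truth was being received. It records the notification and documentState for every transition described above.

```swift
// Hypothetical simulation of the sync cycle; not PSPDFKit API.
enum DocumentState { case clean, dirty, pushingChanges, fetchingChanges, receivingChanges }

struct Event: Equatable {
    let notification: String
    let state: DocumentState
}

/// `changesDuringRound[i]` says whether new local edits arrived during round `i`.
func runSyncCycle(startingFrom initial: DocumentState, changesDuringRound: [Bool]) -> [Event] {
    // Starting: push if there are local changes, otherwise just fetch.
    var events = [Event(notification: "PSPDFInstantDidBeginSyncingNotification",
                        state: initial == .dirty ? .pushingChanges : .fetchingChanges)]
    var round = 0
    while true {
        // The new server truth starts arriving.
        events.append(Event(notification: "PSPDFInstantSyncCycleDidChangeStateNotification",
                            state: .receivingChanges))
        let newLocalChanges = round < changesDuringRound.count && changesDuringRound[round]
        round += 1
        if !newLocalChanges {
            // Nothing left to push: the cycle completes.
            events.append(Event(notification: "PSPDFInstantDidFinishSyncingNotification",
                                state: .clean))
            return events
        }
        // Unsynced local changes remain, so the cycle continues with another push.
        events.append(Event(notification: "PSPDFInstantSyncCycleDidChangeStateNotification",
                            state: .pushingChanges))
    }
}

for event in runSyncCycle(startingFrom: .dirty, changesDuringRound: [true, false]) {
    print(event.state)
}
// Prints, one per line: pushingChanges, receivingChanges, pushingChanges, receivingChanges, clean
```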

Errors during Sync

If an error occurs while a sync cycle is running, the cycle terminates immediately. In the case of an authentication failure — such as when the JWT expires — a PSPDFInstantDidFailAuthenticationNotification is posted, and the instantClient:didFailAuthenticationForDocumentDescriptor: method is called on your client’s delegate. A new sync will be started after you make a successful call to reauthenticateWithJWT:.

In any other case, a PSPDFInstantDidFailSyncingNotification notification is posted and the instantClient:documentDescriptor:didFailSyncWithError: method is called if your client’s delegate implements it. If the local database contains changes that have not yet been confirmed by the server, the documentState of the descriptor is updated to PSPDFInstantDocumentStateDirty. Otherwise, the state is updated to PSPDFInstantDocumentStateClean.
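As a hypothetical summary of these rules, the Swift sketch below maps a failure to the notification, delegate callback, and resulting documentState described in this section. `SyncFailure`, `FailureOutcome`, and `resolve(_:hasUnconfirmedChanges:)` are invented names, not PSPDFKit API. The state after an authentication failure is left as `nil` because the text does not specify it.

```swift
// Hypothetical sketch, not PSPDFKit API: how a failed sync cycle is wrapped up.
enum DocumentState { case clean, dirty }

enum SyncFailure {
    case authentication  // for example, an expired JWT
    case other           // any other error
}

struct FailureOutcome: Equatable {
    let notification: String
    let delegateMethod: String
    let stateAfterFailure: DocumentState?  // nil: not specified by the text
    let waitsForReauthentication: Bool
}

func resolve(_ failure: SyncFailure, hasUnconfirmedChanges: Bool) -> FailureOutcome {
    switch failure {
    case .authentication:
        // The next sync starts only after a successful reauthenticateWithJWT: call.
        return FailureOutcome(
            notification: "PSPDFInstantDidFailAuthenticationNotification",
            delegateMethod: "instantClient:didFailAuthenticationForDocumentDescriptor:",
            stateAfterFailure: nil,
            waitsForReauthentication: true)
    case .other:
        // Unconfirmed local changes keep the descriptor dirty; otherwise it is clean.
        return FailureOutcome(
            notification: "PSPDFInstantDidFailSyncingNotification",
            delegateMethod: "instantClient:documentDescriptor:didFailSyncWithError:",
            stateAfterFailure: hasUnconfirmedChanges ? .dirty : .clean,
            waitsForReauthentication: false)
    }
}

let outcome = resolve(.other, hasUnconfirmedChanges: true)
print(outcome.notification)                 // PSPDFInstantDidFailSyncingNotification
print(outcome.stateAfterFailure == .dirty)  // true
```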

In the case of network errors, Instant retries the sync operation using exponential backoff with jitter. Authentication failures are never retried this way.
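Because the text names the strategy but not its parameters, the Swift sketch below shows one common variant: exponential backoff with “full jitter.” The base delay of 1 second, the 60-second cap, and the choice of full jitter are assumptions for illustration. `retryDelay(attempt:base:cap:random:)` is a hypothetical helper, and the injectable `random` closure exists only so the example can be checked deterministically. Per the text, such delays would apply to network errors only, never to authentication failures.

```swift
// Hypothetical backoff sketch: base, cap, and the "full jitter" variant are assumptions,
// not values or formulas documented for Instant.
func retryDelay(
    attempt: Int,
    base: Double = 1.0,
    cap: Double = 60.0,
    random: (ClosedRange<Double>) -> Double = { Double.random(in: $0) }
) -> Double {
    precondition(attempt >= 0, "attempts are counted from 0")
    // Exponential growth: base * 2^attempt, clamped to `cap` (the shift is bounded to avoid overflow).
    let ceiling = min(cap, base * Double(1 << min(attempt, 30)))
    // Full jitter: a uniformly random delay in [0, ceiling], so clients don't retry in lockstep.
    return random(0...ceiling)
}

// Replace the randomness with the upper bound to see how the ceiling grows.
let ceilings = (0..<8).map { attempt in
    retryDelay(attempt: attempt, random: { range in range.upperBound })
}
print(ceilings)  // [1.0, 2.0, 4.0, 8.0, 16.0, 32.0, 60.0, 60.0]
```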