Migrate Existing Documents

If you already have a system for storing your documents, you probably don’t want to migrate all of them to PSPDFKit Server at once, or build a wrapper that maps your existing document IDs to the ones on our server.

Migrating from an Existing Document Storage

If all your PDFs are stored on Amazon S3, for example, and you don’t want to reupload them to PSPDFKit Server, you can use our endpoint for adding a document from a URL. This doesn’t take up any additional storage space on PSPDFKit Server, since we can use the file directly from S3. The document is still cached to optimize performance.

Uploading Documents on Demand

You probably identify your documents with your own internal document IDs that map to files. You can reuse those document IDs on PSPDFKit Server when adding a new document. When a user requests a document, you can check on PSPDFKit Server to see if a document with this ID already exists. If it doesn’t, you can add the document under the same ID your application uses for it. If it does, you can serve the document to the user immediately.

As an example, let’s say you have a route like /documents/:id and your user requests a specific document with the ID my_document_identifier_1, which you internally have mapped to an S3 URL.

Now you can call the document info endpoint on PSPDFKit Server, pspdfkit-server.example.com/api/documents/my_document_identifier_1/document_info, to see if the document already exists. If it doesn’t, you’ll receive a 404 error, and you can then call the endpoint for adding a document from a URL with the following JSON:

{
  "url": "http://s3.amazon.com/sample.pdf",
  "document_id": "my_document_identifier_1"
}

After a successful request, PSPDFKit Server can now serve you the S3 document by using the ID my_document_identifier_1.

The next time one of your users requests the document with the ID my_document_identifier_1, you’ll receive a successful response from the document info endpoint and can serve your user the document immediately.
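The check-then-add flow described above can be sketched as a small Node.js helper. This is only an illustration under stated assumptions: the guide names the document info path, but the POST path /api/documents and the Authorization: Token token=... header used here are assumptions that should be checked against the PSPDFKit Server API reference. The function name ensureDocument and its parameters are hypothetical.

```javascript
// Sketch: register a document on PSPDFKit Server only if its ID is unknown.
// ASSUMPTIONS (not stated in this guide): the "add from URL" request is a
// JSON POST to /api/documents, and requests are authenticated with an
// "Authorization: Token token=..." header.
async function ensureDocument({
  serverUrl,
  apiToken,
  documentId,
  sourceUrl,
  fetchImpl = globalThis.fetch, // injectable so the flow can be tested offline
}) {
  const headers = { Authorization: `Token token=${apiToken}` };

  // 1. Ask the document info endpoint whether this ID already exists.
  const infoResponse = await fetchImpl(
    `${serverUrl}/api/documents/${encodeURIComponent(documentId)}/document_info`,
    { headers }
  );
  if (infoResponse.ok) {
    return "exists";
  }
  if (infoResponse.status !== 404) {
    throw new Error(`Unexpected status ${infoResponse.status} from document info`);
  }

  // 2. A 404 means the ID is unknown: add the document from its URL,
  //    reusing our own internal ID as the document_id.
  const addResponse = await fetchImpl(`${serverUrl}/api/documents`, {
    method: "POST",
    headers: { ...headers, "Content-Type": "application/json" },
    body: JSON.stringify({ url: sourceUrl, document_id: documentId }),
  });
  if (!addResponse.ok) {
    throw new Error(`Adding the document failed with status ${addResponse.status}`);
  }
  return "created";
}
```

A route handler for /documents/:id could await this helper with the S3 URL mapped to the requested ID before serving the document. Any status other than 200 or 404 from the info endpoint is treated as an error here rather than as "missing", so that an outage does not trigger repeated add requests.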

Accessing Your External Storage with Credentials or Tokens

When using S3, for example, you might be using signed URLs with an expiration date to prevent access to the documents after a certain time. Or, you might have your own storage that requires a username and password to access the files.

This could also be the case if you’re using a WebDAV server that requires authentication with a username and password or any other storage solution that supports authentication via tokens.

When you add a document from a URL, PSPDFKit Server must be able to download the PDF from that URL at any time, because it fetches the file again whenever the document is no longer in the cache. If you’re using signed URLs, this can be an issue, because the token might have expired by then.

In this case, we recommend you use one of the following approaches.

Change the S3 Bucket Policy

If you’re using S3 with signed URLs, we recommend you change the S3 bucket policy instead of providing PSPDFKit Server with URLs that carry signed tokens and an expiration date. Once the policy allows the IP address where PSPDFKit Server is hosted, you don’t need signed URLs for extra protection, since PSPDFKit Server doesn’t expose these URLs either.
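As a rough illustration of such a policy, the following sketch builds a bucket policy document that allows read access from a single IP address. The helper name buildBucketPolicy, the bucket name example-bucket, and the address 203.0.113.10 are placeholders chosen for this example. The policy shape (s3:GetObject restricted by an aws:SourceIp condition) follows the standard AWS policy format, but the statement should be adapted to your own setup.

```javascript
// Sketch: build an S3 bucket policy that lets one IP address read objects.
// The bucket name and IP address passed in are placeholders, not real values.
function buildBucketPolicy(bucketName, serverIp) {
  return {
    Version: "2012-10-17",
    Statement: [
      {
        Sid: "AllowPSPDFKitServerRead",
        Effect: "Allow",
        Principal: "*",
        Action: "s3:GetObject",
        // Applies to every object in the bucket.
        Resource: `arn:aws:s3:::${bucketName}/*`,
        // Only requests coming from this single address match the statement.
        Condition: { IpAddress: { "aws:SourceIp": `${serverIp}/32` } },
      },
    ],
  };
}

// Print the policy as JSON so it can be reviewed before it is applied.
console.log(JSON.stringify(buildBucketPolicy("example-bucket", "203.0.113.10"), null, 2));
```

The /32 suffix limits the rule to exactly one IPv4 address. If PSPDFKit Server runs behind several outbound addresses, the condition value can be an array of CIDR ranges instead.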

Have an Internal Endpoint That Redirects

If you still want to use signed URLs, or you need to add credentials to the URLs you provide to PSPDFKit Server, we recommend you add an internal endpoint on your backend that redirects to the signed URL. In the following code snippet, a Node.js server receives requests at /documents/:documentId, where :documentId is the identifier you use in your application. It then generates the signed URL, or a URL with any other credentials, and redirects to it.

In this example, you’d provide PSPDFKit Server with the URL https://yourapp.com/documents/myDocumentIdentifier1, and this endpoint would redirect to a signed URL, like s3.amazon.com/myDocument.pdf?token=123455:

// Catch the document endpoint. `app` is assumed to be an Express application.
app.get("/documents/:documentId", function (req, res) {
  // Generate the signed Amazon S3 URL for the unique document ID we were given.
  // PSPDFKit Server will use this URL to download the file.
  const preSignedUrl = generatePreSignedUrlForDocument(req.params.documentId);

  // Redirect the request to the freshly signed URL.
  res.redirect(preSignedUrl);
});

The generatePreSignedUrlForDocument function used in the snippet above will generate the signed S3 URL with the expiry parameters. This will look similar to the function in the AWS guides’ Java example.