Asset Storage

PSPDFKit Server supports multiple storage backends for PDFs and other assets, as detailed below.

Built-In Asset Storage

By default, PSPDFKit Server stores assets as binary large objects (BLOBs) in the database. If any of your individual PDFs are larger than 1 GB, we recommend using S3-compatible object storage instead.

Set ASSET_STORAGE_BACKEND to built-in to use the built-in asset storage.

S3-Compatible Object Storage

PSPDFKit Server can also store your assets in any Amazon S3-compatible object storage service.

Set ASSET_STORAGE_BACKEND to S3, and use ASSET_STORAGE_S3_BUCKET, ASSET_STORAGE_S3_ACCESS_KEY_ID, ASSET_STORAGE_S3_SECRET_ACCESS_KEY, and ASSET_STORAGE_S3_REGION to configure how PSPDFKit Server accesses the external storage.

When using an object storage provider other than Amazon S3, you can also set ASSET_STORAGE_S3_SCHEME, ASSET_STORAGE_S3_HOST, and ASSET_STORAGE_S3_PORT to point PSPDFKit Server at that provider’s endpoint.
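
Because a missing S3 setting is easy to overlook, a small pre-flight check can help before starting the server. The following is a minimal sketch, not part of PSPDFKit Server: the function name check_s3_env is hypothetical, and treating the bucket and both keys as mandatory is an assumption (the region is left out because the MinIO example in this guide omits it).

```shell
#!/bin/sh
# check_s3_env (hypothetical helper): verify that the S3 backend is selected
# and that the bucket and credential variables are non-empty.
# Returns 0 when everything is set, 1 otherwise.
check_s3_env() {
  if [ "${ASSET_STORAGE_BACKEND:-}" != "S3" ]; then
    echo "ASSET_STORAGE_BACKEND is not set to S3" >&2
    return 1
  fi
  missing=""
  for name in ASSET_STORAGE_S3_BUCKET \
              ASSET_STORAGE_S3_ACCESS_KEY_ID \
              ASSET_STORAGE_S3_SECRET_ACCESS_KEY; do
    # Indirect lookup of the variable whose name is stored in $name.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      missing="$missing $name"
    fi
  done
  if [ -n "$missing" ]; then
    echo "Missing S3 settings:$missing" >&2
    return 1
  fi
  return 0
}
```

Calling check_s3_env in the shell that launches the container prints the names of any missing variables to standard error and returns a non-zero status.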

For more details about using Google Cloud Storage as the storage backend, take a look at the Google Cloud Storage interoperability guide.

Docker Volume (Deprecated)

Storing assets in a local Docker volume was the default in older versions of PSPDFKit Server, but this backend was deprecated in version 2017.7 and will be removed in a future version of PSPDFKit Server.

We recommend migrating to the built-in asset storage; see the migration section for instructions.

With this backend, all assets are stored locally at the path configured in ASSET_STORAGE_PATH.

Make sure to mount this path as a Docker volume. Otherwise, recreating your Docker container will destroy all uploaded PDFs and other assets!

All options for the storage backend are set with environment variables.

Which Storage Backend Should I Use?

The choice of storage backend depends on the PDF dataset that will power your application, and it impacts the general performance of PSPDFKit Server.

If you have a relatively stable number of PDF files (i.e. a number that only changes a few times a month), each smaller than 5 MB, you can safely use the built-in storage. Its main advantages are that:

  • You don’t have to worry about another piece of infrastructure.
  • Backing up the PSPDFKit Server PostgreSQL instance will also back up your assets.

For larger and more frequently changing files, we recommend using the S3-compatible asset storage backend, which provides more efficient support for concurrent uploads and downloads.

Using the S3-compatible backend means you need a separate backup routine. Keep the following in mind:

  • As PSPDFKit Server stores files by their SHA checksum, most of the time, a daily, incremental backup will suffice.
  • You should schedule the asset storage backup right after the PostgreSQL database backup in order to avoid data drifting between the two.
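
The ordering described above (database first, assets immediately afterward) could be scripted roughly as follows. This is only a sketch under stated assumptions: it assumes the database can be dumped with pg_dump and the bucket copied with the AWS CLI, and the function name backup_pspdfkit as well as the database name, bucket, and directory passed to it are placeholders.

```shell
#!/bin/sh
# Commands can be overridden, for example to point at a wrapper script.
PG_DUMP="${PG_DUMP:-pg_dump}"
AWS="${AWS:-aws}"

# backup_pspdfkit DB_NAME BUCKET BACKUP_DIR (hypothetical helper)
# Dumps the database, then mirrors the asset bucket into BACKUP_DIR/assets.
backup_pspdfkit() {
  db_name="$1"
  bucket="$2"
  backup_dir="$3"
  stamp="$(date +%Y-%m-%d)"

  mkdir -p "$backup_dir/assets" || return 1

  # 1. Back up the PostgreSQL database first.
  "$PG_DUMP" --format=custom --file="$backup_dir/db-$stamp.dump" "$db_name" || return 1

  # 2. Back up the assets right afterward. "s3 sync" copies only new or
  #    changed objects, so repeated runs behave like an incremental backup.
  "$AWS" s3 sync "s3://$bucket" "$backup_dir/assets"
}
```

A daily scheduled job could then call, for example, backup_pspdfkit pspdfkit my-bucket /var/backups/pspdfkit. For an S3-compatible provider other than Amazon S3, the AWS CLI additionally needs its --endpoint-url option.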

Serving Files from Existing Storage in Your Infrastructure

If you already have a storage solution for PDF files in your infrastructure, PSPDFKit Server can integrate with it as long as the PDF files can be accessed via an HTTP endpoint. To integrate PSPDFKit Server with the file storage, you will need to add documents from a URL.

All PDF URLs should be considered permalinks, as PSPDFKit Server fetches the file whenever it is needed and keeps only a local cached copy that can expire at any time.

To achieve the best possible performance, please make sure that PSPDFKit Server instances and the file store sit in the same network (physical or virtual). This minimizes latency and maximizes download speed.

As of version 2019.4, it’s possible to perform a document editing operation on a document with a remote URL, but the resulting PDF file will need to be stored with any of the supported storage strategies. If you need to copy the transformed file back to the file store, you will need to do that manually by fetching the transformed file first.

If your file store requires authentication, we recommend introducing an internal proxy. When adding a document with a URL, the URL would point to the proxy endpoint, where your custom logic would be able to support the required authentication options and redirect to the file store URL of the PDF file. For more information and some sample code, you can visit the relevant guide article.

MinIO

If you use S3-compatible object storage in production, we recommend using MinIO in development in order to get closer to dev/prod parity.

To run the MinIO Docker container, run the following:

docker pull minio/minio
docker run -p 9000:9000 minio/minio server /export

After running these commands, you should see the AccessKey and SecretKey printed out in the terminal, which you can use to access the MinIO web interface at http://localhost:9000/minio.

You can now configure docker-compose.yml, like this:

environment:
  ASSET_STORAGE_BACKEND: S3
  ASSET_STORAGE_S3_BUCKET: <minio bucket name>
  ASSET_STORAGE_S3_ACCESS_KEY_ID: <minio access key>
  ASSET_STORAGE_S3_SECRET_ACCESS_KEY: <minio secret access key>
  ASSET_STORAGE_S3_SCHEME: http://
  ASSET_STORAGE_S3_HOST: pssync_minio
  ASSET_STORAGE_S3_PORT: 9000

MinIO supports emulating different regions. It defaults to us-east-1. If you have changed your MinIO configuration to a different region, make sure to set ASSET_STORAGE_S3_REGION accordingly.

Migration between Asset Storage Options

You can migrate from one storage backend to another by executing the migration command as described below. To prevent data loss, a migration does not delete files from the original storage backend.
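
The three migrations covered in this guide can be summarized as a small lookup. The helper below is a hypothetical convenience, not something shipped with PSPDFKit Server; it only maps a source and target backend to the task names that appear in the commands in this section.

```shell
#!/bin/sh
# migration_task FROM TO (hypothetical helper)
# Prints the migration task for a supported pair of backends,
# or fails if this guide documents no migration for that pair.
migration_task() {
  case "$1:$2" in
    local:built-in) echo "assets:migrate:from-local-to-built-in" ;;
    built-in:s3)    echo "assets:migrate:from-built-in-to-s3" ;;
    s3:built-in)    echo "assets:migrate:from-s3-to-built-in" ;;
    *)
      echo "No migration documented here from '$1' to '$2'" >&2
      return 1
      ;;
  esac
}
```

For example, migration_task built-in s3 prints assets:migrate:from-built-in-to-s3, which is the task passed to the pspdfkit command in the container.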

Warning

When doing these migrations, please make sure you disable access to the server while the migration is in progress, in order to prevent race conditions where data gets stored in the old backend before the server gets restarted with the new asset storage configuration. You can avoid this problem entirely by stopping the server before you do the migrations.

Migrating with Docker Compose

Use the following commands to migrate your asset storage to another backend if you run your application with docker-compose.

Migrating to Built-In Storage from Docker Volume (from Older Versions of PSPDFKit Server)

To migrate from a local Docker volume to built-in asset storage, make sure you have set ASSET_STORAGE_PATH to the path where the assets currently reside.

From the directory that contains your docker-compose file, run the following:

docker-compose run pspdfkit pspdfkit assets:migrate:from-local-to-built-in

Migrating to S3 from Built-In Storage

To migrate from the built-in asset storage to S3, make sure you have set all S3 options.

From the directory that contains your docker-compose file, run the following:

docker-compose run pspdfkit pspdfkit assets:migrate:from-built-in-to-s3

Migrating to Built-In Storage from S3

To migrate from S3 asset storage to built-in storage, make sure you have set all S3 options.

From the directory that contains your docker-compose file, run the following:

docker-compose run pspdfkit pspdfkit assets:migrate:from-s3-to-built-in
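
To follow the earlier warning about stopping the server during a migration, the steps can be wrapped in a short script. This is a sketch under assumptions: the service is named pspdfkit in your docker-compose file (as in the commands above), and the function name run_offline_migration and the COMPOSE variable are hypothetical.

```shell
#!/bin/sh
# COMPOSE can be overridden if your setup invokes Compose differently.
COMPOSE="${COMPOSE:-docker-compose}"

# run_offline_migration TASK (hypothetical helper)
# Stops the server, runs one migration task in a one-off container,
# and leaves the server stopped so the configuration can be switched.
run_offline_migration() {
  task="$1"

  # 1. Stop the server so no new assets are written to the old backend.
  $COMPOSE stop pspdfkit || return 1

  # 2. Run the migration, exactly as in the commands above.
  $COMPOSE run pspdfkit pspdfkit "$task" || return 1

  # 3. Remind the operator to switch the backend before restarting.
  echo "Migration finished. Update ASSET_STORAGE_BACKEND, then run: $COMPOSE up -d pspdfkit"
}
```

For instance, run_offline_migration assets:migrate:from-built-in-to-s3 performs the built-in to S3 migration; the server only comes back once you start it again with the new asset storage configuration.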

Migrating without Docker Compose

Use the following commands to migrate your asset storage to another backend if you don’t run your application with docker-compose. Make sure to replace <container-name> with the name of your Docker container in the migration command. To list all containers and their names on your machine, run the following:

docker ps -a
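
If the full table is hard to scan, the output can be narrowed to just the container names with the --format option of docker ps. The wrapper function and the DOCKER variable below are hypothetical names used only for illustration.

```shell
#!/bin/sh
# DOCKER can be overridden if the docker binary lives elsewhere.
DOCKER="${DOCKER:-docker}"

# list_container_names (hypothetical helper)
# Prints one container name per line, including stopped containers.
list_container_names() {
  "$DOCKER" ps -a --format '{{.Names}}'
}
```

Any name printed by list_container_names can be used in place of <container-name> in the commands that follow.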

Migrating to Built-In Storage from Docker Volume (from Older Versions of PSPDFKit Server)

To migrate from a local Docker volume to built-in asset storage, make sure you have set ASSET_STORAGE_PATH to the path where the assets currently reside:

docker exec <container-name> pspdfkit assets:migrate:from-local-to-built-in

Migrating to S3 from Built-In Storage

To migrate from the built-in asset storage to S3, make sure you have set all S3 options:

docker exec <container-name> pspdfkit assets:migrate:from-built-in-to-s3

Migrating to Built-In Storage from S3

To migrate from S3 asset storage to built-in storage, make sure you have set all S3 options:

docker exec <container-name> pspdfkit assets:migrate:from-s3-to-built-in