Create a Document from a URL

If you already have a data store for your files, or you prefer not to store them with PSPDFKit Document Engine, you can create a document from a URL.

When working with a document created from a URL, PSPDFKit Document Engine fetches the file from the provided URL and caches it on the node's file system.

Your service sends a document’s URL to PSPDFKit Document Engine and receives a document ID back. The process works as follows:

  1. Your service sends a document’s URL to PSPDFKit Document Engine, which makes a request to the URL to retrieve the document.

  2. The service hosting the document returns it, and PSPDFKit Document Engine saves the document and its metadata in asset storage and PostgreSQL.

  3. Your service receives the document ID back, which it can use to reference the document later.
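The three steps above can be sketched from your service's side as a single authenticated HTTP call. This is a minimal sketch using only the Python standard library, and several details are assumptions to verify against your Document Engine API reference: the `/api/documents` endpoint path, the `Token token=...` authorization header, the `{"url": ...}` request body, and the `{"data": {"document_id": ...}}` response shape. The helper names are hypothetical.

```python
import json
import urllib.request

# ASSUMPTIONS (verify against your Document Engine API reference):
# endpoint path, auth header format, request body, and response shape.
DOCUMENTS_PATH = "/api/documents"


def build_create_from_url_request(
    base_url: str, api_token: str, document_url: str
) -> urllib.request.Request:
    """Build the POST request asking Document Engine to fetch `document_url` (step 1)."""
    body = json.dumps({"url": document_url}).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + DOCUMENTS_PATH,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Token token={api_token}",
            "Content-Type": "application/json",
        },
    )


def parse_document_id(response_body: bytes) -> str:
    """Extract the document ID from the (assumed) JSON response (step 3)."""
    payload = json.loads(response_body)
    return payload["data"]["document_id"]


def create_document_from_url(base_url: str, api_token: str, document_url: str) -> str:
    """Send the URL and return the document ID for later reference."""
    request = build_create_from_url_request(base_url, api_token, document_url)
    # Document Engine itself, not this client, downloads the file
    # from `document_url` (step 2).
    with urllib.request.urlopen(request, timeout=60) as response:
        return parse_document_id(response.read())
```

For example, `create_document_from_url("http://localhost:5000", "secret", "https://files.example.com/report.pdf")` would return the ID you store to reference the document later; the host, token, and file URL here are placeholders. Splitting request construction and response parsing out of the network call keeps both easy to test without a running Document Engine.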

Security Considerations

Creating a document from a URL requires PSPDFKit Document Engine to retrieve data from the specified URL on the server side. As a result, this feature comes with inherent security risks.

If you need stronger security and can accept a more involved integration, consider disabling document creation from a URL and using document creation from upload instead. With that approach, the data PSPDFKit Document Engine processes comes from a PostgreSQL database or S3-compatible storage.

Document creation from a URL is designed to let PSPDFKit Document Engine integrate easily with other known, trusted services by fetching data from them directly over HTTPS.

Never send an untrusted URL directly to PSPDFKit Document Engine. If your service handles user input or other untrusted data, it must validate URLs before passing them on, to mitigate the risk of server-side request forgery (SSRF).
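One common way to implement such a check is an allowlist applied in your service before any URL reaches PSPDFKit Document Engine. The following is a minimal Python sketch, assuming a policy of HTTPS only, the default port only, no embedded credentials, and an exact match against a hypothetical `TRUSTED_HOSTS` set; the function name and hostnames are illustrative, not part of any PSPDFKit API.

```python
from urllib.parse import urlsplit

# HYPOTHETICAL allowlist: replace with the hosts your service actually trusts.
TRUSTED_HOSTS = frozenset({"files.example.com", "docs.example.com"})


def is_trusted_document_url(url: str, trusted_hosts=TRUSTED_HOSTS) -> bool:
    """Return True only for HTTPS URLs on an allowlisted host (assumed policy)."""
    try:
        parts = urlsplit(url)
        port = parts.port  # raises ValueError for a malformed or out-of-range port
    except ValueError:
        return False
    # Only HTTPS, matching the intended use with trusted services.
    if parts.scheme != "https":
        return False
    # Reject credentials such as https://trusted.host@attacker.host/.
    if parts.username is not None or parts.password is not None:
        return False
    # Allow only the default HTTPS port.
    if port not in (None, 443):
        return False
    # .hostname is lowercased; compare exactly rather than by prefix or suffix.
    return parts.hostname is not None and parts.hostname in trusted_hosts
```

Exact host matching avoids look-alike hosts such as `files.example.com.evil.test`. A string check like this can't account for what a trusted host's DNS resolves to or where it might redirect, so it works best alongside network-level restrictions on outbound traffic.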

Because document creation from a URL requires an authorized API call, PSPDFKit Document Engine doesn’t restrict which network sources it will resolve and retrieve data from. Following the principle of least privilege, consider restricting outbound network traffic at the container or network firewall level so that PSPDFKit Document Engine can only reach the sources you expect it to retrieve data from.