Automated Document Redaction in Linux

Information

PSPDFKit Processor has been deprecated and replaced by PSPDFKit Document Engine. All PSPDFKit Processor licenses will work as before and be supported until 15 May 2024 (we will contact you about license migration). To start using Document Engine, refer to the migration guide. With Document Engine, you’ll have access to robust new capabilities (read the blog for more information).

PSPDFKit Processor lets you create redactions on top of text matching predefined patterns, such as email addresses, URLs, and more. To create redactions, use the createRedactions action with the following preset strategy:

{
  "type": "createRedactions",
  "strategy": "preset",
  "strategyOptions": {
    "preset": "email-address"
  }
}

For a complete list of presets, see the API Reference.
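
If you need the same action for several presets, the JSON can be generated instead of typed by hand. The following is a minimal sketch and not part of Processor: redaction_action is a hypothetical shell helper, and email-address is the only preset name taken from this guide. Check the API Reference for the exact names of other presets.

```shell
#!/bin/sh
# Hypothetical helper (not part of Processor): prints a createRedactions
# action that uses the preset strategy with the preset name given in $1.
redaction_action() {
  printf '{"type":"createRedactions","strategy":"preset","strategyOptions":{"preset":"%s"}}\n' "$1"
}

# "email-address" is the preset used throughout this guide.
redaction_action "email-address"
```

The helper only formats text. It doesn't check that the preset name is one Processor recognizes.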

Applying Redactions

After redaction annotations are created, they must be applied to the document to permanently remove the covered content. To do this, add the applyRedactions action to the /build instructions after the createRedactions action.

Before you get started, make sure Processor is up and running.

You can download and use either of the following sample documents for the examples in this guide:

You’ll be sending multipart POST requests with instructions to Processor’s /build endpoint. To learn more about multipart requests, refer to our blog post on the topic, A Brief Tour of Multipart Requests.

Check out the API Reference to learn more about the /build endpoint and all the actions you can perform on PDFs with PSPDFKit Processor.

Creating and Applying Redactions in a File on Disk

Send a multipart request to the /build endpoint, attaching the input file and the instructions JSON:

curl -X POST http://localhost:5000/build \
  -F document=@/path/to/example-document.pdf \
  -F instructions='{
  "parts": [
    {
      "file": "document",
      "actions": [
        {
          "type": "createRedactions",
          "strategy": "preset",
          "strategyOptions": {
            "preset": "email-address"
          }
        },
        {
          "type": "applyRedactions"
        }
      ]
    }
  ]
}' \
  -o result.pdf

The same request as raw HTTP:

POST /build HTTP/1.1
Content-Type: multipart/form-data; boundary=customboundary

--customboundary
Content-Disposition: form-data; name="document"; filename="example-document.pdf"
Content-Type: application/pdf

<PDF data>
--customboundary
Content-Disposition: form-data; name="instructions"
Content-Type: application/json

{
  "parts": [
    {
      "file": "document",
      "actions": [
        {
          "type": "createRedactions",
          "strategy": "preset",
          "strategyOptions": {
            "preset": "email-address"
          }
        },
        {
          "type": "applyRedactions"
        }
      ]
    }
  ]
}
--customboundary--

This creates redaction annotations and applies them to the file, removing the content beneath them.
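
To redact more than one kind of pattern in a single request, the instructions can list several createRedactions actions before applyRedactions. The sketch below rests on two assumptions: that Processor runs the actions of a part in the order given, and that each preset name you pass exists in the API Reference. build_instructions is a hypothetical helper, not a Processor feature.

```shell
#!/bin/sh
# Hypothetical helper: prints /build instructions that create redactions for
# every preset name passed as an argument and then apply them.
# $1 is the name of the multipart part holding the input file;
# the remaining arguments are preset names.
build_instructions() {
  part=$1
  shift
  actions=""
  for preset in "$@"; do
    actions="${actions}{\"type\":\"createRedactions\",\"strategy\":\"preset\",\"strategyOptions\":{\"preset\":\"${preset}\"}},"
  done
  # applyRedactions comes last so it follows every createRedactions action.
  printf '{"parts":[{"file":"%s","actions":[%s{"type":"applyRedactions"}]}]}\n' "$part" "$actions"
}

# Example usage (requires a running Processor instance, so it is commented out):
# curl -X POST http://localhost:5000/build \
#   -F document=@/path/to/example-document.pdf \
#   -F instructions="$(build_instructions document email-address)" \
#   -o result.pdf
```

The first argument must match the name of the multipart part that carries the PDF (document in the commented-out request), because the instructions refer to the input file by that name.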

Creating and Applying Redactions in a File from a URL

Send a request to the /build endpoint and include a URL pointing to the file you want to redact:

curl -X POST http://localhost:5000/build \
  -F instructions='{
  "parts": [
    {
      "file": {
        "url": "https://pspdfkit.com/downloads/examples/credit-card-application.pdf"
      },
      "actions": [
        {
          "type": "createRedactions",
          "strategy": "preset",
          "strategyOptions": {
            "preset": "email-address"
          }
        },
        {
          "type": "applyRedactions"
        }
      ]
    }
  ]
}' \
  -o result.pdf

The same request as raw HTTP:

POST /build HTTP/1.1
Content-Type: multipart/form-data; boundary=customboundary

--customboundary
Content-Disposition: form-data; name="instructions"
Content-Type: application/json

{
  "parts": [
    {
      "file": {
        "url": "https://pspdfkit.com/downloads/examples/credit-card-application.pdf"
      },
      "actions": [
        {
          "type": "createRedactions",
          "strategy": "preset",
          "strategyOptions": {
            "preset": "email-address"
          }
        },
        {
          "type": "applyRedactions"
        }
      ]
    }
  ]
}
--customboundary--

This creates redaction annotations and applies them to the file, removing the content beneath them.
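
Because -o writes whatever the server returns, a failed request can leave an error message in result.pdf instead of a redacted document. As a minimal sketch of a sanity check, the hypothetical is_pdf helper below tests whether a file starts with the %PDF- signature. It assumes a Linux system where head supports the -c option.

```shell
#!/bin/sh
# Hypothetical helper: succeeds only if the file named in $1 begins with
# the five-byte PDF signature "%PDF-".
is_pdf() {
  [ "$(head -c 5 "$1" 2>/dev/null)" = "%PDF-" ]
}

# Example usage (requires a running Processor instance, so it is commented out).
# --fail makes curl exit with a non-zero status on HTTP error responses.
# if curl --fail -X POST http://localhost:5000/build \
#      -F instructions="$INSTRUCTIONS" -o result.pdf && is_pdf result.pdf; then
#   echo "Redacted document written to result.pdf"
# else
#   echo "Redaction request failed" >&2
# fi
```

INSTRUCTIONS stands for a shell variable holding the instructions JSON from the example above. The check confirms only that the output looks like a PDF, not that every match was redacted, so review the result before distributing it.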