Redaction

Redaction is the process of removing content from a PDF page. This not only involves obscuring the content, but also removing the data in the document within the specified region.

Starting with version 2020.3, PSPDFKit for Web and PSPDFKit Server allow users to redact content and annotations from uploaded documents.

Redaction is a two-step process.

  • First, redaction annotations are added to the document’s layer, marking the content that should be removed.
  • Later, the redactions are applied to the layer, modifying the underlying PDF file and removing the redaction annotations permanently.

Just like any other annotation type, redaction annotations are automatically synchronized by our Instant engine, which facilitates effortless collaboration between users viewing the same layer. In addition, PSPDFKit Server provides a set of APIs that allow users to automate the process of creating redactions and applying them.

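This guide covers the following:

  • Working with Redaction Annotations
  • Automatic Redaction Creation
  • Applying Redactions
  • Importing and Exporting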

Note that redaction is a separate component that requires a specific license feature. Without it, you won’t be able to import, create, or apply redactions to documents.

For a more general overview of this feature, see our Introduction to Redaction guide for PSPDFKit for Web.

Working with Redaction Annotations

Redaction annotations are regular annotations, which means they can be retrieved, created, updated, and deleted with the server annotation APIs.

Automatic Redaction Creation

Instead of manually creating redaction annotations in the layer, you can create them in batches based on the search criteria.

The request body has the following format (described using Flow type declarations):

{
  strategy: "regex" | "preset" | "text",
  strategyOptions: object,
  user_id: ?string,
  content: ?{
    fillColor: ?string, // default is "#000000"
    overlayText: ?string, // default is null
    repeatOverlayText: ?boolean, // default is false
    color: ?string, // default is "#F82400"
    outlineColor: ?string, // default is "#F82400"
    creatorName: ?string,   // default is null
    customData: ?object
  }
}

strategy determines the batch-creation strategy. The shape of strategyOptions depends on the selected strategy. The available strategies are described in the following sections.

user_id will be assigned as the owner of the created redaction annotations.

content is an optional object that allows users to override the properties of created redaction annotations.
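As an illustration of the format above, here is a minimal Python sketch that assembles the request body. The helper name build_redaction_payload is hypothetical and not part of any PSPDFKit library. It only builds a dictionary and leaves out the optional user_id and content keys when they are not given.

```python
import json

# The three strategies named in the request format above.
ALLOWED_STRATEGIES = {"regex", "preset", "text"}


def build_redaction_payload(strategy, strategy_options, user_id=None, content=None):
    """Hypothetical helper: return a JSON-serializable redaction creation body."""
    if strategy not in ALLOWED_STRATEGIES:
        raise ValueError(f"unknown strategy: {strategy!r}")
    payload = {"strategy": strategy, "strategyOptions": dict(strategy_options)}
    # user_id and content are optional, so only include them when provided.
    if user_id is not None:
        payload["user_id"] = user_id
    if content is not None:
        payload["content"] = dict(content)
    return payload


payload = build_redaction_payload(
    "text",
    {"text": "pspdfkit@pspdfkit.com"},
    user_id="12345",
    content={"overlayText": "REDACTED", "fillColor": "#000000"},
)
print(json.dumps(payload, indent=2))
```

Properties that are left out of content keep the defaults listed in the format description.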

Request

Send a POST request to /api/documents/:document_id/redactions with a payload describing the creation strategy. This creates redaction annotations in the document’s default layer. To target a named layer, use the /api/documents/:document_id/layers/:layer_name/redactions path.
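The two paths differ only in the layer segment. The following Python sketch builds either one. The function name redactions_url is hypothetical, and the base URL http://localhost:5000 is taken from the curl examples in this guide, so adjust it for your deployment.

```python
from urllib.parse import quote


def redactions_url(base_url, document_id, layer_name=None):
    """Hypothetical helper: build the redaction creation URL for a layer."""
    # Percent-encode the IDs so unusual characters cannot break the path.
    doc = quote(document_id, safe="")
    root = f"{base_url.rstrip('/')}/api/documents/{doc}"
    if layer_name is None:
        # No layer given: target the document's default layer.
        return f"{root}/redactions"
    layer = quote(layer_name, safe="")
    return f"{root}/layers/{layer}/redactions"


default_url = redactions_url("http://localhost:5000", "my-document")
named_url = redactions_url("http://localhost:5000", "my-document", "review")
print(default_url)
print(named_url)
```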

Response

Regardless of the strategy used, the response contains all of the newly created redaction annotations. In case of success, the HTTP status 200 is returned:

HTTP/1.1 200 OK
Content-Type: application/json

{
  "data": {
    "annotations": [
      {
        "content": {
          "bbox": [58, 61, 38, 18],
          "color": "#F82400",
          "createdAt": "2020-06-17T20:50:56.402729Z",
          "creatorName": null,
          "customData": null,
          "fillColor": "#000000",
          "opacity": 1,
          "outlineColor": "#F82400",
          "overlayText": null,
          "pageIndex": 0,
          "rects": [[58, 61, 38, 18]],
          "repeatOverlayText": false,
          "rotation": 0,
          "type": "pspdfkit/markup/redaction",
          "updatedAt": "2020-06-17T20:50:56.402729Z",
          "v": 1
        },
        "createdBy": "12345",
        "id": "7KPS3BDVZ9A9G728TJ7TQKGE86",
        "updatedBy": null
      }
    ]
  }
}
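To review what was created before applying anything, the response can be reduced to the fields that identify each redaction. The Python sketch below assumes the response shape shown above. The function name list_created_redactions is hypothetical, and the sample body is a shortened copy of the example response.

```python
import json


def list_created_redactions(response_body):
    """Hypothetical helper: extract ID, page, and rects of each new redaction."""
    data = json.loads(response_body)
    result = []
    for annotation in data["data"]["annotations"]:
        content = annotation["content"]
        result.append(
            {
                "id": annotation["id"],
                "pageIndex": content["pageIndex"],
                "rects": content["rects"],
            }
        )
    return result


# Shortened copy of the example response above.
sample = """
{
  "data": {
    "annotations": [
      {
        "content": {
          "bbox": [58, 61, 38, 18],
          "pageIndex": 0,
          "rects": [[58, 61, 38, 18]],
          "type": "pspdfkit/markup/redaction"
        },
        "createdBy": "12345",
        "id": "7KPS3BDVZ9A9G728TJ7TQKGE86",
        "updatedBy": null
      }
    ]
  }
}
"""

created = list_created_redactions(sample)
print(created)
```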

preset Strategy

The preset strategy creates redaction annotations on top of text and annotations that match one of the predefined patterns:

{
  strategy: "preset",
  strategyOptions: {
    preset: "credit-card-number"
          | "date"
          | "email-address"
          | "international-phone-number"
          | "ipv4"
          | "ipv6"
          | "mac-address"
          | "north-american-phone-number"
          | "social-security-number"
          | "time"
          | "url"
          | "us-zip-code"
          | "vin",
    includeAnnotations: ?boolean // default is true
  },
  ...
}

includeAnnotations determines whether redactions should also be created on top of annotations that include the matching text.

Note that the provided presets are designed in such a way that they might find matches across different types of data. When you’re not sure about the results, review the redaction annotations visually before applying them. Or, if you need more control, use a text or regex strategy.

For a description of available presets, see here.
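Because the preset name has to be one of the listed values, it can help to check it on the client before sending the request. This is a minimal Python sketch under that assumption. The names PRESETS and build_preset_payload are hypothetical, the preset values are copied from the list above, and how the server responds to an unknown preset is not covered here.

```python
# Preset names copied from the strategyOptions format above.
PRESETS = frozenset(
    {
        "credit-card-number",
        "date",
        "email-address",
        "international-phone-number",
        "ipv4",
        "ipv6",
        "mac-address",
        "north-american-phone-number",
        "social-security-number",
        "time",
        "url",
        "us-zip-code",
        "vin",
    }
)


def build_preset_payload(preset, include_annotations=True):
    """Hypothetical helper: build a preset strategy body, rejecting unknown names."""
    if preset not in PRESETS:
        raise ValueError(f"unknown preset: {preset!r}")
    return {
        "strategy": "preset",
        "strategyOptions": {
            "preset": preset,
            "includeAnnotations": include_annotations,
        },
    }


preset_payload = build_preset_payload("email-address", include_annotations=False)
print(preset_payload)
```

A check like this catches shortened names such as "email" before they reach the server.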

A created redaction annotation covers the matching text exactly or, in the case of annotations, the entire annotation’s bounding box.

Example request

POST /api/documents/my-document/redactions HTTP/1.1
Content-Type: application/json
Authorization: Token token=<secret token>

{
  "strategy": "preset",
  "strategyOptions": { "preset": "email-address" },
  "user_id": "12345",
  "content": { "overlayText": "REDACTED" }
}

The same request with curl:

curl -X POST http://localhost:5000/api/documents/my-document/redactions \
  -H "Content-Type: application/json" \
  -H "Authorization: Token token=<secret token>" \
  -d '{"strategy": "preset", ... }'

regex Strategy

The regex strategy creates redaction annotations on top of text and annotations that match the provided regular expression:

{
  strategy: "regex",
  strategyOptions: {
    regex: string,
    includeAnnotations: ?boolean // default is true
  },
  ...
}

includeAnnotations determines whether redactions should also be created on top of annotations that include the matching text.

The regular expression follows the ICU standard, which is described in detail here. Because the pattern is sent inside a JSON string, you need to put a double backslash (“\\”) in front of any regex control character (e.g. “.” or “?”) that you want to match literally.
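The double backslash comes from JSON string escaping: the regular expression itself contains a single backslash, and serializing it to JSON doubles it. The short Python sketch below shows this with the standard json module. It only builds the request body and does not evaluate the pattern, since Python's own regex engine is not ICU.

```python
import json

# The regular expression itself contains ONE backslash before the dot.
pattern = r".*@pspdfkit\.com"

# Serializing to JSON escapes that backslash, producing the doubled form.
body = json.dumps({"strategy": "regex", "strategyOptions": {"regex": pattern}})
print(body)
# {"strategy": "regex", "strategyOptions": {"regex": ".*@pspdfkit\\.com"}}

# Decoding the JSON gives back the original single-backslash pattern.
decoded = json.loads(body)["strategyOptions"]["regex"]
```

If you write the JSON by hand, as in the example request, you have to type the doubled backslash yourself.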

If the regular expression is invalid, no redaction annotations are created.

A created redaction annotation covers the matching text exactly or, in the case of annotations, the entire annotation’s bounding box.

Example request

POST /api/documents/my-document/redactions HTTP/1.1
Content-Type: application/json
Authorization: Token token=<secret token>

{
  "strategy": "regex",
  "strategyOptions": { "regex": ".*@pspdfkit\\.com" },
  "user_id": "12345",
  "content": { "overlayText": "REDACTED" }
}

The same request with curl:

curl -X POST http://localhost:5000/api/documents/my-document/redactions \
  -H "Content-Type: application/json" \
  -H "Authorization: Token token=<secret token>" \
  -d '{"strategy": "regex", ... }'

text Strategy

The text strategy creates redaction annotations on top of text and annotations that match a literal search pattern:

{
  strategy: "text",
  strategyOptions: {
    text: string,
    includeAnnotations: ?boolean // default is true
  },
  ...
}

The text property inside strategyOptions is a search query. includeAnnotations determines whether redactions should also be created on top of annotations that include the matching text.

Note that the search query is case-insensitive.

A created redaction annotation covers the matching text exactly or, in the case of annotations, the entire annotation’s bounding box.

Example request

POST /api/documents/my-document/redactions HTTP/1.1
Content-Type: application/json
Authorization: Token token=<secret token>

{
  "strategy": "text",
  "strategyOptions": { "text": "pspdfkit@pspdfkit.com" },
  "user_id": "12345",
  "content": { "overlayText": "REDACTED" }
}

The same request with curl:

curl -X POST http://localhost:5000/api/documents/my-document/redactions \
  -H "Content-Type: application/json" \
  -H "Authorization: Token token=<secret token>" \
  -d '{"strategy": "text", ... }'

Applying Redactions

Redaction application is a process that permanently erases the text below the redaction annotations and deletes any annotation intersecting with redaction annotations. Similar to document editing, applying redactions creates a new PDF file that’s stored at the target layer.
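To make "intersecting" concrete, here is a small Python sketch of a rectangle overlap test. It is only an illustration and not the server's implementation. It assumes that rectangles use the [left, top, width, height] layout, which this guide does not state, and it assumes that rectangles which merely touch at an edge do not count as intersecting. The function name rects_intersect and the second and third rectangles are made up for the example.

```python
def rects_intersect(a, b):
    """Return True if two rectangles overlap.

    Assumption: each rectangle is [left, top, width, height].
    Assumption: rectangles that only share an edge do not overlap.
    """
    a_left, a_top, a_width, a_height = a
    b_left, b_top, b_width, b_height = b
    overlaps_horizontally = a_left < b_left + b_width and b_left < a_left + a_width
    overlaps_vertically = a_top < b_top + b_height and b_top < a_top + a_height
    return overlaps_horizontally and overlaps_vertically


redaction = [58, 61, 38, 18]     # rect from the example response
nearby_note = [90, 70, 20, 20]   # made-up annotation overlapping the redaction
far_note = [200, 200, 10, 10]    # made-up annotation elsewhere on the page

print(rects_intersect(redaction, nearby_note))
print(rects_intersect(redaction, far_note))
```

Under these assumptions, the first annotation would be deleted when redactions are applied and the second would be kept.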

Note that regardless of applied redactions, the content and annotations from the originally uploaded file are always stored at the document’s immutable base layer. Any time you create a new layer, that layer is a copy of the base layer, which means you can always recover the redacted content if needed. In some circumstances, e.g. due to legal requirements, this may be undesirable. In these cases, you can delete the document after applying redactions, which will erase all of the document’s data.

Request

To apply redactions in the document’s default layer, send a POST request to /api/documents/:document_id/redact. To target a named layer, use the /api/documents/:document_id/layers/:layer_name/redact path:

POST /api/documents/:document_id/redact
Authorization: Token token="<secret token>"

The same request with curl:

curl -X POST http://localhost:5000/api/documents/my-document/redact \
  -H "Authorization: Token token=<secret token>"
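The same request can be prepared from Python with the standard library. This is a sketch under the assumptions of the examples above (server at http://localhost:5000, token authentication). The function name build_apply_request is hypothetical. The code only constructs the request object and does not send it.

```python
from urllib.parse import quote
from urllib.request import Request


def build_apply_request(base_url, token, document_id, layer_name=None):
    """Hypothetical helper: build the POST request that applies redactions."""
    path = f"/api/documents/{quote(document_id, safe='')}"
    if layer_name is not None:
        # Named layers get an extra /layers/:layer_name segment.
        path += f"/layers/{quote(layer_name, safe='')}"
    return Request(
        f"{base_url.rstrip('/')}{path}/redact",
        headers={"Authorization": f"Token token={token}"},
        method="POST",
    )


apply_request = build_apply_request("http://localhost:5000", "secret", "my-document")
print(apply_request.get_method(), apply_request.full_url)
```

Passing the request to urllib.request.urlopen would send it. Since applying redactions permanently changes the layer, review the redaction annotations before doing so.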

Response

If the document exists, the server responds with HTTP status 200 and the document properties of the PDF file with the redactions applied:

HTTP/1.1 200 OK
Content-Type: application/json

{
  "data": {
    "byteSize": ...,
    "passwordProtected": ...,
    "sourcePdfSha256": "...",
    "storage": {"type": "built-in"},
    "title": "..."
  }
}

Importing and Exporting

Redaction annotations are exported to the downloaded PDF file and will be visible in compatible PDF viewers. They are also automatically extracted from the uploaded PDFs.

Redactions are included in the Instant JSON files for the layer, and you can import them into the layer using these files as well.