Redaction
Redaction is the process of removing content from a PDF page. This involves not only obscuring the content, but also removing the underlying data in the document within the specified region.
Starting with version 2020.3, PSPDFKit for Web and PSPDFKit Server allow users to redact content and annotations from uploaded documents.
Redaction is a two-step process.
- First, redaction annotations are added to the document’s layer, marking the content that should be removed.
- Later, the redactions are applied to the layer, modifying the underlying PDF file and removing the redaction annotations permanently.
Just like any other annotation type, redaction annotations are automatically synchronized by our Instant engine, which facilitates effortless collaboration between users viewing the same layer. In addition, PSPDFKit Server provides a set of APIs that allow users to automate the process of creating redactions and applying them.
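This guide covers the following:
- Working with Redaction Annotations
- Automatic Redaction Creation
- Applying Redactions
- Importing and Exporting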
Note that redaction is a separate component that requires a specific license feature. Without it, you won’t be able to import, create, or apply redactions to documents.
For a more general overview of this feature, see our Introduction to Redaction guide for PSPDFKit for Web.
Working with Redaction Annotations
Redaction annotations are regular annotations, which means they can be retrieved, created, updated, and deleted with the server annotation APIs.
Automatic Redaction Creation
Instead of manually creating redaction annotations in the layer, you can create them in batches based on search criteria.
The request body has the following format (described using Flow type declarations):
{
  strategy: "regex" | "preset" | "text",
  strategyOptions: object,
  user_id: ?string,
  content: ?{
    fillColor: ?string, // default is "#000000"
    overlayText: ?string, // default is null
    repeatOverlayText: ?boolean, // default is false
    color: ?string, // default is "#F82400"
    outlineColor: ?string, // default is "#F82400"
    creatorName: ?string, // default is null
    customData: ?object
  }
}
strategy determines the batch-creation strategy. The shape of strategyOptions depends on the selected strategy. The available strategies are described in the following sections.
user_id will be assigned as the owner of the created redaction annotations.
content is an optional object that allows users to override the properties of created redaction annotations.
Request
Send a POST request to /api/documents/:document_id/redactions with a payload describing the creation strategy. This creates redaction annotations in the document’s default layer. To target a named layer, use the /api/documents/:document_id/layers/:layer_name/redactions path.
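As a minimal sketch of the request described above, the following Python snippet builds (but does not send) the POST request using only the standard library. The helper name build_redactions_request, the base URL http://localhost:5000, the layer name my-layer, and the token placeholder are assumptions made for illustration; they are not part of the Server API.

```python
import json
import urllib.request
from typing import Optional
from urllib.parse import quote


def build_redactions_request(
    base_url: str,
    token: str,
    document_id: str,
    payload: dict,
    layer_name: Optional[str] = None,
) -> urllib.request.Request:
    """Build a POST request that creates redaction annotations.

    Hypothetical helper: targets the document's default layer unless
    layer_name is given, in which case the named-layer path is used.
    """
    path = f"/api/documents/{quote(document_id, safe='')}"
    if layer_name is not None:
        path += f"/layers/{quote(layer_name, safe='')}"
    path += "/redactions"
    return urllib.request.Request(
        base_url.rstrip("/") + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Token token={token}",
        },
        method="POST",
    )


request = build_redactions_request(
    "http://localhost:5000",  # assumed server address
    "<secret token>",
    "my-document",
    {"strategy": "text", "strategyOptions": {"text": "pspdfkit@pspdfkit.com"}},
    layer_name="my-layer",  # assumed layer name
)
print(request.get_method(), request.full_url)
```

Passing the resulting object to urllib.request.urlopen would send the request; that step is left out here so the sketch runs without a server.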
Response
Regardless of the strategy used, the response contains all of the newly created redaction annotations. In case of success, the HTTP status 200 is returned:
HTTP/1.1 200 OK
Content-Type: application/json

{
  "data": {
    "annotations": [
      {
        "id": "7KPS3BDVZ9A9G728TJ7TQKGE86",
        "content": {
          "bbox": [58, 61, 38, 18],
          "color": "#F82400",
          "createdAt": "2020-06-17T20:50:56.402729Z",
          "creatorName": null,
          "customData": null,
          "fillColor": "#000000",
          "opacity": 1,
          "outlineColor": "#F82400",
          "overlayText": null,
          "pageIndex": 0,
          "rects": [[58, 61, 38, 18]],
          "repeatOverlayText": false,
          "rotation": 0,
          "type": "pspdfkit/markup/redaction",
          "updatedAt": "2020-06-17T20:50:56.402729Z",
          "v": 1
        },
        "createdBy": "12345",
        "updatedBy": null,
        "group": null
      }
    ]
  }
}
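To review what was created, a client can walk the annotations array of the response. The sketch below parses an abbreviated copy of the sample response shown above; the function name summarize_redactions is hypothetical, and only fields that appear in that sample are read.

```python
import json

# Abbreviated copy of the sample response body shown above.
response_body = """
{
  "data": {
    "annotations": [
      {
        "id": "7KPS3BDVZ9A9G728TJ7TQKGE86",
        "content": {
          "bbox": [58, 61, 38, 18],
          "pageIndex": 0,
          "rects": [[58, 61, 38, 18]],
          "type": "pspdfkit/markup/redaction"
        },
        "createdBy": "12345"
      }
    ]
  }
}
"""


def summarize_redactions(body: str) -> list:
    """Return (id, page index, rects) for each created redaction annotation."""
    annotations = json.loads(body)["data"]["annotations"]
    return [
        (a["id"], a["content"]["pageIndex"], a["content"]["rects"])
        for a in annotations
    ]


for annotation_id, page_index, rects in summarize_redactions(response_body):
    print(f"{annotation_id}: page {page_index}, {len(rects)} rect(s)")
```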
preset Strategy
The preset strategy creates redaction annotations on top of text and annotations that match one of the predefined patterns:
{
  strategy: "preset",
  strategyOptions: {
    preset: "credit-card-number"
      | "date"
      | "email-address"
      | "international-phone-number"
      | "ipv4"
      | "ipv6"
      | "mac-address"
      | "north-american-phone-number"
      | "social-security-number"
      | "time"
      | "url"
      | "us-zip-code"
      | "vin",
    includeAnnotations: ?boolean // default is true
  },
  ...
}
includeAnnotations determines whether redactions should also be created on top of annotations that include the matching text.
Note that the provided presets are deliberately permissive, so they might match other types of data as well. If you’re not sure about the results, review the redaction annotations visually before applying them. Or, if you need more control, use the text or regex strategy.
For a description of available presets, see the Search Presets guide.
A created redaction annotation covers the matching text exactly or, in the case of annotations, the annotation’s entire bounding box.
Example request
POST /api/documents/my-document/redactions HTTP/1.1
Content-Type: application/json
Authorization: Token token=<secret token>

{
  "strategy": "preset",
  "strategyOptions": {
    "preset": "email-address"
  },
  "user_id": "12345",
  "content": {
    "overlayText": "REDACTED"
  }
}
curl -X POST http://localhost:5000/api/documents/my-document/redactions \
  -H "Content-Type: application/json" \
  -H "Authorization: Token token=<secret token>" \
  -d '{"strategy": "preset", ... }'
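Because the preset name must be one of the values listed in the declaration above, it can be worth checking it on the client before sending the request. The following is a sketch only: the helper name preset_payload is hypothetical, and the check is purely client-side; how the server responds to an unknown preset is not covered here.

```python
# Preset names copied from the strategyOptions declaration above.
PRESETS = frozenset({
    "credit-card-number", "date", "email-address",
    "international-phone-number", "ipv4", "ipv6", "mac-address",
    "north-american-phone-number", "social-security-number",
    "time", "url", "us-zip-code", "vin",
})


def preset_payload(preset: str, include_annotations: bool = True) -> dict:
    """Build a request body for the preset strategy (hypothetical helper)."""
    if preset not in PRESETS:
        # Client-side guard: catches typos such as "email" for "email-address".
        raise ValueError(f"unknown preset: {preset!r}")
    return {
        "strategy": "preset",
        "strategyOptions": {
            "preset": preset,
            "includeAnnotations": include_annotations,
        },
    }


payload = preset_payload("email-address")
print(payload["strategyOptions"]["preset"])
```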
regex Strategy
The regex strategy creates redaction annotations on top of text and annotations that match the provided regular expression:
{
  strategy: "regex",
  strategyOptions: {
    regex: string,
    includeAnnotations: ?boolean // default is true
  },
  ...
}
includeAnnotations determines whether redactions should also be created on top of annotations that include the matching text.
The regular expression follows the ICU regular expression syntax, which is described in detail in the ICU documentation. To escape regex control characters (e.g. “.” or “?”), you need to put a double backslash (“\\”) in front of them.
If the regular expression is invalid, no redaction annotations are created.
A created redaction annotation covers the matching text exactly or, in the case of annotations, the annotation’s entire bounding box.
Example request
POST /api/documents/my-document/redactions HTTP/1.1
Content-Type: application/json
Authorization: Token token=<secret token>

{
  "strategy": "regex",
  "strategyOptions": {
    "regex": ".*@pspdfkit\\.com"
  },
  "user_id": "12345",
  "content": {
    "overlayText": "REDACTED"
  }
}
curl -X POST http://localhost:5000/api/documents/my-document/redactions \
  -H "Content-Type: application/json" \
  -H "Authorization: Token token=<secret token>" \
  -d '{"strategy": "regex", ... }'
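The double backslash in the example request can be produced automatically by a JSON serializer. The sketch below assumes that the doubling is ordinary JSON string escaping, so that the pattern itself contains a single backslash before the dot; the variable names are illustrative.

```python
import json

# The pattern with a single backslash before the dot, so that the dot
# matches a literal "." rather than any character.
regex = r".*@pspdfkit\.com"

payload = {
    "strategy": "regex",
    "strategyOptions": {"regex": regex, "includeAnnotations": True},
}

# json.dumps escapes the backslash, yielding the doubled form in the body.
body = json.dumps(payload)
print(body)
```

Building the body this way avoids escaping the pattern by hand when it is embedded in a JSON document.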
text Strategy
The text strategy creates redaction annotations on top of text and annotations that match a literal search pattern:
{
  strategy: "text",
  strategyOptions: {
    text: string,
    includeAnnotations: ?boolean // default is true
  },
  ...
}
The text property inside strategyOptions is a search query. includeAnnotations determines whether redactions should also be created on top of annotations that include the matching text.
Note that the search query is case-insensitive.
A created redaction annotation covers the matching text exactly or, in the case of annotations, the annotation’s entire bounding box.
Example request
POST /api/documents/my-document/redactions HTTP/1.1
Content-Type: application/json
Authorization: Token token=<secret token>

{
  "strategy": "text",
  "strategyOptions": {
    "text": "pspdfkit@pspdfkit.com"
  },
  "user_id": "12345",
  "content": {
    "overlayText": "REDACTED"
  }
}
curl -X POST http://localhost:5000/api/documents/my-document/redactions \
  -H "Content-Type: application/json" \
  -H "Authorization: Token token=<secret token>" \
  -d '{"strategy": "text", ... }'
Applying Redactions
Redaction application is a process that permanently erases the text below the redaction annotations and deletes any annotation intersecting with redaction annotations. Similar to document editing, applying redactions creates a new PDF file that’s stored at the target layer.
Note that regardless of applied redactions, the content and annotations from the originally uploaded file are always stored at the document’s immutable base layer. Any time you create a new layer, that layer is a copy of the base layer, which means you can always recover the redacted content if needed. In some circumstances, e.g. due to legal requirements, this may be undesirable. In these cases, you can delete the document after applying redactions, which will erase all of the document’s data.
Request
To apply redactions in the document’s default layer, send a POST request to /api/documents/:document_id/redact. To target a named layer, use the /api/documents/:document_id/layers/:layer_name/redact path:
POST /api/documents/:document_id/redact
Authorization: Token token="<secret token>"

curl -X POST http://localhost:5000/api/documents/my-document/redact \
  -H "Authorization: Token token=<secret token>"
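The same request can be prepared from Python. This is a sketch under the same assumptions as the curl example (server at http://localhost:5000, a placeholder token); the helper name build_redact_request and the layer name my-layer are hypothetical. The request is built but not sent, and it carries no body, matching the example above.

```python
import urllib.request
from typing import Optional
from urllib.parse import quote


def build_redact_request(
    base_url: str,
    token: str,
    document_id: str,
    layer_name: Optional[str] = None,
) -> urllib.request.Request:
    """Build a body-less POST request that applies redactions.

    Hypothetical helper: uses the default layer unless layer_name is given.
    """
    path = f"/api/documents/{quote(document_id, safe='')}"
    if layer_name is not None:
        path += f"/layers/{quote(layer_name, safe='')}"
    path += "/redact"
    return urllib.request.Request(
        base_url.rstrip("/") + path,
        headers={"Authorization": f"Token token={token}"},
        method="POST",
    )


default_layer = build_redact_request(
    "http://localhost:5000", "<secret token>", "my-document"
)
named_layer = build_redact_request(
    "http://localhost:5000", "<secret token>", "my-document", "my-layer"
)
print(default_layer.get_method(), default_layer.full_url)
print(named_layer.get_method(), named_layer.full_url)
```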
Response
If the document exists, the server responds with HTTP status 200 and the document properties of the PDF file with redactions applied:
HTTP/1.1 200 OK
Content-Type: application/json

{
  "data": {
    "byteSize": ...,
    "passwordProtected": ...,
    "sourcePdfSha256": "...",
    "storage": {"type": "built-in"},
    "title": "..."
  }
}
Importing and Exporting
Redaction annotations are exported to the downloaded PDF file and will be visible in compatible PDF viewers. They are also automatically extracted from the uploaded PDFs.
Redactions are included in the Instant JSON files for the layer, and you can import them into the layer using these files as well.