Redaction

Redaction is the process of removing content from a PDF page. It does not merely obscure the content; it irretrievably removes the underlying data within the specified region of the document.

Redaction is a two-step process.

  • First, redaction annotations have to be created in the areas that should be redacted. This step won’t remove any content from the document yet; it just marks regions for redaction.
  • Second, to actually remove the content, the redaction annotations need to be applied. In this step, the page content within the region of the redaction annotations is irreversibly removed.

For further in-depth information, check out the official Adobe document, PDF Redaction: Addendum for the PDF Reference.

PSPDFKit Libraries 1.0+ can redact text, images, annotations, form fields, and paths/vector drawings. As of PSPDFKit Libraries 1.1, the RedactionProcessor API can also be used to identify data in a document that should be redacted.

Automated Redaction

In some use cases, you might want to process a batch of documents and remove similar information across all of them. In such a scenario, it’d be useful to set up the “shape” of the redaction and apply it to many documents. This shape could be a simple text match, or it could be some preset search algorithm to identify personal information — for example, phone numbers.

Data Identification Methods

Preset Search Algorithm

PSPDFKit Libraries provides preset search algorithms out of the box to allow simple setup of a search “shape” and redaction across multiple documents. These presets include pieces of information like social security numbers, phone numbers, zip codes, credit card numbers, and dates.

To apply a redaction to a document, create a redaction processor, add the preset for the data you’d like to remove, and redact the document:

RedactionProcessor.create()
    .addRedactionTemplates(new RedactionPreset.Builder(RedactionPreset.Type.EMAIL_ADDRESS).build())
    .redact(document);

Regular Expression Search

To customize the search further, PSPDFKit Libraries also supports regular expressions. A regular expression can match a simple word or phrase, or a more complex pattern such as a structured identification number. If PSPDFKit Libraries is missing a preset you need, you can implement it as an expression:

RedactionProcessor.create()
    .addRedactionTemplates(new RedactionRegEx.Builder("A").build())
    .redact(document);

In the above example, the RedactionProcessor API will redact any instances of the uppercase character A.

It’s possible to take this further and define a complicated pattern for removal — for example, a four-digit code that’s followed by three letters:

RedactionProcessor.create()
    .addRedactionTemplates(new RedactionRegEx.Builder("\\d{4}[A-Za-z]{3}").build())
    .redact(document);

Note that any regular expression escape sequence must be double escaped for the Java string literal to be valid. For example, the literal ‘\\d’ in source code is equivalent to ‘\d’ in the regular expression.
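To check a pattern and its escaping before running it against real documents, it can help to try it on plain strings first. The sketch below uses only the standard java.util.regex package; the class RegexPreview and its findMatches helper are hypothetical names for illustration, not part of PSPDFKit Libraries, and it assumes the redaction engine treats these simple patterns the same way Java's regex engine does.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for previewing a pattern on plain text; not a PSPDFKit API.
public class RegexPreview {

    // Returns every substring of text matched by regex, in order of appearance.
    static List<String> findMatches(String regex, String text) {
        List<String> matches = new ArrayList<>();
        Matcher matcher = Pattern.compile(regex).matcher(text);
        while (matcher.find()) {
            matches.add(matcher.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        // The Java literal "\\d" holds two characters: a backslash and a 'd'.
        String pattern = "\\d{4}[A-Za-z]{3}";
        System.out.println(pattern);

        // Four digits followed by three letters.
        System.out.println(findMatches(pattern, "Codes 1234abc and 99xyz and 5678XYZ9"));

        // The single-character pattern "A" matches every uppercase A.
        System.out.println(findMatches("A", "A BANANA"));
    }
}
```

Printing the pattern shows the string the regex engine actually receives, which is a quick way to confirm that the double escaping produced a single backslash.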

Applying or Adding Redaction Annotations

In some use cases, it would be beneficial to add redaction annotations but not redact the data while saving. This would make it possible to, for example, have a document go through a human review process, adjust annotation sizes and shapes, or add extra redaction annotations. To do this, it just takes a different call to the RedactionProcessor API:

RedactionProcessor.create()
    .addRedactionTemplates(new RedactionRegEx.Builder("A").build())
    .identifyAndAddRedactionAnnotations(document);

Rather than calling redact, we call identifyAndAddRedactionAnnotations, which only adds the annotations to the document. It is then the responsibility of the user to save the file with the save API.

Creating Redaction Annotations Manually

Programmatically

You can create redactions programmatically via addAnnotationJson. The shape of the JSON passed to create the annotation can be found in the JSON Format guide. In short, an array of rectangles is required to define the regions that should be covered by the redaction annotation.

You also have a few customization options for how a redaction should look, both while in its marked state, which is when the redaction has been created but not yet applied, and in its redacted state, which is when the redaction has been applied. It is not possible to change the appearance once a redaction has been applied, since the redaction annotation will be removed from the document in the process of applying the redactions. Here’s a list of available customization options for redactions:

  • overlayText can be used to set the text that should be displayed at the specified region when a redaction has been applied.
  • repeatOverlayText defines whether the overlay text should be drawn only once or repeated to fill the entire redaction area. This defaults to disabled, which means the overlay text is only drawn once. It has no effect if there is no overlay text specified.
  • color can be used to change the color of the overlay text. It has no effect if there is no overlay text specified. This defaults to red, #ff0000.
  • fillColor specifies the background color of the redaction area after it has been applied. The color is drawn on all the specified rects. This defaults to black, #000000.
  • outlineColor specifies the color used for the redaction’s border in its marked state. This defaults to red, #ff0000.
JSONObject redactionAnnotation = new JSONObject();
redactionAnnotation.put("bbox", new float[]{10, 10, 200, 100});
redactionAnnotation.put("creatorName", "User");
redactionAnnotation.put("fillColor", "#000000");
redactionAnnotation.put("opacity", 1);
redactionAnnotation.put("pageIndex", 0);
redactionAnnotation.put("type", "pspdfkit/markup/redaction");
redactionAnnotation.put("v", 1);

document.getAnnotationProvider().addAnnotationJson(redactionAnnotation);
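The example above sets only fillColor. As a rough sketch of how the other listed options might sit together in one annotation, the following builds the properties as a plain map and prints them as JSON using only the Java standard library. The class RedactionJsonSketch and its tiny toJson serializer are hypothetical helpers for illustration. The key name "rects" for the array of rectangles is an assumption; check the JSON Format guide for the exact schema.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical sketch; not part of PSPDFKit Libraries.
public class RedactionJsonSketch {

    // Collects the annotation properties, using the option names listed above.
    static Map<String, Object> buildRedaction() {
        Map<String, Object> annotation = new LinkedHashMap<>();
        annotation.put("type", "pspdfkit/markup/redaction");
        annotation.put("v", 1);
        annotation.put("pageIndex", 0);
        annotation.put("bbox", List.of(10, 10, 200, 100));
        // Assumption: the array of rectangles is stored under the key "rects".
        annotation.put("rects", List.of(List.of(10, 10, 200, 100)));
        annotation.put("overlayText", "REDACTED");
        annotation.put("repeatOverlayText", true); // fill the area with the text
        annotation.put("color", "#ffffff");        // overlay text color
        annotation.put("fillColor", "#000000");    // background once applied
        annotation.put("outlineColor", "#ff0000"); // border while only marked
        return annotation;
    }

    // Minimal JSON writer covering strings, numbers, booleans, lists, and maps.
    static String toJson(Object value) {
        if (value instanceof String) {
            String s = (String) value;
            return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
        }
        if (value instanceof Map) {
            return ((Map<?, ?>) value).entrySet().stream()
                .map(e -> toJson(String.valueOf(e.getKey())) + ":" + toJson(e.getValue()))
                .collect(Collectors.joining(",", "{", "}"));
        }
        if (value instanceof List) {
            return ((List<?>) value).stream()
                .map(RedactionJsonSketch::toJson)
                .collect(Collectors.joining(",", "[", "]"));
        }
        return String.valueOf(value); // numbers and booleans
    }

    public static void main(String[] args) {
        System.out.println(toJson(buildRedaction()));
    }
}
```

If the JSONObject used above is the org.json class, a map like this could likely be wrapped with its Map constructor before being passed to addAnnotationJson, though that pairing is untested here.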

Instant JSON

It’s also possible to design how the redactions would look on one of our powerful UI frameworks (iOS/Android), export this redaction information, and apply the redactions with PSPDFKit Libraries. To create the redaction information, Instant Document JSON (iOS/Android) is used. It can then be imported into the document on the server, and the redactions can be applied:

File redactionJsonFile = new File("Assets/redaction.json");
document.importDocumentJson(new FileDataProvider(redactionJsonFile));

Applying Redactions

All that is required to apply the created redaction annotations is to pass an option to the save API. PSPDFKit then processes the document and redacts all content within the marked regions:

document.save(new DocumentSaveOptions.Builder().applyRedactionAnnotations(true).build());

Licensing

Redaction is a feature that has to be licensed. The following list describes the expected behavior if Redaction is not part of your license:

  • It will not be possible to create a redaction annotation.
  • Any Instant JSON with redaction annotation information will fail to be applied.
  • Any requested applyRedactionAnnotations will fail upon saving.