Using RegEx Patterns to Redact PDFs in Java

To customize what gets redacted even further, PSPDFKit Libraries also supports searching via regular expressions. A regular expression can match a plain word or phrase, or it can describe something more involved, such as the format of an identification number. If PSPDFKit Libraries is missing a preset you need, you can implement it as an expression:

RedactionProcessor.create()
    .addRedactionTemplates(new RedactionRegEx.Builder("A").build())
    .redact(document);

In the example above, RedactionProcessor redacts every instance of the uppercase letter A.

You can take this further and define a more complex pattern for removal, for example a four-digit code followed by three letters:

RedactionProcessor.create()
    .addRedactionTemplates(new RedactionRegEx.Builder("\\d{4}[A-Za-z]{3}").build())
    .redact(document);

Note that the pattern is written as a Java string literal, so every backslash in a regular expression escape sequence must itself be escaped for the literal to be valid. For example, ‘\\d’ in Java source is equivalent to ‘\d’ in the regular expression, which matches a single digit.
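To see what the escaping does in practice, the sketch below uses the standard java.util.regex package (not the PSPDFKit API) to print the runtime value of the pattern string and list what it matches in a sample sentence. The class name RedactionPatternCheck, the helper findMatches, and the sample text are hypothetical, and the sketch assumes PSPDFKit's regex engine treats this particular pattern the same way java.util.regex does.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for trying out a pattern before using it for redaction.
// It relies only on java.util.regex and makes no PSPDFKit calls.
public class RedactionPatternCheck {
    // Same literal as in the redaction example above.
    // In Java source, "\\d" produces the two characters \d at runtime.
    public static final String CODE_PATTERN = "\\d{4}[A-Za-z]{3}";

    // Returns every substring of text that the given regex matches.
    public static List<String> findMatches(String regex, String text) {
        List<String> hits = new ArrayList<>();
        Matcher matcher = Pattern.compile(regex).matcher(text);
        while (matcher.find()) {
            hits.add(matcher.group());
        }
        return hits;
    }

    public static void main(String[] args) {
        // Prints the pattern as the regex engine sees it: \d{4}[A-Za-z]{3}
        System.out.println(CODE_PATTERN);

        // Prints [1234ABC, 8765xyz]
        System.out.println(
            findMatches(CODE_PATTERN, "Ref 1234ABC and 98765xyz, not 12AB"));
    }
}
```

The second match shows that the pattern is not anchored: it finds 8765xyz inside the longer token 98765xyz. If only standalone codes should be redacted, the pattern needs to be tightened, for example with word boundaries, provided the regex engine in use supports them.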

Applying Redactions

All that’s required to apply the redaction annotations is to pass an option to the save API. PSPDFKit then does the processing and redacts all the relevant information:

document.save(new DocumentSaveOptions.Builder().applyRedactionAnnotations(true).build());