Search and Redact PDFs in C#

The PSPDFKit GdPicture.NET library enables you to search a document for text and permanently redact every occurrence of it.

You cannot search and redact text in password-protected documents.

To search and redact text from a document, follow these steps:

  1. Ensure that the document is searchable. For a scanned document, this means that its text has already been recognized with an optical character recognition (OCR) tool. For more information, see the PDF to searchable PDF and the image to searchable PDF guides.

  2. Create a GdPicturePDF object.

  3. Select the source PDF file by passing its path to the LoadFromFile method of the GdPicturePDF object.

  4. Configure the redaction process with the SearchAndAddRedactionRegions method of the GdPicturePDF object. This method takes the following parameters:

    • The first parameter is the text to search for. Use plain text for exact matches, or a regular expression to search for patterns.

    • The second parameter controls case sensitivity. Set it to true for a case-sensitive search.

    • The next four parameters set the redaction color as RGBA values, each an integer between 0 and 255. The first three set the amounts of red, green, and blue. The fourth sets the alpha channel, where 0 is fully transparent and 255 is fully opaque. For example, to redact with solid black, pass 0, 0, 0, 255.

    • The last parameter is an integer variable passed by reference. The method stores the number of occurrences found in the document in this variable.

  5. Run the redaction process with the ApplyRedaction method of the GdPicturePDF object.

  6. Save the output in a PDF document with the SaveToFile method.
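
Because redaction is permanent, it can help to preview what a search pattern matches before passing it to SearchAndAddRedactionRegions in step 4. The sketch below uses only the standard .NET Regex class on a hypothetical sample string. The PatternPreview class, the FindMatches helper, the sample text, and the pattern are all illustrative assumptions, and the regular expression dialect that GdPicture.NET applies may differ from .NET's in some details.

```csharp
using System;
using System.Text.RegularExpressions;

public static class PatternPreview
{
    // Returns every substring of sampleText that the pattern matches.
    // This is a hypothetical helper, not part of GdPicture.NET.
    public static string[] FindMatches(string sampleText, string pattern, bool caseSensitive)
    {
        RegexOptions options = caseSensitive ? RegexOptions.None : RegexOptions.IgnoreCase;
        MatchCollection matches = Regex.Matches(sampleText, pattern, options);
        string[] results = new string[matches.Count];
        for (int i = 0; i < matches.Count; i++)
        {
            results[i] = matches[i].Value;
        }
        return results;
    }

    public static void Main()
    {
        // Hypothetical sample text and a pattern shaped like a US Social Security number.
        string sample = "Employee 123-45-6789 reports to employee 987-65-4321.";
        string pattern = @"\b\d{3}-\d{2}-\d{4}\b";

        // Print each match on its own line.
        foreach (string hit in FindMatches(sample, pattern, caseSensitive: true))
        {
            Console.WriteLine(hit);
        }
    }
}
```

Once the pattern matches only the intended text, the same string can be used as the search text, provided the library interprets it the same way.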

The example below loads a PDF document, redacts every occurrence of the text “Sensitive Information”, and saves the result as a new PDF file:

C#

using GdPicture14;

using GdPicturePDF gdpicturePDF = new GdPicturePDF();
// Load the source document.
gdpicturePDF.LoadFromFile(@"C:\temp\source.pdf");
// Search for the text and mark each occurrence with an opaque black redaction region.
int occurrences = 0;
gdpicturePDF.SearchAndAddRedactionRegions("Sensitive Information", true, 0, 0, 0, 255, ref occurrences);
// Apply the redaction only if the text was found.
if (occurrences > 0)
{
    gdpicturePDF.ApplyRedaction();
}
// Save the output in a PDF document.
gdpicturePDF.SaveToFile(@"C:\temp\output.pdf");

VB.NET

Imports GdPicture14

Using gdpicturePDF As GdPicturePDF = New GdPicturePDF()
    ' Load the source document.
    gdpicturePDF.LoadFromFile("C:\temp\source.pdf")
    ' Search for the text and mark each occurrence with an opaque black redaction region.
    Dim occurrences As Integer = 0
    gdpicturePDF.SearchAndAddRedactionRegions("Sensitive Information", True, 0, 0, 0, 255, occurrences)
    ' Apply the redaction only if the text was found.
    If occurrences > 0 Then
        gdpicturePDF.ApplyRedaction()
    End If
    ' Save the output in a PDF document.
    gdpicturePDF.SaveToFile("C:\temp\output.pdf")
End Using
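
The example above searches for a single text fragment. A possible variation for several fragments is sketched below in C#. It uses only the methods shown in this guide and assumes that redaction regions added by successive SearchAndAddRedactionRegions calls accumulate until ApplyRedaction runs. That behavior is an assumption to verify against the API reference, and the list of terms is hypothetical.

```csharp
using System;
using GdPicture14;

// Hypothetical list of text fragments to redact.
string[] terms = { "Sensitive Information", "Confidential" };

using GdPicturePDF gdpicturePDF = new GdPicturePDF();
// Load the source document.
gdpicturePDF.LoadFromFile(@"C:\temp\source.pdf");

// Add redaction regions for each term and keep a running total,
// because the ref variable is overwritten on every call.
int totalOccurrences = 0;
foreach (string term in terms)
{
    int occurrences = 0;
    gdpicturePDF.SearchAndAddRedactionRegions(term, true, 0, 0, 0, 255, ref occurrences);
    Console.WriteLine($"\"{term}\": {occurrences} occurrence(s) found");
    totalOccurrences += occurrences;
}

// Assumption: one ApplyRedaction call processes all regions added above.
if (totalOccurrences > 0)
{
    gdpicturePDF.ApplyRedaction();
}

// Save the output in a PDF document.
gdpicturePDF.SaveToFile(@"C:\temp\output.pdf");
```

Counting per term makes it easy to log which fragments were actually found before the redaction is applied.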

The example below loads a PDF document, recognizes its text page by page using OCR, redacts every occurrence of the text “Sensitive Information”, and saves the result as a new PDF file:

C#

using GdPicture14;

using GdPicturePDF gdpicturePDF = new GdPicturePDF();
// Load the source document.
gdpicturePDF.LoadFromFile(@"C:\temp\source.pdf");
// Determine the number of pages.
int pageCount = gdpicturePDF.GetPageCount();
// Loop through the pages of the source document (page numbers start at 1).
for (int i = 1; i <= pageCount; i++)
{
    // Select a page and run the OCR process on it.
    gdpicturePDF.SelectPage(i);
    gdpicturePDF.OcrPage("eng", @"C:\GdPicture.NET 14\Redist\OCR", "", 300);
}
// Search for the text and mark each occurrence with an opaque black redaction region.
int occurrences = 0;
gdpicturePDF.SearchAndAddRedactionRegions("Sensitive Information", true, 0, 0, 0, 255, ref occurrences);
// Apply the redaction only if the text was found.
if (occurrences > 0)
{
    gdpicturePDF.ApplyRedaction();
}
// Save the output in a PDF document.
gdpicturePDF.SaveToFile(@"C:\temp\output.pdf");

VB.NET

Imports GdPicture14

Using gdpicturePDF As GdPicturePDF = New GdPicturePDF()
    ' Load the source document.
    gdpicturePDF.LoadFromFile("C:\temp\source.pdf")
    ' Determine the number of pages.
    Dim pageCount As Integer = gdpicturePDF.GetPageCount()
    ' Loop through the pages of the source document (page numbers start at 1).
    For i As Integer = 1 To pageCount
        ' Select a page and run the OCR process on it.
        gdpicturePDF.SelectPage(i)
        gdpicturePDF.OcrPage("eng", "C:\GdPicture.NET 14\Redist\OCR", "", 300)
    Next
    ' Search for the text and mark each occurrence with an opaque black redaction region.
    Dim occurrences As Integer = 0
    gdpicturePDF.SearchAndAddRedactionRegions("Sensitive Information", True, 0, 0, 0, 255, occurrences)
    ' Apply the redaction only if the text was found.
    If occurrences > 0 Then
        gdpicturePDF.ApplyRedaction()
    End If
    ' Save the output in a PDF document.
    gdpicturePDF.SaveToFile("C:\temp\output.pdf")
End Using
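
Both examples above ignore the values that the methods return. The C# sketch below shows one way to stop early when a step fails. It assumes that LoadFromFile, SearchAndAddRedactionRegions, ApplyRedaction, and SaveToFile each return a GdPictureStatus value and that GdPictureStatus.OK signals success. Treat these return types as assumptions and confirm them against the API reference for your version.

```csharp
using System;
using GdPicture14;

using GdPicturePDF gdpicturePDF = new GdPicturePDF();

// Assumption: each call below returns a GdPictureStatus, with OK meaning success.
GdPictureStatus status = gdpicturePDF.LoadFromFile(@"C:\temp\source.pdf");
if (status != GdPictureStatus.OK)
{
    Console.WriteLine($"Could not load the source document: {status}");
    return;
}

// Search for the text and mark each occurrence for redaction.
int occurrences = 0;
status = gdpicturePDF.SearchAndAddRedactionRegions("Sensitive Information", true, 0, 0, 0, 255, ref occurrences);
if (status != GdPictureStatus.OK)
{
    Console.WriteLine($"The search failed: {status}");
    return;
}

if (occurrences > 0)
{
    status = gdpicturePDF.ApplyRedaction();
    if (status != GdPictureStatus.OK)
    {
        // Do not save a document whose redaction did not complete.
        Console.WriteLine($"Applying the redaction failed: {status}");
        return;
    }
}

// Save the output and report the result.
status = gdpicturePDF.SaveToFile(@"C:\temp\output.pdf");
Console.WriteLine(status == GdPictureStatus.OK
    ? $"Saved the output with {occurrences} occurrence(s) redacted."
    : $"Saving failed: {status}");
```

Returning before SaveToFile when a step fails avoids writing an output file that still contains the text that was meant to be removed.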
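
Used Methods

  • GdPicturePDF.LoadFromFile
  • GdPicturePDF.GetPageCount
  • GdPicturePDF.SelectPage
  • GdPicturePDF.OcrPage
  • GdPicturePDF.SearchAndAddRedactionRegions
  • GdPicturePDF.ApplyRedaction
  • GdPicturePDF.SaveToFile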
