Search and Redact PDFs in C#
The PSPDFKit GdPicture.NET library lets you search a document for text and permanently redact every occurrence it finds.
You cannot search and redact text in password-protected documents.
To search and redact text from a document, follow these steps:
- Ensure that the document is searchable. For a scanned document, this means that its text has been recognized with an optical character recognition (OCR) tool. For more information, see the PDF to searchable PDF and the image to searchable PDF guides.
- Create a GdPicturePDF object.
- Select the source PDF file by passing its path to the LoadFromFile method of the GdPicturePDF object.
- Configure the redaction process with the SearchAndAddRedactionRegions method of the GdPicturePDF object. This method takes the following parameters:
  - The first parameter sets the text to search for. Use plain text for exact matches, or use a regular expression to search for patterns.
  - The second parameter controls case sensitivity. To enable case-sensitive search, set it to true.
  - The next four parameters set the redaction color as RGBA values, each an integer between 0 and 255. The first three set the amounts of red, green, and blue, and the fourth sets the alpha channel, where 255 is fully opaque. For example, to redact in solid black, pass 0, 0, 0, 255.
  - The last parameter is an integer variable passed by reference. The method stores the number of occurrences found in the document in this variable.
- Run the redaction process with the ApplyRedaction method of the GdPicturePDF object.
- Save the output in a PDF document with the SaveToFile method.
The example below loads a PDF document, redacts every occurrence of the text fragment “Sensitive Information,” and then saves the redacted result as a PDF file:
using GdPicturePDF gdpicturePDF = new GdPicturePDF();
// Load the source document.
gdpicturePDF.LoadFromFile(@"C:\temp\source.pdf");
// Configure the redaction process.
int occurrences = 0;
gdpicturePDF.SearchAndAddRedactionRegions("Sensitive Information", true, 0, 0, 0, 255, ref occurrences);
// Run the redaction process.
if (occurrences > 0)
{
    gdpicturePDF.ApplyRedaction();
}
// Save the output in a PDF document.
gdpicturePDF.SaveToFile(@"C:\temp\output.pdf");

Using gdpicturePDF As GdPicturePDF = New GdPicturePDF()
    ' Load the source document.
    gdpicturePDF.LoadFromFile("C:\temp\source.pdf")
    ' Configure the redaction process.
    Dim occurrences = 0
    gdpicturePDF.SearchAndAddRedactionRegions("Sensitive Information", True, 0, 0, 0, 255, occurrences)
    ' Run the redaction process.
    If occurrences > 0 Then
        gdpicturePDF.ApplyRedaction()
    End If
    ' Save the output in a PDF document.
    gdpicturePDF.SaveToFile("C:\temp\output.pdf")
End Using
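The steps above note that the search text can also be a regular expression. The following is a minimal sketch of that variant, not a definitive implementation. It assumes that the library accepts .NET-style regex syntax, which this guide does not confirm, and that the GdPicture14 namespace applies to your installed version. The pattern itself is a hypothetical one for numbers shaped like 123-45-6789. The sketch first checks the pattern against a sample string with .NET's own regex engine, and then passes it to the same call used in the example above.

```csharp
using System;
using System.Text.RegularExpressions;
using GdPicture14; // Assumed namespace for GdPicture.NET 14.

// Hypothetical pattern: three digits, two digits, and four digits separated by hyphens.
const string pattern = @"\d{3}-\d{2}-\d{4}";

// Sanity-check the pattern with .NET's regex engine before redacting.
// This only validates the pattern itself; the library may interpret it differently.
bool matchesSample = Regex.IsMatch("Customer ID: 123-45-6789", pattern);
Console.WriteLine($"Pattern matches sample text: {matchesSample}");

using GdPicturePDF gdpicturePDF = new GdPicturePDF();
// Load the source document.
gdpicturePDF.LoadFromFile(@"C:\temp\source.pdf");

// Same call as with plain text, with the pattern as the search text.
int occurrences = 0;
gdpicturePDF.SearchAndAddRedactionRegions(pattern, true, 0, 0, 0, 255, ref occurrences);
Console.WriteLine($"Occurrences marked for redaction: {occurrences}");

// Apply the redaction only if something was found.
if (occurrences > 0)
{
    gdpicturePDF.ApplyRedaction();
}

// Save the output in a PDF document.
gdpicturePDF.SaveToFile(@"C:\temp\output.pdf");
```

Printing the occurrence count before applying the redaction makes it easier to spot a pattern that matches too much or too little, which matters because the redaction is permanent.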
The example below loads a PDF document, recognizes its text using OCR, redacts every occurrence of the text fragment “Sensitive Information,” and then saves the redacted result as a PDF file:
using GdPicturePDF gdpicturePDF = new GdPicturePDF();
// Load the source document.
gdpicturePDF.LoadFromFile(@"C:\temp\source.pdf");
// Determine the number of pages.
int pageCount = gdpicturePDF.GetPageCount();
// Loop through the pages of the source document.
for (int i = 1; i <= pageCount; i++)
{
    // Select a page and run the OCR process on it.
    gdpicturePDF.SelectPage(i);
    gdpicturePDF.OcrPage("eng", @"C:\GdPicture.NET 14\Redist\OCR", "", 300);
}
// Configure the redaction process.
int occurrences = 0;
gdpicturePDF.SearchAndAddRedactionRegions("Sensitive Information", true, 0, 0, 0, 255, ref occurrences);
// Run the redaction process.
if (occurrences > 0)
{
    gdpicturePDF.ApplyRedaction();
}
// Save the output in a PDF document.
gdpicturePDF.SaveToFile(@"C:\temp\output.pdf");

Using gdpicturePDF As GdPicturePDF = New GdPicturePDF()
    ' Load the source document.
    gdpicturePDF.LoadFromFile("C:\temp\source.pdf")
    ' Determine the number of pages.
    Dim pageCount As Integer = gdpicturePDF.GetPageCount()
    ' Loop through the pages of the source document.
    For i = 1 To pageCount
        ' Select a page and run the OCR process on it.
        gdpicturePDF.SelectPage(i)
        gdpicturePDF.OcrPage("eng", "C:\GdPicture.NET 14\Redist\OCR", "", 300)
    Next
    ' Configure the redaction process.
    Dim occurrences = 0
    gdpicturePDF.SearchAndAddRedactionRegions("Sensitive Information", True, 0, 0, 0, 255, occurrences)
    ' Run the redaction process.
    If occurrences > 0 Then
        gdpicturePDF.ApplyRedaction()
    End If
    ' Save the output in a PDF document.
    gdpicturePDF.SaveToFile("C:\temp\output.pdf")
End Using
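The examples above ignore failures such as a missing source file or an OCR error. The following is a minimal sketch of how each step might be checked, under stated assumptions: it relies on the GetStat method of the GdPicturePDF object and the GdPictureStatus enumeration to read the status of the last operation, and on the GdPicture14 namespace, none of which this guide covers. The RedactionSketch class and its TryOcrAndRedact and Succeeded helpers are hypothetical names introduced for illustration only.

```csharp
using System;
using GdPicture14; // Assumed namespace for GdPicture.NET 14.

public static class RedactionSketch
{
    // Hypothetical helper: returns true only if every checked step reported GdPictureStatus.OK.
    public static bool TryOcrAndRedact(string sourcePath, string outputPath,
                                       string text, string ocrResourcePath)
    {
        using GdPicturePDF gdpicturePDF = new GdPicturePDF();

        // Load the source document and stop if loading failed.
        gdpicturePDF.LoadFromFile(sourcePath);
        if (!Succeeded(gdpicturePDF, "load the document")) return false;

        // Run OCR page by page and stop at the first page that fails.
        int pageCount = gdpicturePDF.GetPageCount();
        for (int i = 1; i <= pageCount; i++)
        {
            gdpicturePDF.SelectPage(i);
            gdpicturePDF.OcrPage("eng", ocrResourcePath, "", 300);
            if (!Succeeded(gdpicturePDF, $"run OCR on page {i}")) return false;
        }

        // Mark the occurrences. As in the examples above, the count decides
        // whether there is anything to redact.
        int occurrences = 0;
        gdpicturePDF.SearchAndAddRedactionRegions(text, true, 0, 0, 0, 255, ref occurrences);

        if (occurrences > 0)
        {
            gdpicturePDF.ApplyRedaction();
            if (!Succeeded(gdpicturePDF, "apply the redaction")) return false;
        }

        // Save the output and report whether saving succeeded.
        gdpicturePDF.SaveToFile(outputPath);
        return Succeeded(gdpicturePDF, "save the output");
    }

    // Reads the status of the last operation (assumed GetStat behavior) and logs failures.
    private static bool Succeeded(GdPicturePDF pdf, string step)
    {
        GdPictureStatus status = pdf.GetStat();
        if (status == GdPictureStatus.OK) return true;
        Console.WriteLine($"Failed to {step}: {status}");
        return false;
    }
}
```

A call such as RedactionSketch.TryOcrAndRedact(@"C:\temp\source.pdf", @"C:\temp\output.pdf", "Sensitive Information", @"C:\GdPicture.NET 14\Redist\OCR") would then return false instead of silently writing an unredacted file when one of the checked steps fails.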