C# OCR Image to Text

This guide explains how to convert image files to searchable PDFs. GdPicture.NET’s optical character recognition (OCR) engine allows you to recognize the text in an image file and then save the text in a PDF.

Converting Image Files to Searchable PDFs

This section explains how to convert simple, single-page image files. For more information on converting multipage image files, see Converting Multipage TIFF Files to Searchable PDFs.

To convert an image file to a searchable PDF, follow these steps.

  1. Create a GdPicturePDF object, a GdPictureImaging object, and a GdPictureOCR object.

  2. Select the image by passing its path to the CreateGdPictureImageFromFile method of the GdPictureImaging object.

  3. Configure the OCR process with the GdPictureOCR object in the following way:

    • Set the image with the SetImage method.

    • Set the path to the OCR resource folder with the ResourceFolder property. The default language resources are located in GdPicture.NET 14\Redist\OCR. For more information on adding language resources, see the language support guide.

    • With the AddLanguage method, add the language resources that GdPicture.NET uses to recognize text in the image. This method takes a member of the OCRLanguage enumeration.

    • Optional: Set whether OCR prioritizes recognition accuracy or speed with the OCRMode property.

    • Optional: Set the character allowlist with the CharacterSet property. When scanning the image, the OCR engine only recognizes the characters included in the allowlist.

    • Optional: Set the character denylist with the CharacterBlackList property. When scanning the image, the OCR engine ignores the characters included in the denylist.

  4. Run the OCR process with the RunOCR method of the GdPictureOCR object.

  5. Get the result of the OCR process as text with the GetOCRResultText method of the GdPictureOCR object.

  6. Create the output with the CreateFromText method of the GdPicturePDF object. The first parameter sets the conformance level of the PDF document. This parameter is a member of the PdfConformance enumeration. For example, use PDF to create a common PDF document.

  7. Save the output in a PDF document with the SaveToFile method of the GdPicturePDF object.

The example below converts an image file to a searchable PDF by specifying the language of the text:

using GdPicturePDF gdpicturePDF = new GdPicturePDF();
using GdPictureImaging gdpictureImaging = new GdPictureImaging();
using GdPictureOCR gdpictureOCR = new GdPictureOCR();
// Select the image to process.
int imageID = gdpictureImaging.CreateGdPictureImageFromFile(@"C:\temp\source.png");
// Set the OCR parameters.
gdpictureOCR.SetImage(imageID);
gdpictureOCR.ResourceFolder = @"C:\GdPicture.NET 14\Redist\OCR";
gdpictureOCR.AddLanguage(OCRLanguage.English);
// Run the OCR process.
string resID = gdpictureOCR.RunOCR();
// Get the result of the OCR process as text.
string content = gdpictureOCR.GetOCRResultText(resID);
// Save the result in a PDF document.
gdpicturePDF.CreateFromText(PdfConformance.PDF, 595, 842, 10, 10, 10, 10,
TextAlignment.TextAlignmentNear, content, 12, "Arial", false, false, true, false);
gdpicturePDF.SaveToFile(@"C:\temp\output.pdf");
gdpictureImaging.ReleaseGdPictureImage(imageID);

Using gdpicturePDF As GdPicturePDF = New GdPicturePDF()
Using gdpictureImaging As GdPictureImaging = New GdPictureImaging()
Using gdpictureOCR As GdPictureOCR = New GdPictureOCR()
    ' Select the image to process.
    Dim imageID As Integer = gdpictureImaging.CreateGdPictureImageFromFile("C:\temp\source.png")
    ' Set the OCR parameters.
    gdpictureOCR.SetImage(imageID)
    gdpictureOCR.ResourceFolder = "C:\GdPicture.NET 14\Redist\OCR"
    gdpictureOCR.AddLanguage(OCRLanguage.English)
    ' Run the OCR process.
    Dim resID As String = gdpictureOCR.RunOCR()
    ' Get the result of the OCR process as text.
    Dim content As String = gdpictureOCR.GetOCRResultText(resID)
    ' Save the result in a PDF document.
    gdpicturePDF.CreateFromText(PdfConformance.PDF, 595, 842, 10, 10, 10, 10, TextAlignment.TextAlignmentNear, content, 12, "Arial", False, False, True, False)
    gdpicturePDF.SaveToFile("C:\temp\output.pdf")
    gdpictureImaging.ReleaseGdPictureImage(imageID)
End Using
End Using
End Using
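
The values 595 and 842 passed to CreateFromText in the example above look like the dimensions of an A4 page expressed in PDF points (1/72 inch). That reading is an assumption, since this guide does not state the unit. Under that assumption, the sketch below shows how such values could be derived for other page sizes. PageUnits and MillimetersToPoints are hypothetical names and are not part of GdPicture.NET.

```csharp
using System;

// Hypothetical helper, not part of GdPicture.NET.
// Assumes page dimensions are expressed in PDF points (72 points per inch).
public static class PageUnits
{
    private const double PointsPerInch = 72.0;
    private const double MillimetersPerInch = 25.4;

    // Converts millimeters to PDF points, rounded to the nearest whole point.
    public static int MillimetersToPoints(double millimeters) =>
        (int)Math.Round(millimeters / MillimetersPerInch * PointsPerInch);
}

// Example: an A4 page measures 210 mm x 297 mm.
// PageUnits.MillimetersToPoints(210) returns 595
// PageUnits.MillimetersToPoints(297) returns 842
```

Rounding to whole points keeps the values in line with those used in the example. If the method accepts fractional values, the rounding step could be dropped.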

The example below converts an image file to a searchable PDF. It specifies two languages, favors speed over accuracy, and disregards numbers when scanning the image:

using GdPicturePDF gdpicturePDF = new GdPicturePDF();
using GdPictureImaging gdpictureImaging = new GdPictureImaging();
using GdPictureOCR gdpictureOCR = new GdPictureOCR();
// Select the image to process.
int imageID = gdpictureImaging.CreateGdPictureImageFromFile(@"C:\temp\source.png");
// Set the OCR parameters.
gdpictureOCR.SetImage(imageID);
gdpictureOCR.ResourceFolder = @"C:\GdPicture.NET 14\Redist\OCR";
gdpictureOCR.AddLanguage(OCRLanguage.English);
gdpictureOCR.AddLanguage(OCRLanguage.German);
gdpictureOCR.OCRMode = OCRMode.FavorSpeed;
gdpictureOCR.CharacterBlackList = "0123456789";
// Run the OCR process.
string resID = gdpictureOCR.RunOCR();
// Get the result of the OCR process as text.
string content = gdpictureOCR.GetOCRResultText(resID);
// Save the result in a PDF document.
gdpicturePDF.CreateFromText(PdfConformance.PDF, 595, 842, 10, 10, 10, 10, TextAlignment.TextAlignmentNear, content, 12, "Arial", false, false, true, false);
gdpicturePDF.SaveToFile(@"C:\temp\output.pdf");
gdpictureImaging.ReleaseGdPictureImage(imageID);

Using gdpicturePDF As GdPicturePDF = New GdPicturePDF()
Using gdpictureImaging As GdPictureImaging = New GdPictureImaging()
Using gdpictureOCR As GdPictureOCR = New GdPictureOCR()
    ' Select the image to process.
    Dim imageID As Integer = gdpictureImaging.CreateGdPictureImageFromFile("C:\temp\source.png")
    ' Set the OCR parameters.
    gdpictureOCR.SetImage(imageID)
    gdpictureOCR.ResourceFolder = "C:\GdPicture.NET 14\Redist\OCR"
    gdpictureOCR.AddLanguage(OCRLanguage.English)
    gdpictureOCR.AddLanguage(OCRLanguage.German)
    gdpictureOCR.OCRMode = OCRMode.FavorSpeed
    gdpictureOCR.CharacterBlackList = "0123456789"
    ' Run the OCR process.
    Dim resID As String = gdpictureOCR.RunOCR()
    ' Get the result of the OCR process as text.
    Dim content As String = gdpictureOCR.GetOCRResultText(resID)
    ' Save the result in a PDF document.
    gdpicturePDF.CreateFromText(PdfConformance.PDF, 595, 842, 10, 10, 10, 10, TextAlignment.TextAlignmentNear, content, 12, "Arial", False, False, True, False)
    gdpicturePDF.SaveToFile("C:\temp\output.pdf")
    gdpictureImaging.ReleaseGdPictureImage(imageID)
End Using
End Using
End Using
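
The example above sets a denylist as a plain string of characters. The CharacterSet allowlist described in the steps can presumably be set in the same way. The sketch below builds such a string from character ranges instead of typing it out. OcrCharacterLists and Range are hypothetical names and are not part of GdPicture.NET. Assigning the resulting string to CharacterSet assumes the property accepts a string, as CharacterBlackList does.

```csharp
using System;
using System.Linq;

// Hypothetical helper, not part of GdPicture.NET.
public static class OcrCharacterLists
{
    // Returns a string containing every character from first to last, inclusive.
    public static string Range(char first, char last)
    {
        if (last < first)
            throw new ArgumentException("last must not precede first.");

        return new string(Enumerable.Range(first, last - first + 1)
                                    .Select(code => (char)code)
                                    .ToArray());
    }
}

// Possible usage (assumes CharacterSet takes a string of allowed characters):
// gdpictureOCR.CharacterSet = OcrCharacterLists.Range('0', '9') + ".,-";
```

An allowlist of digits and a few punctuation marks like this one may suit images that contain only numeric data, such as invoice totals.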

Converting Multipage TIFF Files to Searchable PDFs

To convert a multipage TIFF file to a searchable PDF, follow these steps.

  1. Create a GdPicturePDF object, a GdPictureImaging object, and a GdPictureOCR object.

  2. Select the image by passing its path to the TiffCreateMultiPageFromFile method of the GdPictureImaging object.

  3. Determine the number of pages with the GetPageCount method of the GdPictureImaging object.

  4. Configure the OCR process with the GdPictureOCR object in the following way:

    • Set the path to the OCR resource folder with the ResourceFolder property. The default language resources are located in GdPicture.NET 14\Redist\OCR. For more information on adding language resources, see the language support guide.

    • With the AddLanguage method, add the language resources that GdPicture.NET uses to recognize text in the image. This method takes a member of the OCRLanguage enumeration.

    • Optional: Set whether OCR prioritizes recognition accuracy or speed with the OCRMode property.

    • Optional: Set the character allowlist with the CharacterSet property. When scanning the image, the OCR engine only recognizes the characters included in the allowlist.

    • Optional: Set the character denylist with the CharacterBlackList property. When scanning the image, the OCR engine doesn’t recognize the characters included in the denylist.

  5. Create the resulting PDF document with the NewPDF method of the GdPicturePDF object. The parameter of this method sets the conformance level of the PDF document. This parameter is a member of the PdfConformance enumeration. For example, use PDF to create a common PDF document.

  6. Loop through pages of the image file.

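  7. For each page, select the page with the TiffSelectPage method of the GdPictureImaging object, set the image with the SetImage method of the GdPictureOCR object, run the OCR process with the RunOCR method, and then add the recognized text to a new page in the PDF.

  8. Save the output in a PDF document with the SaveToFile method of the GdPicturePDF object.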

The example below converts a multipage TIFF file to a searchable PDF:

using GdPicturePDF gdpicturePDF = new GdPicturePDF();
using GdPictureImaging gdpictureImaging = new GdPictureImaging();
using GdPictureOCR gdpictureOCR = new GdPictureOCR();
// Select the image to process.
int imageID = gdpictureImaging.TiffCreateMultiPageFromFile(@"C:\temp\source.tif");
// Determine the number of pages.
int pageCount = gdpictureImaging.GetPageCount(imageID);
// Set the OCR parameters.
gdpictureOCR.ResourceFolder = @"C:\GdPicture.NET 14\Redist\OCR";
gdpictureOCR.AddLanguage(OCRLanguage.English);
// Create the resulting PDF document.
gdpicturePDF.NewPDF(PdfConformance.PDF);
gdpicturePDF.SetOrigin(PdfOrigin.PdfOriginTopLeft);
string fontResName = gdpicturePDF.AddStandardFont(PdfStandardFont.PdfStandardFontCourier);
// Loop through pages of the image file.
string resID = "page";
string content = null;
for (int i = 1; i <= pageCount; i++)
{
    // Select the current page and set up the image for OCR.
    gdpictureImaging.TiffSelectPage(imageID, i);
    gdpictureOCR.SetImage(imageID);
    // Run the OCR process on the current page.
    gdpictureOCR.RunOCR(resID);
    // Get the result.
    content = gdpictureOCR.GetOCRResultText(resID);
    // Add the result to a new page in the PDF.
    gdpicturePDF.NewPage(PdfPageSizes.PdfPageSizeA4);
    gdpicturePDF.DrawText(fontResName, 10, 10, content);
    // Release the current page's OCR result to free memory before processing the next page.
    gdpictureOCR.ReleaseOCRResult(resID);
}
// Save the resulting PDF document.
gdpicturePDF.SaveToFile(@"C:\temp\output.pdf");
gdpicturePDF.CloseDocument();
gdpictureImaging.ReleaseGdPictureImage(imageID);

Using gdpicturePDF As GdPicturePDF = New GdPicturePDF()
Using gdpictureImaging As GdPictureImaging = New GdPictureImaging()
Using gdpictureOCR As GdPictureOCR = New GdPictureOCR()
    ' Select the image to process.
    Dim imageID As Integer = gdpictureImaging.TiffCreateMultiPageFromFile("C:\temp\source.tif")
    ' Determine the number of pages.
    Dim pageCount As Integer = gdpictureImaging.GetPageCount(imageID)
    ' Set the OCR parameters.
    gdpictureOCR.ResourceFolder = "C:\GdPicture.NET 14\Redist\OCR"
    gdpictureOCR.AddLanguage(OCRLanguage.English)
    ' Create the resulting PDF document.
    gdpicturePDF.NewPDF(PdfConformance.PDF)
    gdpicturePDF.SetOrigin(PdfOrigin.PdfOriginTopLeft)
    Dim fontResName As String = gdpicturePDF.AddStandardFont(PdfStandardFont.PdfStandardFontCourier)
    ' Loop through pages of the image file.
    Dim resID = "page"
    Dim content As String = Nothing
    For i = 1 To pageCount
        ' Select the current page and set up the image for OCR.
        gdpictureImaging.TiffSelectPage(imageID, i)
        gdpictureOCR.SetImage(imageID)
        ' Run the OCR process on the current page.
        gdpictureOCR.RunOCR(resID)
        ' Get the result.
        content = gdpictureOCR.GetOCRResultText(resID)
        ' Add the result to a new page in the PDF.
        gdpicturePDF.NewPage(PdfPageSizes.PdfPageSizeA4)
        gdpicturePDF.DrawText(fontResName, 10, 10, content)
        ' Release the current page's OCR result to free memory before processing the next page.
        gdpictureOCR.ReleaseOCRResult(resID)
    Next
    ' Save the resulting PDF document.
    gdpicturePDF.SaveToFile("C:\temp\output.pdf")
    gdpicturePDF.CloseDocument()
    gdpictureImaging.ReleaseGdPictureImage(imageID)
End Using
End Using
End Using
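
The examples in this guide do not check whether each call succeeds. The sketch below shows one way the per-page loop might report failures. It assumes that the GdPicture14 namespace applies, that the GdPictureImaging and GdPictureOCR objects expose a GetStat method returning a GdPictureStatus value, and that GdPictureStatus.OK indicates success. Verify these against the API reference for your version before relying on them.

```csharp
using System;
using GdPicture14; // Assumed namespace for GdPicture.NET 14.

using GdPictureImaging gdpictureImaging = new GdPictureImaging();
using GdPictureOCR gdpictureOCR = new GdPictureOCR();

// Load the multipage TIFF file and confirm that loading succeeded.
int imageID = gdpictureImaging.TiffCreateMultiPageFromFile(@"C:\temp\source.tif");
if (gdpictureImaging.GetStat() != GdPictureStatus.OK) // Assumed status API.
{
    Console.WriteLine($"Loading failed: {gdpictureImaging.GetStat()}");
    return;
}

// Set the OCR parameters.
gdpictureOCR.ResourceFolder = @"C:\GdPicture.NET 14\Redist\OCR";
gdpictureOCR.AddLanguage(OCRLanguage.English);

int pageCount = gdpictureImaging.GetPageCount(imageID);
for (int i = 1; i <= pageCount; i++)
{
    // Select the current page and run OCR on it.
    gdpictureImaging.TiffSelectPage(imageID, i);
    gdpictureOCR.SetImage(imageID);
    string resID = gdpictureOCR.RunOCR();

    // Assumption: GetStat reflects the outcome of the most recent OCR call.
    if (gdpictureOCR.GetStat() != GdPictureStatus.OK)
    {
        Console.WriteLine($"OCR failed on page {i}: {gdpictureOCR.GetStat()}");
        continue;
    }

    // Report how much text was recognized, and then free the result.
    string content = gdpictureOCR.GetOCRResultText(resID);
    Console.WriteLine($"Page {i}: {content.Length} characters recognized.");
    gdpictureOCR.ReleaseOCRResult(resID);
}

gdpictureImaging.ReleaseGdPictureImage(imageID);
```

Skipping a failed page with continue lets the remaining pages be processed. Stopping at the first failure is an equally reasonable choice when a partial document is not acceptable.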