Enhance Characters in PDFs and Images in C#

This guide explains how to enhance characters in PDFs and images.

Thick and Oversampled Characters

Sometimes characters in a document appear thick and their features are unclear, for example when too much ink was used to print a page, or when a document was printed and scanned many times. A process called erosion can fix this issue by removing black pixels from the edges of objects, such as the strokes of a character.
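To make the idea of erosion concrete, here is a minimal, self-contained sketch on a plain `bool[,]` bitmap. It is not the GdPicture.NET implementation. The `ErosionSketch` class and its `Erode8` method are hypothetical names used only for illustration, and the sketch assumes that the "8" in `FxBitonalErode8` refers to the 8-pixel neighborhood around each pixel. It also assumes that pixels outside the bitmap count as white.

```csharp
public static class ErosionSketch
{
    // Convention for this sketch: true = black (ink), false = white (background).
    // A pixel stays black only if it and all 8 of its neighbors are black.
    // Pixels outside the bitmap are treated as white (an assumption of this sketch).
    public static bool[,] Erode8(bool[,] source)
    {
        int height = source.GetLength(0);
        int width = source.GetLength(1);
        var result = new bool[height, width];

        for (int y = 0; y < height; y++)
        {
            for (int x = 0; x < width; x++)
            {
                result[y, x] = AllNeighborsBlack(source, y, x, height, width);
            }
        }

        return result;
    }

    // Checks the 3x3 window centered on (y, x), including the center pixel itself.
    private static bool AllNeighborsBlack(bool[,] source, int y, int x, int height, int width)
    {
        for (int dy = -1; dy <= 1; dy++)
        {
            for (int dx = -1; dx <= 1; dx++)
            {
                int ny = y + dy;
                int nx = x + dx;
                bool inside = ny >= 0 && ny < height && nx >= 0 && nx < width;
                if (!inside || !source[ny, nx])
                {
                    return false;
                }
            }
        }

        return true;
    }
}
```

For a 5×5 bitmap that contains a 3×3 black square in the middle, `Erode8` leaves only the center pixel black: every stroke loses one pixel on each side, which is why erosion thins thick characters.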

The images below show what a document looks like before and after erosion.

[Figure: before enhancement; after enhancement]

Information

Don’t preprocess documents before recognizing text with OCR. The GdPicture.NET OCR engine preprocesses documents automatically with better results than manual preprocessing.

To fix thick characters, follow these steps:

  1. Create a GdPictureImaging object.

  2. Select the image by passing its path to the CreateGdPictureImageFromFile method of the GdPictureImaging object.

  3. Fix thick characters by passing the image ID to the FxBitonalErode8 method of the GdPictureImaging object.

  4. Save the output in a new image with the SaveAsPNG method of the GdPictureImaging object.

  5. Release the image resource with the ReleaseGdPictureImage method of the GdPictureImaging object.

The example below fixes thick characters:

C#:

using GdPictureImaging gdpictureImaging = new GdPictureImaging();
// Load the image from a file.
int imageId = gdpictureImaging.CreateGdPictureImageFromFile(@"C:/temp/source.png");
// Fix thick characters.
gdpictureImaging.FxBitonalErode8(imageId);
// Save the output in a new image.
gdpictureImaging.SaveAsPNG(imageId, @"C:/temp/output.png");
// Release the image resource.
gdpictureImaging.ReleaseGdPictureImage(imageId);

VB.NET:

Using gdpictureImaging As GdPictureImaging = New GdPictureImaging()
    ' Load the image from a file.
    Dim imageId As Integer = gdpictureImaging.CreateGdPictureImageFromFile("C:/temp/source.png")
    ' Fix thick characters.
    gdpictureImaging.FxBitonalErode8(imageId)
    ' Save the output in a new image.
    gdpictureImaging.SaveAsPNG(imageId, "C:/temp/output.png")
    ' Release the image resource.
    gdpictureImaging.ReleaseGdPictureImage(imageId)
End Using

Faint and Low-Sampled Characters

Sometimes characters in a document appear faint and low-sampled, and their features are unclear, for example when the brightness used to scan a page was too high, or when a document was converted to a binary image with a poorly suited algorithm. A process called black dilation can fix this issue by adding black pixels around objects.
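As a rough illustration of black dilation, the following self-contained sketch works on a plain `bool[,]` bitmap. It is not the GdPicture.NET implementation. The `DilationSketch` class and its `Dilate8` method are hypothetical names, and the sketch assumes that the "8" in `FxBitonalDilate8` refers to the 8-pixel neighborhood around each pixel.

```csharp
public static class DilationSketch
{
    // Convention for this sketch: true = black (ink), false = white (background).
    // A pixel becomes black if it or any of its 8 neighbors is black.
    public static bool[,] Dilate8(bool[,] source)
    {
        int height = source.GetLength(0);
        int width = source.GetLength(1);
        var result = new bool[height, width];

        for (int y = 0; y < height; y++)
        {
            for (int x = 0; x < width; x++)
            {
                result[y, x] = AnyNeighborBlack(source, y, x, height, width);
            }
        }

        return result;
    }

    // Checks the 3x3 window centered on (y, x), skipping positions outside the bitmap.
    private static bool AnyNeighborBlack(bool[,] source, int y, int x, int height, int width)
    {
        for (int dy = -1; dy <= 1; dy++)
        {
            for (int dx = -1; dx <= 1; dx++)
            {
                int ny = y + dy;
                int nx = x + dx;
                bool inside = ny >= 0 && ny < height && nx >= 0 && nx < width;
                if (inside && source[ny, nx])
                {
                    return true;
                }
            }
        }

        return false;
    }
}
```

For a 5×5 bitmap with a single black pixel in the middle, `Dilate8` produces a 3×3 black square: every stroke gains one pixel on each side, which is why dilation strengthens faint characters.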

To fix faint characters, follow these steps:

  1. Create a GdPictureImaging object.

  2. Select the image by passing its path to the CreateGdPictureImageFromFile method of the GdPictureImaging object.

  3. Fix faint characters by passing the image ID to the FxBitonalDilate8 method of the GdPictureImaging object.

  4. Save the output in a new image with the SaveAsPNG method of the GdPictureImaging object.

  5. Release the image resource with the ReleaseGdPictureImage method of the GdPictureImaging object.

The example below fixes faint characters:

C#:

using GdPictureImaging gdpictureImaging = new GdPictureImaging();
// Load the image from a file.
int imageId = gdpictureImaging.CreateGdPictureImageFromFile(@"C:/temp/source.png");
// Fix faint characters.
gdpictureImaging.FxBitonalDilate8(imageId);
// Save the output in a new image.
gdpictureImaging.SaveAsPNG(imageId, @"C:/temp/output.png");
// Release the image resource.
gdpictureImaging.ReleaseGdPictureImage(imageId);

VB.NET:

Using gdpictureImaging As GdPictureImaging = New GdPictureImaging()
    ' Load the image from a file.
    Dim imageId As Integer = gdpictureImaging.CreateGdPictureImageFromFile("C:/temp/source.png")
    ' Fix faint characters.
    gdpictureImaging.FxBitonalDilate8(imageId)
    ' Save the output in a new image.
    gdpictureImaging.SaveAsPNG(imageId, "C:/temp/output.png")
    ' Release the image resource.
    gdpictureImaging.ReleaseGdPictureImage(imageId)
End Using