Extract Images from PDFs in C#

This guide explains how to extract images from PDF documents using C#. Images can be added to a PDF document in the following ways:

  • Embedded in the internal structure of the PDF document.

  • Added to the PDF document as an image annotation.

GdPicture.NET currently enables you to extract images embedded in a PDF document. Extracting images from image annotations isn’t supported.

To extract images embedded in a PDF document, follow these steps:

  1. Create a GdPicturePDF object and a GdPictureImaging object.

  2. Select the source document by passing its path to the LoadFromFile method of the GdPicturePDF object.

  3. Determine the number of pages with the GetPageCount method of the GdPicturePDF object and loop through them, selecting each page with the SelectPage method.

  4. Determine the number of images on the page with the GetPageImageCount method of the GdPicturePDF object and loop through them.

  5. Extract the image by passing the index of the image to the ExtractPageImage method of the GdPicturePDF object.

  6. Save the output in a new image file with the SaveAsPNG method of the GdPictureImaging object.

  7. Release the extracted image with the ReleaseGdPictureImage method of the GdPictureImaging object once it has been saved.
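
Step 6 writes one file per extracted image, so every output path has to be unique. The sketch below shows one way to derive such a path from the page number and the image index. OutputNaming and BuildImagePath are hypothetical names used for illustration only; they are not part of GdPicture.NET.

```csharp
using System;
using System.IO;

// Hypothetical helper, not part of GdPicture.NET: builds one unique PNG path
// per extracted image from the page number and the image index.
static class OutputNaming
{
    public static string BuildImagePath(string outputDirectory, int page, int imageIndex)
    {
        // The file name follows the page-<page>-image-<index>.png pattern.
        string fileName = $"page-{page}-image-{imageIndex}.png";
        // Path.Combine inserts the platform's directory separator.
        return Path.Combine(outputDirectory, fileName);
    }
}
```

For instance, OutputNaming.BuildImagePath(@"C:\temp", 2, 0) returns a path ending in page-2-image-0.png, which can then be passed to SaveAsPNG.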

The example below extracts all embedded images from a PDF document, first in C# and then in VB.NET:

// Create the GdPicturePDF and GdPictureImaging objects.
using GdPicturePDF gdpicturePDF = new GdPicturePDF();
using GdPictureImaging gdpictureImaging = new GdPictureImaging();
// Select the source document.
gdpicturePDF.LoadFromFile(@"C:\temp\source.pdf");
// Determine the number of pages and loop through them.
int pageCount = gdpicturePDF.GetPageCount();
for (int page = 1; page <= pageCount; page++)
{
    // Select the current page.
    gdpicturePDF.SelectPage(page);
    // Determine the number of images on the page and loop through them.
    int imageCount = gdpicturePDF.GetPageImageCount();
    for (int imageIndex = 0; imageIndex < imageCount; imageIndex++)
    {
        // Extract the image.
        int imageId = gdpicturePDF.ExtractPageImage(imageIndex);
        // Save the output in a new image file.
        gdpictureImaging.SaveAsPNG(imageId, @"C:\temp\page-" + page + "-image-" + imageIndex + ".png");
        // Release unnecessary resources.
        gdpictureImaging.ReleaseGdPictureImage(imageId);
    }
}

' VB.NET version of the same example.
' Create the GdPicturePDF and GdPictureImaging objects.
Using gdpicturePDF As GdPicturePDF = New GdPicturePDF()
Using gdpictureImaging As GdPictureImaging = New GdPictureImaging()
    ' Select the source document.
    gdpicturePDF.LoadFromFile("C:\temp\source.pdf")
    ' Determine the number of pages and loop through them.
    Dim pageCount As Integer = gdpicturePDF.GetPageCount()
    For page = 1 To pageCount
        ' Select the current page.
        gdpicturePDF.SelectPage(page)
        ' Determine the number of images on the page and loop through them.
        Dim imageCount As Integer = gdpicturePDF.GetPageImageCount()
        For imageIndex = 0 To imageCount - 1
            ' Extract the image.
            Dim imageId As Integer = gdpicturePDF.ExtractPageImage(imageIndex)
            ' Save the output in a new image file.
            gdpictureImaging.SaveAsPNG(imageId, "C:\temp\page-" & page & "-image-" & imageIndex & ".png")
            ' Release unnecessary resources.
            gdpictureImaging.ReleaseGdPictureImage(imageId)
        Next
    Next
End Using
End Using
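
The example above ignores the results of the individual calls, so a missing file or an image that cannot be extracted goes unnoticed. The sketch below is a more defensive C# variant of the same loop. It rests on several assumptions that are not stated in this guide: that the types live in the GdPicture14 namespace, that LoadFromFile, SelectPage, and SaveAsPNG return a GdPictureStatus value, and that GetStat reports the status of the most recent call on the GdPicturePDF object. Check these against the API reference for your version before relying on them. The image indexing mirrors the example above.

```csharp
using System;
using GdPicture14; // Assumption: namespace of the GdPicture.NET assembly in use.

using GdPicturePDF gdpicturePDF = new GdPicturePDF();
using GdPictureImaging gdpictureImaging = new GdPictureImaging();

// Stop early if the source document cannot be loaded.
GdPictureStatus status = gdpicturePDF.LoadFromFile(@"C:\temp\source.pdf");
if (status != GdPictureStatus.OK)
{
    Console.WriteLine($"Could not load the document: {status}");
    return;
}

int pageCount = gdpicturePDF.GetPageCount();
for (int page = 1; page <= pageCount; page++)
{
    // Skip pages that cannot be selected.
    status = gdpicturePDF.SelectPage(page);
    if (status != GdPictureStatus.OK)
    {
        Console.WriteLine($"Could not select page {page}: {status}");
        continue;
    }

    int imageCount = gdpicturePDF.GetPageImageCount();
    for (int imageIndex = 0; imageIndex < imageCount; imageIndex++)
    {
        int imageId = gdpicturePDF.ExtractPageImage(imageIndex);
        // ExtractPageImage returns an image ID, so its outcome is read with GetStat.
        status = gdpicturePDF.GetStat();
        if (status != GdPictureStatus.OK)
        {
            Console.WriteLine($"Page {page}, image {imageIndex}: extraction failed ({status})");
            continue;
        }

        status = gdpictureImaging.SaveAsPNG(imageId, @"C:\temp\page-" + page + "-image-" + imageIndex + ".png");
        if (status != GdPictureStatus.OK)
        {
            Console.WriteLine($"Page {page}, image {imageIndex}: saving failed ({status})");
        }

        // Release the image whether or not it was saved.
        gdpictureImaging.ReleaseGdPictureImage(imageId);
    }
}
```

Logging and continuing keeps one unreadable image from aborting the whole run; a stricter tool could stop at the first status other than GdPictureStatus.OK.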