Extract Metadata from PDFs on Android

PSPDFKit comes with DocumentPdfMetadata and DocumentXmpMetadata, which allow you to retrieve or modify a document’s metadata. This guide covers extracting metadata (to modify metadata, please see our separate guide for editing metadata).

Dictionary-Based Metadata

Use DocumentPdfMetadata to work with the dictionary-based metadata in a PDF.

Every metadata value is wrapped in a PdfValue, which can represent one of the following types:

  • Boolean

  • long

  • double

  • String

  • List<PdfValue>

  • Map<String, PdfValue>
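Because lists and maps can themselves contain further values, a metadata value can be arbitrarily nested. The following is a minimal sketch of how such a structure might be walked recursively. It deliberately uses plain Java objects (Boolean, Long, Double, String, List, and Map) as stand-ins for PdfValue, so it doesn't rely on any PSPDFKit accessor; MetadataValueDescriber and describe are hypothetical names used only for illustration.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: plain Java objects stand in for PdfValue here.
final class MetadataValueDescriber {

    // Returns a readable description of a value, recursing into lists and maps.
    static String describe(Object value) {
        if (value instanceof Boolean) {
            return "boolean(" + value + ")";
        }
        if (value instanceof Long) {
            return "long(" + value + ")";
        }
        if (value instanceof Double) {
            return "double(" + value + ")";
        }
        if (value instanceof String) {
            return "string(" + value + ")";
        }
        if (value instanceof List) {
            StringBuilder out = new StringBuilder("list[");
            String separator = "";
            for (Object item : (List<?>) value) {
                out.append(separator).append(describe(item));
                separator = ", ";
            }
            return out.append("]").toString();
        }
        if (value instanceof Map) {
            // Sort by key so the output order is deterministic.
            Map<Object, Object> sorted = new TreeMap<Object, Object>((Map<?, ?>) value);
            StringBuilder out = new StringBuilder("map{");
            String separator = "";
            for (Map.Entry<Object, Object> entry : sorted.entrySet()) {
                out.append(separator)
                   .append(entry.getKey())
                   .append("=")
                   .append(describe(entry.getValue()));
                separator = ", ";
            }
            return out.append("}").toString();
        }
        throw new IllegalArgumentException("Unsupported value type: " + value);
    }
}
```

With a real PdfValue, you would branch on whatever type information the class exposes instead of using instanceof; the recursion stays the same.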

By default, the dictionary metadata may contain the following information keys:

  • Author

  • CreationDate

  • Creator

  • Keywords

  • ModDate

  • Producer

  • Title

You can also add your own key-value pairs to the metadata, as long as the values use one of the supported types. When dealing with the predefined keys, it’s recommended to use the DocumentPdfMetadata getters and setters so that you get out-of-the-box conversions from objects such as Date.
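To illustrate why the built-in conversions are convenient, here is a minimal sketch of what converting a raw date entry by hand could look like. It assumes the raw CreationDate or ModDate value follows the date string format from the PDF specification, D:YYYYMMDDHHmmSSOHH'mm', where everything after the year is optional. PdfDateParser is a hypothetical helper, not part of PSPDFKit, and treating a missing time zone as UTC is an assumption made for this example.

```java
import java.time.OffsetDateTime;
import java.time.ZoneOffset;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: parses PDF date strings such as D:20240131093000+02'00'.
final class PdfDateParser {

    // Year is required; month, day, hour, minute, second, and offset are optional.
    private static final Pattern PDF_DATE = Pattern.compile(
        "(?:D:)?(\\d{4})(\\d{2})?(\\d{2})?(\\d{2})?(\\d{2})?(\\d{2})?"
        + "(?:(Z)|([+-])(\\d{2})'?(\\d{2})?'?)?");

    static OffsetDateTime parse(String raw) {
        Matcher m = PDF_DATE.matcher(raw.trim());
        if (!m.matches()) {
            throw new IllegalArgumentException("Not a PDF date string: " + raw);
        }
        int year = Integer.parseInt(m.group(1));
        int month = intOr(m.group(2), 1);   // month and day default to 1
        int day = intOr(m.group(3), 1);
        int hour = intOr(m.group(4), 0);    // time fields default to 0
        int minute = intOr(m.group(5), 0);
        int second = intOr(m.group(6), 0);

        // Assumption: a missing offset (or an explicit Z) is treated as UTC.
        ZoneOffset offset = ZoneOffset.UTC;
        if (m.group(8) != null) {
            int sign = m.group(8).equals("-") ? -1 : 1;
            int offsetHours = intOr(m.group(9), 0);
            int offsetMinutes = intOr(m.group(10), 0);
            offset = ZoneOffset.ofHoursMinutes(sign * offsetHours, sign * offsetMinutes);
        }
        return OffsetDateTime.of(year, month, day, hour, minute, second, 0, offset);
    }

    // Parses an optional numeric group, falling back to a default when absent.
    private static int intOr(String digits, int fallback) {
        return digits == null ? fallback : Integer.parseInt(digits);
    }
}
```

For example, PdfDateParser.parse("D:20240131093000+02'00'") yields 31 January 2024, 09:30, at an offset of +02:00. The sketch doesn't handle every variant found in real-world files.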

To get an entry of the metadata dictionary (e.g. the Author), you can use the following code snippet:

// Kotlin
val document = ...
val pdfMetadata = document.pdfMetadata
val author = pdfMetadata.author

// Java
PdfDocument document = ...
DocumentPdfMetadata pdfMetadata = document.getPdfMetadata();
String author = pdfMetadata.getAuthor();

For any custom values, use this:

// Kotlin
val document = ...
val pdfMetadata = document.pdfMetadata
val value = pdfMetadata.get("Custom key")

// Java
PdfDocument document = ...
DocumentPdfMetadata pdfMetadata = document.getPdfMetadata();
PdfValue value = pdfMetadata.get("Custom key");

XMP Metadata

Use DocumentXmpMetadata to work with the metadata stream containing XMP data.

Each key in the XMP metadata stream has to have a namespace set. You can define your own namespace or use one of the already existing ones. PSPDFKit exposes two constants for common namespaces.

When setting a value, you also have to pass along a suggested namespace prefix, as this can’t be generated automatically.

Use the following code snippet to get an object from the XMP metadata:

// Kotlin
val xmpMetadata = document.xmpMetadata
val pdfValue = xmpMetadata.get("Key", NAMESPACE)

// Java
DocumentXmpMetadata xmpMetadata = document.getXmpMetadata();
PdfValue pdfValue = xmpMetadata.get("Key", NAMESPACE);
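
To see why a key alone isn't enough to identify an XMP entry, consider the following sketch. It uses only the JDK's XML parser on a small hand-written XMP packet in which the same key, Producer, appears in two different namespaces. XmpNamespaceLookup, SAMPLE_XMP, and the http://example.com/ns/custom/ namespace are made up for this example. The other two URIs are the well-known Adobe PDF and Dublin Core namespace URIs; whether they correspond to the PSPDFKit constants mentioned above is not something this sketch assumes.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

// Hypothetical helper: looks up a namespaced key in a raw XMP packet.
final class XmpNamespaceLookup {

    static final String PDF_NS = "http://ns.adobe.com/pdf/1.3/";
    static final String DC_NS = "http://purl.org/dc/elements/1.1/";

    // Hand-written sample packet: "Producer" exists in two namespaces.
    static final String SAMPLE_XMP =
        "<x:xmpmeta xmlns:x=\"adobe:ns:meta/\">"
        + "<rdf:RDF xmlns:rdf=\"http://www.w3.org/1999/02/22-rdf-syntax-ns#\">"
        + "<rdf:Description rdf:about=\"\""
        + " xmlns:pdf=\"http://ns.adobe.com/pdf/1.3/\""
        + " xmlns:custom=\"http://example.com/ns/custom/\">"
        + "<pdf:Producer>Example Producer</pdf:Producer>"
        + "<custom:Producer>Other Tool</custom:Producer>"
        + "</rdf:Description>"
        + "</rdf:RDF>"
        + "</x:xmpmeta>";

    // Returns the text of the first element matching namespace and key, or null.
    static String find(String xmp, String namespaceUri, String key) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // Without this, the parser ignores namespaces entirely.
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder().parse(
                new ByteArrayInputStream(xmp.getBytes(StandardCharsets.UTF_8)));
            NodeList matches = doc.getElementsByTagNameNS(namespaceUri, key);
            return matches.getLength() == 0
                ? null
                : matches.item(0).getTextContent().trim();
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XMP packet", e);
        }
    }
}
```

Looking up Producer with PDF_NS returns "Example Producer", while the same key with the custom namespace returns "Other Tool", and with DC_NS it returns null. The namespace prefix (pdf or custom) is only a shorthand inside the packet; the namespace URI is what identifies the entry.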