Extract Data from PDF Form Fields on Android

The data that users enter into PDF form fields can be extracted programmatically or serialized to the XFDF format or the Instant JSON format.

Extracting Form Data Programmatically

To extract data in a custom format, you can read the values from the form fields, which are accessible using a document’s form provider:

Kotlin:

// Get a reference to the form field in the document.
val checkbox = document.formProvider.getFormFieldWithFullyQualifiedName("CheckBox 1") as CheckBoxFormField
// Extract the value.
val selected = checkbox.formElement.isSelected

Java:

// Get a reference to the form field in the document.
final CheckBoxFormField checkbox = (CheckBoxFormField) document.getFormProvider().getFormFieldWithFullyQualifiedName("CheckBox 1");
// Extract the value.
final boolean selected = checkbox.getFormElement().isSelected();
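
Once the values have been read, they can be written out in whatever custom format is needed. The following is a minimal sketch in plain Java that assumes the values were collected into a map of fully qualified field names to string values. FormDataCsv and toCsv are hypothetical names used for illustration and aren't part of the SDK, and CSV is only one possible target format.

```java
import java.util.Map;

/** Hypothetical helper (not part of the SDK): serializes field name/value pairs as CSV. */
final class FormDataCsv {

    /** Returns a header row followed by one "name,value" row per entry, in the map's iteration order. */
    static String toCsv(Map<String, String> fieldValues) {
        StringBuilder csv = new StringBuilder("name,value\n");
        for (Map.Entry<String, String> entry : fieldValues.entrySet()) {
            csv.append(escape(entry.getKey()))
               .append(',')
               .append(escape(entry.getValue()))
               .append('\n');
        }
        return csv.toString();
    }

    /** Quotes a cell if it contains a comma, quote, or line break, doubling any embedded quotes. */
    private static String escape(String cell) {
        if (cell == null) {
            return "";
        }
        boolean needsQuotes = cell.contains(",") || cell.contains("\"")
                || cell.contains("\n") || cell.contains("\r");
        String escaped = cell.replace("\"", "\"\"");
        return needsQuotes ? "\"" + escaped + "\"" : escaped;
    }
}
```

Passing a LinkedHashMap keeps the rows in the order in which the fields were read, for example after storing String.valueOf(selected) under "CheckBox 1".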

Extracting Form Data as XFDF

You can export the values of form fields from a document as an XFDF file using XfdfFormatter, like this:

Kotlin:

// Create a temporary file for the XFDF output.
val xfdfOutputTempFile = File.createTempFile("tmp-", "ExtractedFormFields.xfdf")
xfdfOutputTempFile.deleteOnExit()

// Write the form field values into it. The empty list means no annotations are exported.
XfdfFormatter.writeXfdf(document, emptyList(), document.formProvider.formFields, FileOutputStream(xfdfOutputTempFile))

Java:

// Create a temporary file for the XFDF output.
final File xfdfOutputTempFile = File.createTempFile("tmp-", "ExtractedFormFields.xfdf");
xfdfOutputTempFile.deleteOnExit();

// Write the form field values into it. The empty list (java.util.Collections) means no annotations are exported.
XfdfFormatter.writeXfdf(document, Collections.emptyList(), document.getFormProvider().getFormFields(), new FileOutputStream(xfdfOutputTempFile));
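
The exported XFDF is plain XML, so the values can be read back with a standard XML parser. The following is a minimal sketch in plain Java, assuming the file follows the usual XFDF layout: field elements in the http://ns.adobe.com/xfdf/ namespace, each with a name attribute and a value child, with hierarchical fields nested and joined by periods. XfdfFieldReader and its methods are hypothetical names, not part of the SDK.

```java
import java.io.InputStream;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

/** Hypothetical helper (not part of the SDK): reads field values out of an XFDF stream. */
final class XfdfFieldReader {
    private static final String XFDF_NS = "http://ns.adobe.com/xfdf/";

    /** Maps each fully qualified field name to the text of its first value element. */
    static Map<String, String> readFieldValues(InputStream xfdf) {
        Document xml;
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            xml = factory.newDocumentBuilder().parse(xfdf);
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XFDF input", e);
        }

        Map<String, String> values = new LinkedHashMap<>();
        NodeList fields = xml.getElementsByTagNameNS(XFDF_NS, "field");
        for (int i = 0; i < fields.getLength(); i++) {
            Element field = (Element) fields.item(i);
            // Only look at direct children, so a parent field does not pick up a nested field's value.
            for (Node child = field.getFirstChild(); child != null; child = child.getNextSibling()) {
                if (child instanceof Element && "value".equals(child.getLocalName())) {
                    // Multi-value fields are simplified here: only the first value is kept.
                    values.put(qualifiedName(field), child.getTextContent());
                    break;
                }
            }
        }
        return values;
    }

    /** Joins the names of nested field elements with periods, outermost first. */
    private static String qualifiedName(Element field) {
        String name = field.getAttribute("name");
        Node parent = field.getParentNode();
        if (parent instanceof Element && "field".equals(parent.getLocalName())) {
            return qualifiedName((Element) parent) + "." + name;
        }
        return name;
    }
}
```

A caller could pass new FileInputStream(xfdfOutputTempFile) to inspect the file written above. Input from untrusted sources would additionally call for hardening the parser against external entities, which this sketch omits.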

Extracting Form Data as Instant JSON

Instant JSON is optimized for annotations. However, generating Instant JSON from a document will also include form field values.

Since exporting Instant JSON from a document produces a diff between the in-memory document and the document saved on disk, you should ensure auto-save is disabled and that you’re not saving the document at any other time. Disable auto-save by configuring a PdfActivityConfiguration.Builder, like this:

Kotlin:

configuration.autosaveEnabled(false).build()

Java:

configuration.autosaveEnabled(false).build();

You can export Instant JSON to an output stream using DocumentJsonFormatter#exportDocumentJson(PdfDocument, OutputStream), like this:

Kotlin:

val outputStream = ByteArrayOutputStream()
DocumentJsonFormatter.exportDocumentJson(document, outputStream)

Java:

final ByteArrayOutputStream outputStream = new ByteArrayOutputStream();
DocumentJsonFormatter.exportDocumentJson(document, outputStream);

Note that this Instant JSON will also include the visual properties of the form element, such as text color and border style, as well as other annotations, such as text highlights and drawings.

Exporting Instant JSON from an Annotation using the toInstantJson() function isn’t appropriate for form field values. That JSON models only the properties of the form element (text color, border style, etc.) and doesn’t include the value of the form field associated with the form element.