Convert PDFs to Word, Excel, or PowerPoint

Document Engine can convert any supported file type into a Word, Excel, or PowerPoint document. The conversion uses a hybrid machine learning approach to detect structural elements in the source document, such as paragraphs, tables, and columns.

Information

The PDF-to-Office API license is required to access PDF-to-Office capabilities.

Converting a File to an Office Document

To convert a file to an Office document, send a POST request to the /api/build endpoint. In the instructions, set the type parameter of output to one of the following:

  • docx — convert to Word

  • xlsx — convert to Excel

  • pptx — convert to PowerPoint

curl -X POST http://localhost:5000/api/build \
  -H "Authorization: Token token=<API token>" \
  -F document=@/path/to/example-document.pdf \
  -F instructions='{
  "parts": [
    {
      "file": "document"
    }
  ],
  "output": {
    "type": "docx"
  }
}' \
  -o result.docx

The equivalent raw HTTP request:

POST /api/build HTTP/1.1
Content-Type: multipart/form-data; boundary=customboundary
Authorization: Token token=<API token>

--customboundary
Content-Disposition: form-data; name="document"; filename="example-document.pdf"
Content-Type: application/pdf

<PDF data>
--customboundary
Content-Disposition: form-data; name="instructions"
Content-Type: application/json

{
  "parts": [
    {
      "file": "document"
    }
  ],
  "output": {
    "type": "docx"
  }
}
--customboundary--

For more information on the build instructions, refer to the API Reference.
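
The request above can also be assembled programmatically. The following is a minimal Python sketch using only the standard library; it is not an official client. The function names (build_instructions, encode_multipart, convert_pdf) are hypothetical, and the base URL and token are the same placeholders used in the curl example.

```python
import json
import urllib.request
import uuid

# Output types listed in this guide.
OFFICE_TYPES = ("docx", "xlsx", "pptx")


def build_instructions(output_type: str) -> str:
    """Return Build instructions that convert the uploaded part named "document"."""
    if output_type not in OFFICE_TYPES:
        raise ValueError(f"output type must be one of {OFFICE_TYPES}, got {output_type!r}")
    return json.dumps({
        "parts": [{"file": "document"}],
        "output": {"type": output_type},
    })


def encode_multipart(pdf_bytes: bytes, filename: str, instructions: str, boundary: str) -> bytes:
    """Encode the two form parts ("document" and "instructions") as multipart/form-data."""
    delimiter = b"--" + boundary.encode("ascii")
    return b"\r\n".join([
        delimiter,
        f'Content-Disposition: form-data; name="document"; filename="{filename}"'.encode("utf-8"),
        b"Content-Type: application/pdf",
        b"",
        pdf_bytes,
        delimiter,
        b'Content-Disposition: form-data; name="instructions"',
        b"Content-Type: application/json",
        b"",
        instructions.encode("utf-8"),
        delimiter + b"--",  # closing delimiter
        b"",
    ])


def convert_pdf(pdf_path: str, output_type: str, out_path: str,
                base_url: str = "http://localhost:5000",
                token: str = "<API token>") -> None:
    """POST the PDF to /api/build and write the response body to out_path."""
    boundary = uuid.uuid4().hex
    with open(pdf_path, "rb") as source:
        body = encode_multipart(source.read(), "example-document.pdf",
                                build_instructions(output_type), boundary)
    request = urllib.request.Request(
        f"{base_url}/api/build",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Token token={token}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )
    with urllib.request.urlopen(request) as response, open(out_path, "wb") as out:
        out.write(response.read())
```

Assuming a Document Engine instance is reachable at the placeholder URL, calling convert_pdf("/path/to/example-document.pdf", "docx", "result.docx") would correspond to the curl command above. Validating the output type before sending avoids a round trip for an obviously unsupported value.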

Converting a Document Engine Document to an Office Document

Build API instructions can also be used to process documents managed by Document Engine. To reference an existing document, use the following part:

{
  "document": {
    "id": "<document_id>",
    "layer_name": "<optional_layer_name>"
  }
}

For example, to convert a document with the ID my_document to Excel, perform the following request:

curl -X POST http://localhost:5000/api/build \
  -H "Authorization: Token token=<API token>" \
  -F instructions='{
  "parts": [
    {
      "file": {
        "document": {
          "id": "my_document"
        }
      }
    }
  ],
  "output": {
    "type": "xlsx"
  }
}' \
  -o result.xlsx

The equivalent raw HTTP request:

POST /api/build HTTP/1.1
Content-Type: multipart/form-data; boundary=customboundary
Authorization: Token token=<API token>

--customboundary
Content-Disposition: form-data; name="instructions"
Content-Type: application/json

{
  "parts": [
    {
      "file": {
        "document": {
          "id": "my_document"
        }
      }
    }
  ],
  "output": {
    "type": "xlsx"
  }
}
--customboundary--
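
When the document ID is only known at runtime, the instructions can be generated rather than written by hand. The sketch below is a hypothetical helper (build_document_instructions is not part of any official client) that mirrors the shape of the example request above. The optional layer_name key is taken from the part shown earlier in this section; consult the API Reference for the authoritative part schema.

```python
import json
from typing import Optional


def build_document_instructions(document_id: str, output_type: str,
                                layer_name: Optional[str] = None) -> str:
    """Build instructions that convert a document already stored in Document Engine."""
    reference = {"id": document_id}
    if layer_name is not None:
        # Assumption: the layer key is named "layer_name", as in the part shown in this guide.
        reference["layer_name"] = layer_name
    return json.dumps({
        "parts": [{"file": {"document": reference}}],
        "output": {"type": output_type},
    }, indent=2)


if __name__ == "__main__":
    # Prints the same instructions JSON used in the Excel example.
    print(build_document_instructions("my_document", "xlsx"))
```

The resulting string can be sent as the instructions form field of a POST request to /api/build, as in the curl command above.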