Compare PDF Text Using JavaScript
Programmatic text comparison allows for the analysis of textual content between different documents. It’s particularly useful for documents that have undergone edits, enabling users to spot changes swiftly.
Text comparison is available in PSPDFKit for Web only in Standalone mode, and it requires the corresponding license component. Contact Sales if you’re interested in this functionality.
To perform a text comparison operation, you need to provide two documents and a set of options. The options are used to configure the comparison operation.
Describing Your Documents
The PSPDFKit.DocumentDescriptor class is used to provide all the necessary details about your documents for comparison:
- filePath: Path to the document, or an ArrayBuffer.
- password: Optional password if the document is encrypted.
- pageIndexes: An array of page indexes, or an array of ranges where each range is an array of the form [min, max]. If omitted, all pages will be staged for comparison.
    const originalDocument = new PSPDFKit.DocumentDescriptor({
      filePath: "document-comparison/static/documentA.pdf",
      pageIndexes: [0]
    });

    const changedDocument = new PSPDFKit.DocumentDescriptor({
      filePath: "document-comparison/static/documentB.pdf",
      pageIndexes: [0]
    });
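To make the pageIndexes format concrete, here is a minimal sketch of how a list of indexes and [min, max] ranges could be expanded into individual pages. The expandPageIndexes helper is hypothetical and not part of the SDK. It assumes that ranges are inclusive on both ends and that plain indexes and ranges may appear in the same array; neither point is confirmed above.

```javascript
// Hypothetical helper (not part of PSPDFKit): expands a pageIndexes value
// into a flat list of page indexes.
// Assumption: each [min, max] range is inclusive on both ends.
function expandPageIndexes(pageIndexes) {
  const pages = [];
  for (const entry of pageIndexes) {
    if (Array.isArray(entry)) {
      // A range entry of the form [min, max].
      const [min, max] = entry;
      for (let page = min; page <= max; page++) {
        pages.push(page);
      }
    } else {
      // A single page index.
      pages.push(entry);
    }
  }
  return pages;
}

// Under the inclusive assumption, this yields pages 0, 2, 3, and 4.
console.log(expandPageIndexes([0, [2, 4]]));
```

A helper like this can be handy for checking which pages a descriptor will stage before running the comparison.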
Defining the Comparison Operation
The PSPDFKit.ComparisonOperation class outlines the comparison type and optional settings:
- type: Type of comparison. The default is ComparisonOperationType.TEXT. Use PSPDFKit.ComparisonOperationType to check for available comparison types. As of now, only ComparisonOperationType.TEXT is supported.
- options: The settings for the operation. Currently, only numberOfContextWords, which specifies the number of context words for the comparison, is supported.
    const textComparisonOperation = new PSPDFKit.ComparisonOperation(
      PSPDFKit.ComparisonOperationType.TEXT,
      { numberOfContextWords: 2 }
    );
Text Comparison
The final step is to call the instance#compareDocuments method:

    const comparisonResult = await instance.compareDocuments(
      { originalDocument, changedDocument },
      textComparisonOperation
    );

    console.log(comparisonResult);
Understanding the Comparison Result
The comparison provides a PSPDFKit.DocumentComparisonResult, which outlines:
- type: The type of comparison (currently only ComparisonOperationType.TEXT is supported).
- hunks: Hunks of detected text changes.
A hunk groups operations that describe how to transform the original text to the changed text. For instance, if a word is replaced, the hunk will include operations to delete the original word and insert the changed word. The structure of a hunk is:
- originalRange: The range the hunk represents on the original page.
- changedRange: The range the hunk represents on the changed page.
- operations: The operations the hunk contains.
An operation represents a single insertion, a single deletion, or no change between the original and changed text. It’s composed of:
- type: The operation type (“insert”, “delete”, or “equal”).
- text: The text the operation is based upon.
- originalTextBlocks: The rectangles the operation relates to in the original document.
- changedTextBlocks: The rectangles the operation relates to in the changed document.
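As an illustration of how a hunk's operations describe the transformation, the following sketch rebuilds both sides of a hunk from its operations: the original text consists of the “equal” and “delete” operations, and the changed text consists of the “equal” and “insert” operations. The reconstructHunkText helper and the sample hunk are hypothetical, and joining operation texts with a space is an assumption about how the text is split.

```javascript
// Hypothetical helper (not part of PSPDFKit): rebuilds the original and
// changed text of a hunk from its operations.
// Assumption: operation texts can be joined with a single space.
function reconstructHunkText(hunk, separator = " ") {
  const original = [];
  const changed = [];
  for (const operation of hunk.operations) {
    // "equal" and "delete" operations exist in the original text.
    if (operation.type !== "insert") original.push(operation.text);
    // "equal" and "insert" operations exist in the changed text.
    if (operation.type !== "delete") changed.push(operation.text);
  }
  return {
    original: original.join(separator),
    changed: changed.join(separator)
  };
}

// Made-up hunk in which the word "quick" is replaced by "slow".
const sampleHunk = {
  operations: [
    { type: "equal", text: "The" },
    { type: "delete", text: "quick" },
    { type: "insert", text: "slow" },
    { type: "equal", text: "fox" }
  ]
};

// original: "The quick fox", changed: "The slow fox"
console.log(reconstructHunkText(sampleHunk));
```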
A text block relates text to a specific region in a document:
- range: The range in the document page the text block relates to.
- rects: The rectangles on the document page the text block refers to.
Example Result
The result will be structured similarly to the following:
    [
      {
        "documentComparisonResults": [
          {
            "changedPageIndex": 1,
            "comparisonResults": [
              {
                "hunks": [
                  {
                    "changedRange": { "length": 1, "position": 1 },
                    "operations": [
                      {
                        "changedTextBlocks": {
                          "range": { "length": 1, "position": 0 },
                          "rects": [[341.1, 265.2, 0, 0]]
                        },
                        "originalTextBlocks": {
                          "range": { "length": 1, "position": 1 },
                          "rects": [[341.1, 265.2, 74.4, 288.0]]
                        },
                        "text": "1",
                        "type": "delete"
                      }
                    ],
                    "originalRange": { "length": 1, "position": 1 }
                  }
                ],
                "type": "text"
              }
            ],
            "originalPageIndex": 0
          }
        ]
      }
    ]
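When building a custom UI, it can help to flatten this nested structure into a simple list of changes. The sketch below walks a result shaped like the example above and collects every non-“equal” operation together with its page indexes and rectangles. The collectChanges and toRects helpers are hypothetical, the traversal assumes the result always has the shape shown in the example, and toRects accepts either a single text block or an array of them because the exact shape may vary.

```javascript
// Hypothetical helper: normalizes a text block (or array of text blocks)
// into a flat list of rects.
function toRects(blocks) {
  if (!blocks) return [];
  const list = Array.isArray(blocks) ? blocks : [blocks];
  return list.flatMap((block) => block.rects || []);
}

// Hypothetical helper: flattens a result shaped like the example above
// into one entry per inserted or deleted piece of text.
function collectChanges(result) {
  const changes = [];
  for (const entry of result) {
    for (const pagePair of entry.documentComparisonResults) {
      for (const comparison of pagePair.comparisonResults) {
        for (const hunk of comparison.hunks) {
          for (const operation of hunk.operations) {
            if (operation.type === "equal") continue; // Unchanged text.
            changes.push({
              originalPageIndex: pagePair.originalPageIndex,
              changedPageIndex: pagePair.changedPageIndex,
              type: operation.type,
              text: operation.text,
              originalRects: toRects(operation.originalTextBlocks),
              changedRects: toRects(operation.changedTextBlocks)
            });
          }
        }
      }
    }
  }
  return changes;
}

// Sample data copied from the example result above.
const sampleResult = [{
  documentComparisonResults: [{
    changedPageIndex: 1,
    comparisonResults: [{
      hunks: [{
        changedRange: { length: 1, position: 1 },
        operations: [{
          changedTextBlocks: {
            range: { length: 1, position: 0 },
            rects: [[341.1, 265.2, 0, 0]]
          },
          originalTextBlocks: {
            range: { length: 1, position: 1 },
            rects: [[341.1, 265.2, 74.4, 288.0]]
          },
          text: "1",
          type: "delete"
        }],
        originalRange: { length: 1, position: 1 }
      }],
      type: "text"
    }],
    originalPageIndex: 0
  }]
}];

// Logs a single "delete" change for the text "1".
console.log(collectChanges(sampleResult));
```

Each entry in the flattened list carries enough information to highlight the affected region on the matching page of the original or changed document.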
These steps allow you to pinpoint changes between documents with ease, and to build your own custom user interface (UI) to display the results, as demonstrated in this sample project. Refer to our public API documentation to read more technical details about the Text Comparison API and learn how to use it in your implementation.