An image is worth a thousand words, but what happens if you have custom data that doesn’t fit into an image model? Can that data be part of an image file? It sure can! And in this blog post, I’ll introduce the common data you find in images and then show how you can embed custom data for your own needs, just like PSPDFKit does with Image Documents!
It’s most likely obvious to many that image files such as PNGs, JPEGs, TIFFs, and WebPs all hold image data. Often, that data is compressed with an algorithm designed to minimize the file size while still representing the image as accurately as possible.
And that’s as much as I want to talk about image data and compression, because we’re interested in the metadata that’s also held in the image file.
Metadata can be data such as the image size, date created, camera setup, location information, and much more. It’s become common practice for cameras to add such information to the image file saved, and this metadata can prove useful and interesting from a product creation point of view.
For example, isn’t it cool (and kind of weird) that Google can place all the photos you’ve taken on a map with precise coordinates and then tell you when each one was taken?!
Most of this metadata is standardized in a common format named Exchangeable image file format (Exif). In this way, any reader with structural knowledge of Exif can read the information. Originally found in TIFF files, the Exif structure can be retrofitted to many other image formats. For more information on the Exif standard and what it can contain, refer to the Wikipedia page on the topic.
That’s all well and good, but what about other data types? The following section will talk about XMP.
As mentioned above, the Exif standard supports a fixed set of types surrounding the use of images in the real world. But that’s not to say it’s the only data you’d want to store in an image.
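To make the "fixed set" idea concrete, here's a minimal sketch of how a reader might map Exif tag IDs to names. The five IDs are well-known entries from the TIFF/Exif tag tables, but the table is deliberately tiny, and `ExifTag`, `kKnownTags`, and `exifTagName` are hypothetical names made up for this illustration:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// A handful of well-known TIFF/Exif tag IDs. The real standard defines
// many more; this subset exists only for illustration.
struct ExifTag {
    std::uint16_t id;
    const char* name;
};

const ExifTag kKnownTags[] = {
    {0x010F, "Make"},
    {0x0110, "Model"},
    {0x0112, "Orientation"},
    {0x0132, "DateTime"},
    {0x9003, "DateTimeOriginal"},
};

// Hypothetical helper: returns the tag name, or nullptr when the ID is
// not in the reader's built-in table.
const char* exifTagName(std::uint16_t id) {
    const std::size_t count = sizeof(kKnownTags) / sizeof(kKnownTags[0]);
    assert(count > 0);
    for (std::size_t i = 0; i < count; ++i) {
        if (kKnownTags[i].id == id) {
            return kKnownTags[i].name;
        }
    }
    return nullptr;
}
```

A reader built this way can only name the tags it was written to know about. Any ID outside its table comes back as `nullptr`, which is exactly the limitation that makes Exif a poor home for arbitrary custom data.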
If the data you want to save is a type that’s outside the Exif standard — for example, business logic data or non-image related data — then you have to look to a different standard, like XMP.
Where XMP differs from Exif is its ability to be self-descriptive. Each property carries its own namespace and name, so the standard can hold any type of information, and a reader can still retrieve a value even if it has no understanding of what that value represents.
XMP is serialized as XML (specifically, RDF/XML), meaning that there are many off-the-shelf XML libraries in a multitude of languages that can interact with it. There are also more specialized libraries that deal only with XMP, but the fundamentals are covered in nearly all programming languages.
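As a rough illustration of what "self-descriptive" means in practice, the sketch below holds a small, hand-written XMP packet in a string and pulls a value out by its qualified name. The packet contents, the `mn` prefix, and the `findSimpleProperty` helper are all assumptions made up for this example, and the naive string search is a stand-in for a real XML parser or the XMP SDK:

```cpp
#include <cassert>
#include <string>

// A hand-written sample packet. The custom "mn" namespace is declared
// right next to the standard "xmp" one, so the data describes itself.
const std::string kSamplePacket = R"(<x:xmpmeta xmlns:x="adobe:ns:meta/">
 <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <rdf:Description rdf:about=""
    xmlns:xmp="http://ns.adobe.com/xap/1.0/"
    xmlns:mn="http://mynamespace.com/test/xmp/1.0/">
   <xmp:CreatorTool>Example Tool 1.0</xmp:CreatorTool>
   <mn:my_custom_data>aGVsbG8=</mn:my_custom_data>
  </rdf:Description>
 </rdf:RDF>
</x:xmpmeta>)";

// Hypothetical helper: returns the text between <name> and </name>, or
// an empty string if the element is missing. It only handles simple
// element-form properties and is not a substitute for real XML parsing.
std::string findSimpleProperty(const std::string& packet,
                               const std::string& qualifiedName) {
    assert(!qualifiedName.empty());
    const std::string open = "<" + qualifiedName + ">";
    const std::string close = "</" + qualifiedName + ">";
    const std::string::size_type start = packet.find(open);
    if (start == std::string::npos) {
        return "";
    }
    const std::string::size_type valueStart = start + open.size();
    const std::string::size_type end = packet.find(close, valueStart);
    if (end == std::string::npos) {
        return "";
    }
    return packet.substr(valueStart, end - valueStart);
}
```

Calling `findSimpleProperty(kSamplePacket, "mn:my_custom_data")` hands back the stored string without the reader knowing anything about what it means, which is the property that makes XMP suitable for custom data.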
It’s also worth noting that XMP isn’t limited to image files. Any file type could technically implement metadata in XMP, which could be useful when syncing common data between different file types.
Now that we understand that XMP is a good choice for storing custom metadata in an image, we need to look at implementations.
For this example, we’ll look at a C++ implementation, because Adobe hosts an XMP SDK on GitHub, and it’s freely available for use (BSD-3 license). But note that there are other implementations in different languages if needed (when searching for them, you’ll have to filter out libraries for the unrelated XMPP messaging standard).
The following examples are partially complete for the sake of brevity, but full examples can be found in the GitHub repository.
First, we can read the XMP for known values. The following will print the value held in CreatorTool, if it exists:

    // Assumes xmpFile is an SXMPFiles instance that has already been opened.
    SXMPMeta meta;
    xmpFile.GetXMP(&meta);

    std::string simpleValue;
    const auto exists = meta.GetProperty(kXMP_NS_XMP, "CreatorTool", &simpleValue, NULL);
    if (exists) {
        std::cout << "CreatorTool = " << simpleValue << std::endl;
    }
In the example above, the first argument passed to the GetProperty function is kXMP_NS_XMP. This is a constant that points to the XML namespace for the “basic” XMP schema. There are many standard namespaces in XMP that hold information specific to certain use cases. All of these can be found in the XMP_Const.h header file.
With that knowledge, we can see that it’s necessary to add our own, new namespace to avoid naming clashes. The following code will register our new namespace and return the prefix to use to avoid namespace clashes in the XML:
    const auto myNamespace = "http://mynamespace.com/test/xmp/1.0/";
    const auto myPrefix = "mn";
    std::string actualKeyPrefix;
    SXMPMeta::RegisterNamespace(myNamespace, myPrefix, &actualKeyPrefix);

actualKeyPrefix receives the prefix that was actually registered, because the requested prefix, mn, may already be in use by a different namespace.
The next step would be to write something to the metadata with this namespace:
    SXMPMeta meta;
    xmpFile.GetXMP(&meta);
    // customData is the value to store, for example a std::string.
    meta.SetProperty(myNamespace, "my_custom_data", customData);
    xmpFile.PutXMP(meta);

Now the metadata will contain our new data, customData, which can be one of many value types, such as a string, number, Boolean, etc. However, in our case, we have binary data we want to save. A simple way to store binary data is to Base64 encode it and save it as a string.
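The XMP calls above don't do any Base64 work for us, so here's a minimal, standard-library-only sketch of that step. The function names `base64Encode` and `base64Decode` are made up for this example, and the decoder is deliberately lenient (it skips characters outside the alphabet rather than reporting an error), so treat it as a starting point rather than a hardened implementation:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// The standard Base64 alphabet (RFC 4648).
const std::string kBase64Alphabet =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// Encodes raw bytes as a padded Base64 string.
std::string base64Encode(const std::vector<std::uint8_t>& data) {
    std::string out;
    out.reserve(4 * ((data.size() + 2) / 3));
    for (std::size_t i = 0; i < data.size(); i += 3) {
        const std::size_t remaining = data.size() - i;
        // Pack up to three bytes into a 24-bit group.
        std::uint32_t group = static_cast<std::uint32_t>(data[i]) << 16;
        if (remaining > 1) group |= static_cast<std::uint32_t>(data[i + 1]) << 8;
        if (remaining > 2) group |= static_cast<std::uint32_t>(data[i + 2]);
        out += kBase64Alphabet[(group >> 18) & 63];
        out += kBase64Alphabet[(group >> 12) & 63];
        out += remaining > 1 ? kBase64Alphabet[(group >> 6) & 63] : '=';
        out += remaining > 2 ? kBase64Alphabet[group & 63] : '=';
    }
    assert(out.size() == 4 * ((data.size() + 2) / 3));
    return out;
}

// Decodes a Base64 string back into bytes. Stops at the first '=' and
// silently skips any character that is not in the alphabet.
std::vector<std::uint8_t> base64Decode(const std::string& text) {
    std::vector<std::uint8_t> out;
    std::uint32_t buffer = 0;
    int bits = 0;
    for (std::size_t i = 0; i < text.size(); ++i) {
        const char c = text[i];
        if (c == '=') break;
        const std::string::size_type value = kBase64Alphabet.find(c);
        if (value == std::string::npos) continue;
        buffer = (buffer << 6) | static_cast<std::uint32_t>(value);
        bits += 6;
        if (bits >= 8) {
            bits -= 8;
            out.push_back(static_cast<std::uint8_t>((buffer >> bits) & 0xFF));
        }
    }
    return out;
}
```

With something like this in place, the string returned by `base64Encode` is what would be passed as customData when writing, and `base64Decode` would be applied to the string that comes back when reading.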
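That’s it; we’ve stored custom data. To read the data back out, we can use the same GetProperty call from the first example, this time passing our own namespace and the my_custom_data property name, and then Base64 decode the returned string.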
Now that we’ve seen how custom data can be saved, let’s look at where XMP and custom data can be useful.
A few years back, we released a component called Image Documents. This allows users to annotate images, just like they would PDFs, directly in their image files. More importantly, the annotations are applied in a non-destructive way, meaning that annotations live in the metadata of an image, and they can be opened, edited, saved, and then edited again at a later date if required.
Image Documents was created to work with PSPDFKit. However, because of the power of XMP, standard image viewers can still open image document files unhindered. It’s obviously not possible for those image viewers to edit the annotations unless they follow our specification, but they can still display the underlying image and annotations, just without the editing ability.
You can read more and view working examples of Image Documents on the dedicated page.
I hope this blog post has inspired you to dig further into images and custom metadata within images. The image file possibilities are endless, and exploring custom data in image files may even open up a path to a new feature, just like it did with us here at PSPDFKit.