Create and Embed Sound Annotations in PDFs on Android

PSPDFKit supports reading and creating sound annotations with embedded audio data. It also provides full UI support for playing back existing sound annotations and recording new ones.

Creating Sound Annotations

SoundAnnotations can be created programmatically in the same way as other supported annotations. The audio data set on an annotation must be wrapped in an EmbeddedAudioSource, which expects raw PCM audio data supplied by either a DataProvider or a byte array. You also need to describe the supplied audio data: its sample rate, sample size, and number of channels. Note that the audio data must be in the format defined by the PDF specification, i.e. big-endian with interleaved channels.
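If your PCM data comes from another source, it may not be in this byte order yet. For example, 16-bit PCM on Android is typically stored in native byte order, which is little-endian on most devices. The following is a minimal sketch in plain Java of how such data could be brought into the big-endian, interleaved layout. `PcmEndianness` and its methods are hypothetical helpers written for this illustration and are not part of the PSPDFKit API. The sketch assumes 16-bit samples.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Hypothetical helper for illustration only; not part of PSPDFKit. */
final class PcmEndianness {
    private PcmEndianness() {}

    /** Returns a copy of 16-bit PCM data with the two bytes of every sample swapped. */
    static byte[] swap16BitSampleOrder(byte[] pcm) {
        if (pcm.length % 2 != 0) {
            throw new IllegalArgumentException("16-bit PCM data must have an even number of bytes");
        }
        byte[] swapped = new byte[pcm.length];
        for (int i = 0; i < pcm.length; i += 2) {
            swapped[i] = pcm[i + 1];
            swapped[i + 1] = pcm[i];
        }
        return swapped;
    }

    /** Interleaves two mono channels (L0 R0 L1 R1 ...) and writes each sample big-endian. */
    static byte[] interleaveStereoBigEndian(short[] left, short[] right) {
        if (left.length != right.length) {
            throw new IllegalArgumentException("Both channels must contain the same number of samples");
        }
        // Two channels with two bytes per sample each.
        ByteBuffer buffer = ByteBuffer.allocate(left.length * 4).order(ByteOrder.BIG_ENDIAN);
        for (int i = 0; i < left.length; i++) {
            buffer.putShort(left[i]);
            buffer.putShort(right[i]);
        }
        return buffer.array();
    }
}
```

The resulting byte array is the kind of raw data described above. Audio extracted with AudioExtractor does not need this step, because the conversion is handled automatically.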

The easiest way to create instances of EmbeddedAudioSource is by using AudioExtractor, which extracts audio tracks from media files and automatically converts the extracted audio data into the format that can be used by our model classes.

The following example, shown first in Kotlin and then in Java, creates a sound annotation with audio data extracted from a media file in the application’s assets:

// The audio decoder supports decoding audio tracks from all media formats that are supported by `MediaExtractor`.
// Even video file formats are supported.
val audioExtractor = AudioExtractor(this, Uri.parse("file:///android_asset/media-file.mp3"))

// If there is more than one audio track, use this call to select which track should be extracted.
// The first audio track is selected by default.
// audioExtractor.selectAudioTrack(0)

// Extract the audio track asynchronously.
audioExtractor.extractAudioTrackAsync().subscribe { embeddedAudioSource ->
    // Create a new sound annotation from the extracted audio track.
    val soundAnnotation = SoundAnnotation(
        // Page index 0.
        0,
        // Page rectangle (in PDF coordinates).
        RectF(...),
        // Embedded audio source returned by the audio extractor.
        embeddedAudioSource)

    // Add the created annotation to the page.
    document.annotationProvider.addAnnotationToPage(soundAnnotation)
}

The same example in Java:

// The audio decoder supports decoding audio tracks from all media formats that are supported by `MediaExtractor`.
// Even video file formats are supported.
AudioExtractor audioExtractor = new AudioExtractor(this, Uri.parse("file:///android_asset/media-file.mp3"));

// If there is more than one audio track, use this call to select which track should be extracted.
// The first audio track is selected by default.
// audioExtractor.selectAudioTrack(0);

// Extract the audio track asynchronously.
audioExtractor.extractAudioTrackAsync().subscribe(embeddedAudioSource -> {
    // Create a new sound annotation from the extracted audio track.
    SoundAnnotation soundAnnotation = new SoundAnnotation(
        // Page index 0.
        0,
        // Page rectangle (in PDF coordinates).
        new RectF(...),
        // Embedded audio source returned by the audio extractor.
        embeddedAudioSource);

    // Add the created annotation to the page.
    document.getAnnotationProvider().addAnnotationToPage(soundAnnotation);
});

For a full example, take a look at AnnotationCreationExample inside the Catalog app.

Supported Audio Formats

The PDF specification defines multiple audio encoding formats that can be used for embedded audio data. PSPDFKit for Android supports playback of the most commonly used encoding, AudioEncoding#SIGNED. The supported sample rates and sample sizes depend on the target device. Android guarantees that a 16-bit sample size is supported on all devices (see AudioFormat#ENCODING_PCM_16BIT). This sample size is also used for sound annotation recording and for audio extraction with AudioExtractor.
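To make the relationship between sample rate, sample size, and channel count concrete, here is a small sketch in plain Java that computes the size of one frame and the playback duration of a block of raw PCM data. `PcmMath` is a hypothetical helper written for this illustration, not a PSPDFKit class, and it assumes that the sample size is a whole number of bytes.

```java
/** Hypothetical helper for illustration only; not part of PSPDFKit. */
final class PcmMath {
    private PcmMath() {}

    /** Size in bytes of one frame, i.e. one sample for every channel. */
    static int bytesPerFrame(int sampleSizeBits, int channels) {
        if (sampleSizeBits <= 0 || sampleSizeBits % 8 != 0 || channels <= 0) {
            throw new IllegalArgumentException("Expected a whole-byte sample size and at least one channel");
        }
        return (sampleSizeBits / 8) * channels;
    }

    /** Playback duration in milliseconds of raw PCM data with the given parameters. */
    static long durationMillis(long dataLengthBytes, int sampleRate, int sampleSizeBits, int channels) {
        if (sampleRate <= 0) {
            throw new IllegalArgumentException("Sample rate must be positive");
        }
        long frames = dataLengthBytes / bytesPerFrame(sampleSizeBits, channels);
        return frames * 1000L / sampleRate;
    }
}
```

With these definitions, one second of 16-bit stereo audio at 44,100 Hz occupies 44,100 × 4 = 176,400 bytes, which is useful for estimating how much a recording adds to the size of a document.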

AudioExtractor uses the MediaExtractor class from the Android SDK, so the supported media formats depend on the target device. Most variations of MP3 audio and MP4 video formats should be supported by most devices. However, please test your code on an actual device, as support in emulator environments can differ.
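When a media file contains several tracks, it can help to check which of them are audio tracks before extracting. On Android, MediaExtractor reports a MIME type for every track (via getTrackFormat(index) and MediaFormat.KEY_MIME), and audio tracks use MIME types that start with "audio/". The sketch below shows only the filtering step and works on plain strings so that it stays independent of the Android SDK. `AudioTrackFilter` is a hypothetical helper, and how the resulting indices relate to the index expected by AudioExtractor's selectAudioTrack is not covered here.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper for illustration only; not part of PSPDFKit or the Android SDK. */
final class AudioTrackFilter {
    private AudioTrackFilter() {}

    /** Returns the positions of all entries whose MIME type denotes an audio track. */
    static List<Integer> audioTrackIndices(List<String> trackMimeTypes) {
        List<Integer> indices = new ArrayList<>();
        for (int i = 0; i < trackMimeTypes.size(); i++) {
            String mime = trackMimeTypes.get(i);
            // Tracks without a MIME type are skipped rather than treated as audio.
            if (mime != null && mime.startsWith("audio/")) {
                indices.add(i);
            }
        }
        return indices;
    }
}
```

An empty result means the file has no audio track to extract, which is worth checking for on the devices you target.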