Interface DocumentTextProvider

    public interface DocumentTextProvider
    
                        

    Interface for providing text data of a PDF document.

    • Method Detail

      • getPageText

         abstract String getPageText(@IntRange(from = 0) Integer pageIndex)

        Returns text content of the document page.

        Parameters:
        pageIndex - 0-indexed page number.
        Returns:

        Text on the page. Text lines end with CRLF (\r\n).
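
        Because the returned text uses CRLF line endings, callers that want individual lines can split on that sequence. The following is a minimal sketch that operates on a string such as the one getPageText returns; the class name PageTextLines and the method splitLines are hypothetical helpers and not part of this interface.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical helper, not part of DocumentTextProvider.
final class PageTextLines {

    // Splits page text into lines on the CRLF sequence described above.
    // String.split drops trailing empty strings, so a final "\r\n" does not
    // produce an extra empty line. Empty lines in the middle are kept.
    static List<String> splitLines(String pageText) {
        if (pageText == null || pageText.isEmpty()) {
            return Collections.emptyList();
        }
        return Arrays.asList(pageText.split("\r\n"));
    }
}
```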

      • getPageText

         abstract String getPageText(@IntRange(from = 0) Integer pageIndex, Integer start, Integer length)

        Returns the text content of a character range on the page, given as a start index and a length. Use getPageTextLength to determine the number of characters on the page.

        Parameters:
        pageIndex - 0-indexed page number.
        start - Index of first character in the range.
        length - Length of the range.
        Returns:

        Text on the page within the given range. Text lines end with CRLF (\r\n).
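
        This page does not state what happens when start or length fall outside the page text, so a caller may want to clamp the range against getPageTextLength first. The sketch below illustrates that idea under stated assumptions: TextSource is a hypothetical stand-in that mirrors only the two methods involved (with primitive int parameters), InMemoryTextSource is a fake used to make the example runnable, and clampedPageText is an illustrative helper. None of these names belong to the real API.

```java
// Hypothetical stand-in mirroring two methods of DocumentTextProvider.
interface TextSource {
    String getPageText(int pageIndex, int start, int length);
    int getPageTextLength(int pageIndex);
}

// Fake backed by plain strings, one per page, for demonstration only.
final class InMemoryTextSource implements TextSource {
    private final String[] pages;

    InMemoryTextSource(String... pages) {
        this.pages = pages;
    }

    @Override
    public String getPageText(int pageIndex, int start, int length) {
        return pages[pageIndex].substring(start, start + length);
    }

    @Override
    public int getPageTextLength(int pageIndex) {
        return pages[pageIndex].length();
    }
}

final class TextRanges {

    // Clamps the requested range to [0, page text length] before querying,
    // and returns an empty string when nothing remains in range.
    static String clampedPageText(TextSource source, int pageIndex, int start, int length) {
        int total = source.getPageTextLength(pageIndex);
        int from = Math.max(0, Math.min(start, total));
        int count = Math.max(0, Math.min(length, total - from));
        return count == 0 ? "" : source.getPageText(pageIndex, from, count);
    }
}
```

        With a page containing "Hello\r\nWorld", requesting 100 characters from index 7 would yield only the 5 characters that exist from that index onward.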

      • getPageText

         abstract String getPageText(@IntRange(from = 0) Integer pageIndex, RectF rect)

        Returns the text content inside the given page rectangle.

        Parameters:
        pageIndex - 0-indexed page number.
        rect - Page rectangle in PDF coordinates.
        Returns:

        Text on the page inside the given rectangle. Text lines end with CRLF (\r\n).

      • getPageTextLength

         abstract Integer getPageTextLength(@IntRange(from = 0) Integer pageIndex)

        Returns the number of characters in the text of the page.

        Parameters:
        pageIndex - 0-indexed page number.
        Returns:

        Number of characters in the page text.

      • getPageTextRects

         abstract List<Rect> getPageTextRects(@IntRange(from = 0) Integer pageIndex, Integer startIndex, Integer length, Boolean markupPadding)

        Returns the rects covering a range of characters on a page.

        Parameters:
        pageIndex - Page number of the page, zero indexed.
        startIndex - Index of the starting character.
        length - Number of characters in sequence.
        markupPadding - Whether to take the font height into account so that the rects are better suited for being displayed around the text.
        Returns:

        List of rects of the characters on the page in PDF point units. May be an empty list if the characters are not represented visually.
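
        Since the returned list may be empty, code that combines the rects (for example into a single bounding box for a highlight) should handle that case explicitly. The sketch below shows one way to do so. Box is a hypothetical stand-in for the rect type, and the union assumes the usual PDF convention of a bottom-left origin, where top is numerically larger than bottom; both are assumptions and not stated on this page.

```java
import java.util.List;

// Hypothetical stand-in for the rect type returned by getPageTextRects.
final class Box {
    final float left, top, right, bottom;

    Box(float left, float top, float right, float bottom) {
        this.left = left;
        this.top = top;
        this.right = right;
        this.bottom = bottom;
    }
}

final class TextRectUtils {

    // Returns the smallest box enclosing all rects, or null for an empty
    // list (for example when the characters have no visual representation).
    // Assumes PDF coordinates: y grows upward, so top >= bottom.
    static Box boundingBox(List<Box> rects) {
        if (rects == null || rects.isEmpty()) {
            return null;
        }
        float left = Float.MAX_VALUE;
        float bottom = Float.MAX_VALUE;
        float right = -Float.MAX_VALUE;
        float top = -Float.MAX_VALUE;
        for (Box r : rects) {
            left = Math.min(left, r.left);
            right = Math.max(right, r.right);
            top = Math.max(top, r.top);
            bottom = Math.min(bottom, r.bottom);
        }
        return new Box(left, top, right, bottom);
    }
}
```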

      • getCharIndexAt

         abstract Integer getCharIndexAt(@IntRange(from = 0) Integer pageIndex, Float x, Float y)

        Returns the index of the closest character whose rect intersects the given x and y coordinates.

        Parameters:
        pageIndex - Page number of the page, zero indexed.
        x - X coordinate in PDF point units.
        y - Y coordinate in PDF point units.
        Returns:

        The index of the selected character (zero indexed) or -1 if no character was found at the given coordinates.
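
        The -1 result needs to be checked before the index is passed on to other methods. The following sketch combines a hit test with a one-character range lookup. HitTestSource is a hypothetical stand-in mirroring only the two methods used, FixedGridSource is a fake that lays characters out in 10-point cells on a single line so the example can run, and characterAt is an illustrative helper. None of them are part of the real API.

```java
// Hypothetical stand-in mirroring two methods of DocumentTextProvider.
interface HitTestSource {
    int getCharIndexAt(int pageIndex, float x, float y);
    String getPageText(int pageIndex, int start, int length);
}

// Fake layout for demonstration: character i occupies x in [10*i, 10*i + 10)
// and y in [0, 10). Anything outside that strip hits no character.
final class FixedGridSource implements HitTestSource {
    private final String text;

    FixedGridSource(String text) {
        this.text = text;
    }

    @Override
    public int getCharIndexAt(int pageIndex, float x, float y) {
        if (x < 0f || y < 0f || y >= 10f) {
            return -1;
        }
        int index = (int) (x / 10f);
        return index < text.length() ? index : -1;
    }

    @Override
    public String getPageText(int pageIndex, int start, int length) {
        return text.substring(start, start + length);
    }
}

final class HitTests {

    // Returns the character under the point as a one-character string,
    // or null when the hit test reports -1 (no character found).
    static String characterAt(HitTestSource source, int pageIndex, float x, float y) {
        int index = source.getCharIndexAt(pageIndex, x, y);
        if (index < 0) {
            return null;
        }
        return source.getPageText(pageIndex, index, 1);
    }
}
```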

      • getWordAtPoint

         abstract DocumentTextProvider.TextRange getWordAtPoint(Integer pageIndex, Float x, Float y, Float tolerance, Boolean markupPadding)

        Looks for a word at the given point.

        Parameters:
        pageIndex - Page number of the page, zero indexed.
        x - X coordinate in PDF point units.
        y - Y coordinate in PDF point units.
        markupPadding - Whether to take the font height into account so that the rects are better suited for being displayed around the text.
        Returns:

        The TextRange of the word if one was found, or null otherwise.