Interface DocumentTextProvider
public interface DocumentTextProvider
Interface for providing text data of a PDF document.
Nested Class Summary
public final class DocumentTextProvider.TextRange
Method Summary
abstract String getPageText(@IntRange(from = 0) Integer pageIndex)
Returns the text content of a document page.

abstract String getPageText(@IntRange(from = 0) Integer pageIndex, Integer start, Integer length)
Returns the text content between two character indexes.

abstract String getPageText(@IntRange(from = 0) Integer pageIndex, RectF rect)
Returns the text content inside the given page rectangle.

String getPageText(DocumentTextProvider.TextRange textRange)

abstract Integer getPageTextLength(@IntRange(from = 0) Integer pageIndex)
Gets the number of characters in the text on the page.

abstract List<Rect> getPageTextRects(@IntRange(from = 0) Integer pageIndex, Integer startIndex, Integer length, Boolean markupPadding)
Returns the rects of a range of characters on a page.

abstract Integer getCharIndexAt(@IntRange(from = 0) Integer pageIndex, Float x, Float y)
Returns the index of the closest character whose rect intersects the given x and y coordinates.

abstract DocumentTextProvider.TextRange getWordAtPoint(Integer pageIndex, Float x, Float y, Float tolerance, Boolean markupPadding)
Looks for a word at the given point.
Method Detail
getPageText
abstract String getPageText(@IntRange(from = 0) Integer pageIndex)
Returns the text content of the document page.
Parameters:
pageIndex - 0-indexed page number.
Returns:
Text on the page. Text lines end with CRLF (\r\n).
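Because page text uses CRLF as its line terminator, code that processes it line by line has to split on that exact sequence. The following is a minimal sketch, not part of the API: `PageTextLines` and `splitLines` are hypothetical names, and the helper takes the `String` returned by `getPageText(pageIndex)` as input so it runs on a plain JVM without the SDK.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical helper, not part of DocumentTextProvider.
final class PageTextLines {
    private PageTextLines() {}

    // Splits text returned by getPageText(pageIndex) into individual lines.
    // The documentation states that lines end with CRLF, so the regex \r\n
    // (written "\\r\\n" in Java source) is used as the separator.
    // String.split drops trailing empty strings, so a final CRLF does not
    // produce an extra empty line; empty lines in the middle are kept.
    static List<String> splitLines(String pageText) {
        if (pageText.isEmpty()) {
            // "".split(...) would return one empty string; treat it as "no lines".
            return Collections.emptyList();
        }
        return Arrays.asList(pageText.split("\\r\\n"));
    }
}
```

Splitting on `\r\n` rather than on any line break keeps the helper aligned with the documented terminator; if other line endings could appear, a pattern such as `\\R` would be the more lenient choice.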
getPageText
abstract String getPageText(@IntRange(from = 0) Integer pageIndex, Integer start, Integer length)
Returns the text content between two character indexes. Use getPageTextLength to determine the number of characters on the page.
Parameters:
pageIndex - 0-indexed page number.
start - Index of the first character in the range.
length - Length of the range.
Returns:
Text on the page in the given range. Text lines end with CRLF (\r\n).
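Combining `getPageTextLength` with the range overload makes it possible to read a long page in bounded pieces. The sketch below assumes a hypothetical `RangeTextSource` interface that mirrors only those two documented signatures (the `@IntRange` annotation is omitted so it compiles without Android); `ChunkedPageReader` is likewise an illustrative name, not part of the API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the two DocumentTextProvider methods used here.
interface RangeTextSource {
    Integer getPageTextLength(Integer pageIndex);
    String getPageText(Integer pageIndex, Integer start, Integer length);
}

final class ChunkedPageReader {
    private ChunkedPageReader() {}

    // Reads a page's text in chunks of at most chunkSize characters.
    static List<String> readInChunks(RangeTextSource source, int pageIndex, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        int total = source.getPageTextLength(pageIndex);
        List<String> chunks = new ArrayList<>();
        for (int start = 0; start < total; start += chunkSize) {
            // Clamp the last chunk so start + length never exceeds the page length.
            int length = Math.min(chunkSize, total - start);
            chunks.add(source.getPageText(pageIndex, start, length));
        }
        return chunks;
    }
}
```

Clamping the final `length` avoids requesting characters past the end of the page, which is the main pitfall when using the start/length overload.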
getPageText
abstract String getPageText(@IntRange(from = 0) Integer pageIndex, RectF rect)
Returns the text content inside the given page rectangle.
Parameters:
pageIndex - 0-indexed page number.
rect - Page rectangle in PDF coordinates.
Returns:
Text on the page inside the given page rectangle. Text lines end with CRLF (\r\n).
getPageText
String getPageText(DocumentTextProvider.TextRange textRange)
getPageTextLength
abstract Integer getPageTextLength(@IntRange(from = 0) Integer pageIndex)
Gets the number of characters in the text on the page.
Parameters:
pageIndex - 0-indexed page number.
Returns:
Number of characters in the page text.
getPageTextRects
abstract List<Rect> getPageTextRects(@IntRange(from = 0) Integer pageIndex, Integer startIndex, Integer length, Boolean markupPadding)
Returns the rects of a range of characters on a page.
Parameters:
pageIndex - Zero-indexed page number.
startIndex - Index of the first character.
length - Number of characters in the sequence.
markupPadding - Whether to take the font height into account in the rects, making them better suited for display around the text.
Returns:
List of rects of the characters on the page, in PDF point units. May be an empty list if the characters are not represented visually.
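A common follow-up is to merge the per-line rects into one bounding box, for example to position a highlight or popup. This is a minimal sketch under stated assumptions: `Box` is a hypothetical stand-in for `android.graphics.Rect` so the code runs on a plain JVM, and `TextBounds.union` is an illustrative helper, not part of the API. Since the y-axis direction of the returned rects is not stated here, the helper compares `top` and `bottom` with min/max instead of assuming an orientation.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical stand-in for android.graphics.Rect.
final class Box {
    final int left, top, right, bottom;

    Box(int left, int top, int right, int bottom) {
        this.left = left;
        this.top = top;
        this.right = right;
        this.bottom = bottom;
    }
}

final class TextBounds {
    private TextBounds() {}

    // Computes one bounding box around all rects returned for a character range.
    // Returns Optional.empty() for an empty list, which the documentation says
    // can happen when the characters are not represented visually.
    static Optional<Box> union(List<Box> rects) {
        if (rects.isEmpty()) {
            return Optional.empty();
        }
        int minX = Integer.MAX_VALUE, minY = Integer.MAX_VALUE;
        int maxX = Integer.MIN_VALUE, maxY = Integer.MIN_VALUE;
        for (Box r : rects) {
            // min/max on both edges makes the result independent of y-axis direction.
            minX = Math.min(minX, Math.min(r.left, r.right));
            maxX = Math.max(maxX, Math.max(r.left, r.right));
            minY = Math.min(minY, Math.min(r.top, r.bottom));
            maxY = Math.max(maxY, Math.max(r.top, r.bottom));
        }
        // Normalized so that left <= right and top <= bottom.
        return Optional.of(new Box(minX, minY, maxX, maxY));
    }
}
```

Returning `Optional` rather than a zero-sized box keeps the documented "not represented visually" case explicit for the caller.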
getCharIndexAt
abstract Integer getCharIndexAt(@IntRange(from = 0) Integer pageIndex, Float x, Float y)
Returns the index of the closest character whose rect intersects the given x and y coordinates.
Parameters:
pageIndex - Zero-indexed page number.
x - X coordinate in PDF point units.
y - Y coordinate in PDF point units.
Returns:
The zero-based index of the selected character, or -1 if no character was found at the given coordinates.
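Because "no character" is signalled with the sentinel -1, callers have to check for it before using the index. The sketch below wraps that check in an `OptionalInt`; `CharIndexSource` is a hypothetical stand-in mirroring only the documented `getCharIndexAt` signature (without `@IntRange`), and `CharLookup` is an illustrative name. Treating `null` as "not found" is a defensive assumption; the documentation does not mention a null result.

```java
import java.util.OptionalInt;

// Hypothetical stand-in for the single DocumentTextProvider method used here.
interface CharIndexSource {
    Integer getCharIndexAt(Integer pageIndex, Float x, Float y);
}

final class CharLookup {
    private CharLookup() {}

    // Converts the documented -1 "no character found" result into OptionalInt.
    // Any negative value, and null (defensively), are treated the same way.
    static OptionalInt charIndexAt(CharIndexSource source, int pageIndex, float x, float y) {
        Integer index = source.getCharIndexAt(pageIndex, x, y);
        if (index == null || index < 0) {
            return OptionalInt.empty();
        }
        return OptionalInt.of(index);
    }
}
```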
getWordAtPoint
abstract DocumentTextProvider.TextRange getWordAtPoint(Integer pageIndex, Float x, Float y, Float tolerance, Boolean markupPadding)
Looks for a word at the given point.
Parameters:
pageIndex - Zero-indexed page number.
x - X coordinate in PDF point units.
y - Y coordinate in PDF point units.
markupPadding - Whether to take the font height into account in the rects, making them better suited for display around the text.
Returns:
The TextRange of the word if one was found, or null otherwise.
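Since `getWordAtPoint` returns `null` when no word is found, and `getPageText(TextRange)` accepts the returned range, the two methods can be chained to get the word's text. This is a sketch under assumptions: `WordTextSource<R>` is a hypothetical stand-in in which the type parameter `R` plays the role of `DocumentTextProvider.TextRange` (whose fields are not described in this reference), and `WordLookup` is an illustrative name. The `tolerance` parameter is passed through unchanged because its meaning is not documented above.

```java
import java.util.Optional;

// Hypothetical stand-in for the two DocumentTextProvider methods used here;
// R stands in for DocumentTextProvider.TextRange.
interface WordTextSource<R> {
    R getWordAtPoint(Integer pageIndex, Float x, Float y, Float tolerance, Boolean markupPadding);
    String getPageText(R textRange);
}

final class WordLookup {
    private WordLookup() {}

    // Returns the text of the word at (x, y), or Optional.empty() when
    // getWordAtPoint returns null (no word found).
    static <R> Optional<String> wordTextAt(WordTextSource<R> source, int pageIndex,
                                           float x, float y, float tolerance,
                                           boolean markupPadding) {
        R range = source.getWordAtPoint(pageIndex, x, y, tolerance, markupPadding);
        if (range == null) {
            return Optional.empty();
        }
        return Optional.of(source.getPageText(range));
    }
}
```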