PSPDFTextParser


@interface PSPDFTextParser : NSObject

Parses text and glyph data of a PDF page.

Note

Do not instantiate this class directly. Instead, use -[PSPDFDocument textParserForPageAtIndex:]. Properties are evaluated lazily and cached.
  • Unavailable

    Undocumented

    Declaration

    Objective-C

    PSPDF_EMPTY_INIT_UNAVAILABLE
  • The complete page text, in reading order, including extrapolated spaces and newline characters.

    Declaration

    Objective-C

    @property (readonly, copy, nonatomic) NSString *_Nonnull text;

    Swift

    var text: String { get }
  • Complete list of all glyphs defined in the PDF page. Control characters are excluded. These glyphs are guaranteed to be stored in order of their indexOnPage.

    Declaration

    Objective-C

    @property (readonly, copy, nonatomic) NSArray<PSPDFGlyph *> *_Nonnull glyphs;

    Swift

    var glyphs: [PSPDFGlyph] { get }
  • A list of words on the PDF page. We apply heuristics to the glyphs to detect word boundaries in the text.

    Declaration

    Objective-C

    @property (readonly, copy, nonatomic) NSArray<PSPDFWord *> *_Nonnull words;

    Swift

    var words: [PSPDFWord] { get }
  • A list of text blocks on the PDF page. A text block is typically one line in the PDF. In a multi-column layout, a text block will be one line of a single column.

    Declaration

    Objective-C

    @property (readonly, copy, nonatomic)
        NSArray<PSPDFTextBlock *> *_Nonnull textBlocks;

    Swift

    var textBlocks: [PSPDFTextBlock] { get }
  • A list of PSPDFImageInfo objects representing all the images on the PDF page.

    Declaration

    Objective-C

    @property (readonly, copy, nonatomic) NSArray<PSPDFImageInfo *> *_Nonnull images;

    Swift

    var images: [PSPDFImageInfo] { get }
  • The receiver’s associated document provider.

    Declaration

    Objective-C

    @property (readonly, nonatomic)
        PSPDFDocumentProvider *_Nullable documentProvider;

    Swift

    weak var documentProvider: PSPDFDocumentProvider? { get }
  • The index of the page that the receiver represents, relative to the documentProvider it was retrieved from.

    Declaration

    Objective-C

    @property (readonly, nonatomic) PSPDFPageIndex pageIndex;

    Swift

    var pageIndex: PageIndex { get }
  • Returns the page text for the glyphs passed in.

    Note

    This method finds the glyphs with the lowest and highest PSPDFGlyph.indexOnPage and returns all the text between those two indexes. To get the text for discrete ranges of glyphs, use PSPDFStringFromGlyphs instead.

    Declaration

    Objective-C

    - (nonnull NSString *)textWithGlyphs:(nonnull NSArray<PSPDFGlyph *> *)glyphs;

    Swift

    func text(with glyphs: [PSPDFGlyph]) -> String

    Parameters

    glyphs

    The glyphs from which the text is to be generated.

    Return Value

    An NSString containing the text for the passed-in glyphs.
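
The span behavior described in the note above can be sketched as follows. This is a minimal, self-contained model and not the framework's implementation: `SpanGlyph` and `spanText(of:in:)` are hypothetical stand-ins for PSPDFGlyph and `text(with:)`, and the "discrete" result simply joins the selected glyphs to show how it differs from the contiguous span.

```swift
// Hypothetical stand-in for PSPDFGlyph with only the fields this sketch needs.
struct SpanGlyph {
    let indexOnPage: Int
    let content: String
}

// Models the documented behavior: find the lowest and highest indexOnPage among
// the passed-in glyphs and return all the text between them, inclusive.
func spanText(of selection: [SpanGlyph], in pageGlyphs: [SpanGlyph]) -> String {
    let indexes = selection.map { $0.indexOnPage }
    guard let low = indexes.min(), let high = indexes.max() else { return "" }
    return pageGlyphs
        .filter { $0.indexOnPage >= low && $0.indexOnPage <= high }
        .map { $0.content }
        .joined()
}

// A made-up page whose glyphs spell "ABCDE", stored in indexOnPage order.
let pageGlyphs = Array("ABCDE").enumerated().map {
    SpanGlyph(indexOnPage: $0.offset, content: String($0.element))
}

// Select only the first and the last glyph on the page.
let selection = [pageGlyphs[0], pageGlyphs[4]]

// Contiguous span between the lowest and highest index.
let spanned = spanText(of: selection, in: pageGlyphs)

// Only the selected glyphs themselves, for comparison.
let discrete = selection.map { $0.content }.joined()
```

Passing two glyphs that are far apart therefore yields everything between them, which is why the note recommends a different function when only the selected glyphs are wanted.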

  • Returns the glyphs in the given range.

    Note

    If the range is invalid, an empty array is returned.

    Declaration

    Objective-C

    - (nonnull NSArray<PSPDFGlyph *> *)glyphsInRange:(NSRange)range;

    Swift

    func glyphs(in range: NSRange) -> [PSPDFGlyph]

    Parameters

    range

    The range for which glyphs are to be fetched. This range must be based on PSPDFGlyph.indexOnPage.

    Return Value

    The glyphs contained in the requested range.
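
A range lookup keyed on indexOnPage might behave roughly like the sketch below. It is an illustration under stated assumptions, not the framework's code: `RangeGlyph` and `sketchGlyphs(in:from:)` are hypothetical stand-ins, glyph indexes are assumed to run contiguously from 0, and "invalid" is assumed to mean an NSNotFound location or a range that extends past the last glyph (the source does not spell out what counts as invalid).

```swift
import Foundation

// Hypothetical stand-in for PSPDFGlyph with only the fields this sketch needs.
struct RangeGlyph {
    let indexOnPage: Int
    let content: String
}

// Returns the glyphs whose indexOnPage falls inside `range`, or an empty array
// when the range is considered invalid (assumption: not found or out of bounds).
func sketchGlyphs(in range: NSRange, from pageGlyphs: [RangeGlyph]) -> [RangeGlyph] {
    guard range.location != NSNotFound,
          range.location >= 0, range.length >= 0,
          range.location <= pageGlyphs.count,
          range.length <= pageGlyphs.count - range.location else { return [] }
    let upper = range.location + range.length
    return pageGlyphs.filter { $0.indexOnPage >= range.location && $0.indexOnPage < upper }
}

// A made-up page whose glyphs spell "ABCDE", stored in indexOnPage order.
let page = Array("ABCDE").enumerated().map {
    RangeGlyph(indexOnPage: $0.offset, content: String($0.element))
}

// Indexes 1, 2 and 3.
let middle = sketchGlyphs(in: NSRange(location: 1, length: 3), from: page)

// Runs past the last glyph, so this sketch treats it as invalid.
let invalid = sketchGlyphs(in: NSRange(location: 4, length: 10), from: page)
```

Because the range is expressed in indexOnPage values rather than character offsets into `text`, a range taken from the page string should not be passed through unchanged.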