Package com.itextpdf.text.pdf.parser
Class SimpleTextExtractionStrategy
java.lang.Object
  com.itextpdf.text.pdf.parser.SimpleTextExtractionStrategy

All Implemented Interfaces:
RenderListener, TextExtractionStrategy
public class SimpleTextExtractionStrategy extends java.lang.Object implements TextExtractionStrategy
A simple text extraction renderer. This renderer tracks the Y position of each string it receives. When the Y position changes, it inserts a line break into the output. If the PDF renders text in an order other than top to bottom, the extracted text will not match how it appears on the page. This renderer also uses a simple strategy based on font metrics to decide whether a blank space should be inserted into the output.
Since:
2.1.5
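The line-break behaviour described above can be modelled without the library. The sketch below is not the iText source: `LineBreakSketch`, `Chunk`, and the 1-unit tolerance are hypothetical names and values chosen for illustration. It appends strings in the order they arrive and emits a line break whenever the Y position differs from that of the previous string.

```java
import java.util.Arrays;
import java.util.List;

// Library-free model of the Y-position line-break heuristic (illustrative only).
final class LineBreakSketch {

    // Hypothetical stand-in for one rendered string and the Y position of its baseline.
    static final class Chunk {
        final String text;
        final float y;

        Chunk(String text, float y) {
            this.text = text;
            this.y = y;
        }
    }

    // Assumed tolerance: smaller Y differences are treated as the same line.
    static final float SAME_LINE_TOLERANCE = 1f;

    // Concatenates chunks in arrival order, inserting '\n' when Y changes.
    static String extract(List<Chunk> chunksInArrivalOrder) {
        StringBuilder result = new StringBuilder();
        Chunk previous = null;
        for (Chunk chunk : chunksInArrivalOrder) {
            if (previous != null
                    && Math.abs(chunk.y - previous.y) > SAME_LINE_TOLERANCE) {
                result.append('\n');
            }
            result.append(chunk.text);
            previous = chunk;
        }
        return result.toString();
    }

    public static void main(String[] args) {
        // Text drawn top to bottom: output matches reading order.
        System.out.println(extract(Arrays.asList(
                new Chunk("Hello ", 700f),
                new Chunk("world", 700f),
                new Chunk("Second line", 686f))));

        // Footer drawn before the title: output follows drawing order.
        System.out.println(extract(Arrays.asList(
                new Chunk("Footer", 50f),
                new Chunk("Title", 700f))));
    }
}
```

The second call in `main` passes a footer (low Y) before a title (high Y) and gets the footer back first, which illustrates why text drawn out of top-to-bottom order is not extracted in reading order.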
Constructor Summary
Constructor: SimpleTextExtractionStrategy()
Description: Creates a new text extraction renderer.
Method Summary
protected void appendTextChunk(java.lang.CharSequence text)
Appends text to the text results.

void beginTextBlock()
Called when a new text block is beginning (i.e. BT).

void endTextBlock()
Called when a text block has ended (i.e. ET).

java.lang.String getResultantText()
Returns the result so far.

void renderImage(ImageRenderInfo renderInfo)
No-op method: this renderer is not interested in image events.

void renderText(TextRenderInfo renderInfo)
Captures text using a simplified algorithm for inserting hard returns and spaces.
Method Detail
beginTextBlock
public void beginTextBlock()
Description copied from interface: RenderListener
Called when a new text block is beginning (i.e. BT).
Specified by:
beginTextBlock in interface RenderListener
Since:
5.0.1
endTextBlock
public void endTextBlock()
Description copied from interface: RenderListener
Called when a text block has ended (i.e. ET).
Specified by:
endTextBlock in interface RenderListener
Since:
5.0.1
getResultantText
public java.lang.String getResultantText()
Returns the result so far.
Specified by:
getResultantText in interface TextExtractionStrategy
Returns:
a String with the resulting text.
appendTextChunk
protected final void appendTextChunk(java.lang.CharSequence text)
Appends text to the text results. Subclasses can use this to insert text that would not normally be included in text parsing (e.g. the result of OCR performed against image content).
Parameters:
text - the text to append to the text results accumulated so far
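The subclassing pattern mentioned for appendTextChunk can be sketched as follows. This is a library-free model, not iText code: `StrategyModel` is a hypothetical stand-in that only mirrors the documented method shapes, the `byte[]` and `String` parameters replace ImageRenderInfo and TextRenderInfo, and `recognize` is a placeholder where a real OCR engine would be called.

```java
// Hypothetical stand-in mirroring the documented method shapes; not the iText class.
class StrategyModel {
    private final StringBuilder result = new StringBuilder();

    // Final, as in the documented signature: subclasses call it but cannot override it.
    protected final void appendTextChunk(CharSequence text) {
        result.append(text);
    }

    // Simplified: the model appends the text as-is, with no line or space logic.
    public void renderText(String text) {
        appendTextChunk(text);
    }

    // No-op by default, matching the documented behaviour for image events.
    public void renderImage(byte[] imageBytes) {
    }

    public String getResultantText() {
        return result.toString();
    }
}

// Subclass that contributes text for image events through appendTextChunk.
class OcrAwareStrategyModel extends StrategyModel {
    @Override
    public void renderImage(byte[] imageBytes) {
        appendTextChunk(recognize(imageBytes));
    }

    // Placeholder for an OCR call; returns a fixed marker instead of recognized text.
    private String recognize(byte[] imageBytes) {
        return "[image: " + imageBytes.length + " bytes]";
    }
}
```

Because appendTextChunk is final, the extension point in this pattern is the event method (here renderImage), and the subclass simply routes whatever extra text it produces into the shared result.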
renderText
public void renderText(TextRenderInfo renderInfo)
Captures text using a simplified algorithm for inserting hard returns and spaces.
Specified by:
renderText in interface RenderListener
Parameters:
renderInfo - render info
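The space-insertion side of renderText can be illustrated with a small model. The rule used here is an assumption for illustration, not a statement of the library's exact logic: a space is inserted when the horizontal gap between two runs exceeds half the width of a single space, and neither side of the gap already supplies a space. `SpaceGapSketch`, `Run`, and `joinOnOneLine` are hypothetical names.

```java
import java.util.Arrays;
import java.util.List;

// Library-free model of a font-metric-based space heuristic (illustrative only).
final class SpaceGapSketch {

    // Hypothetical stand-in for a text run with the X extent of its baseline.
    static final class Run {
        final String text;
        final float startX;
        final float endX;

        Run(String text, float startX, float endX) {
            this.text = text;
            this.startX = startX;
            this.endX = endX;
        }
    }

    // Joins runs that sit on one line, adding a space where the gap is wide enough.
    static String joinOnOneLine(List<Run> runs, float singleSpaceWidth) {
        StringBuilder result = new StringBuilder();
        Run previous = null;
        for (Run run : runs) {
            if (previous != null && needsSpace(result, previous, run, singleSpaceWidth)) {
                result.append(' ');
            }
            result.append(run.text);
            previous = run;
        }
        return result.toString();
    }

    private static boolean needsSpace(StringBuilder result, Run previous, Run current,
                                      float singleSpaceWidth) {
        // Skip when there is nothing to separate or a space is already present.
        if (result.length() == 0 || current.text.isEmpty()) {
            return false;
        }
        if (result.charAt(result.length() - 1) == ' ' || current.text.charAt(0) == ' ') {
            return false;
        }
        // Assumed threshold: half the width of one space in the current font.
        float gap = current.startX - previous.endX;
        return gap > singleSpaceWidth / 2f;
    }

    public static void main(String[] args) {
        // Wide gap between the runs: a space is inserted.
        System.out.println(joinOnOneLine(Arrays.asList(
                new Run("Simple", 0f, 30f), new Run("text", 33f, 50f)), 5f));
        // Narrow gap (for example kerning inside one word): no space is inserted.
        System.out.println(joinOnOneLine(Arrays.asList(
                new Run("extrac", 0f, 30f), new Run("tion", 30.5f, 50f)), 5f));
    }
}
```

Tying the threshold to the width of a space in the current font, rather than to a fixed distance, keeps the rule proportional to the font size in use.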
renderImage
public void renderImage(ImageRenderInfo renderInfo)
No-op method: this renderer is not interested in image events.
Specified by:
renderImage in interface RenderListener
Parameters:
renderInfo - information specifying what to render
Since:
5.0.1
See Also:
RenderListener.renderImage(com.itextpdf.text.pdf.parser.ImageRenderInfo)