Class TaggedPdfReaderTool

  • Direct Known Subclasses:
    CompareTool.CmpTaggedPdfReaderTool

    public class TaggedPdfReaderTool
    extends java.lang.Object
    Converts a tagged PDF document into an XML file.
    Since:
    5.0.2
    • Field Summary

      Fields 
      Modifier and Type Field Description
      protected java.io.PrintWriter out
      The writer object to which the XML will be written.
      protected PdfReader reader
      The reader object from which the content streams are read.
    • Field Detail

      • reader

        protected PdfReader reader
        The reader object from which the content streams are read.
      • out

        protected java.io.PrintWriter out
        The writer object to which the XML will be written.
    • Constructor Detail

      • TaggedPdfReaderTool

        public TaggedPdfReaderTool()
    • Method Detail

      • convertToXml

        public void convertToXml(PdfReader reader,
                                 java.io.OutputStream os,
                                 java.lang.String charset)
                          throws java.io.IOException
        Converts the structure tree of a tagged PDF document into XML and writes it to the given output stream, encoded with the given charset.
        Parameters:
        reader - the PdfReader that has access to the PDF file
        os - the OutputStream to which the resulting XML will be written
        charset - the name of the charset used to encode the output
        Throws:
        java.io.IOException
        Since:
        5.0.5
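
        The effect of the charset parameter can be illustrated without a PDF at hand. The following is a minimal sketch, not the library's implementation: CharsetSketch and writeXml are hypothetical names, and the sketch only assumes that the tool wraps the OutputStream in a writer that encodes with the named charset. It shows that the same XML text produces different bytes depending on the charset.

        ```java
        import java.io.ByteArrayOutputStream;
        import java.io.IOException;
        import java.io.OutputStreamWriter;
        import java.io.PrintWriter;

        // Hypothetical helper for illustration; not part of TaggedPdfReaderTool.
        class CharsetSketch {

            // Writes a small XML fragment to an in-memory stream using the named
            // charset and returns the encoded bytes.
            static byte[] writeXml(String text, String charset) throws IOException {
                ByteArrayOutputStream os = new ByteArrayOutputStream();
                // Throws UnsupportedEncodingException (an IOException) for unknown charset names.
                PrintWriter out = new PrintWriter(new OutputStreamWriter(os, charset));
                out.print("<P>" + text + "</P>");
                out.flush(); // push buffered characters through the encoder
                return os.toByteArray();
            }

            public static void main(String[] args) throws IOException {
                String text = "caf\u00e9"; // contains one non-ASCII character
                System.out.println(writeXml(text, "UTF-8").length);      // 12 bytes
                System.out.println(writeXml(text, "ISO-8859-1").length); // 11 bytes
            }
        }
        ```

        Passing an explicit charset such as "UTF-8" keeps the output independent of the machine on which the conversion runs.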
      • convertToXml

        public void convertToXml(PdfReader reader,
                                 java.io.OutputStream os)
                          throws java.io.IOException
        Converts the structure tree of a tagged PDF document into XML and writes it to the given output stream. The output is encoded using the platform's default charset.
        Parameters:
        reader - the PdfReader that has access to the PDF file
        os - the OutputStream to which the resulting XML will be written
        Throws:
        java.io.IOException
      • inspectChild

        public void inspectChild(PdfObject k)
                          throws java.io.IOException
        Inspects a child of a structure element. The child can be an array or a dictionary.
        Parameters:
        k - the child to inspect
        Throws:
        java.io.IOException
      • inspectChildArray

        public void inspectChildArray(PdfArray k)
                               throws java.io.IOException
        If the child of a structure element is an array, loops over its elements and inspects each one.
        Parameters:
        k - the child array to inspect
        Throws:
        java.io.IOException
      • inspectChildDictionary

        public void inspectChildDictionary(PdfDictionary k)
                                    throws java.io.IOException
        If the child of a structure element is a dictionary, inspects that child and, where applicable, writes a tag for it.
        Parameters:
        k - the child dictionary to inspect
        Throws:
        java.io.IOException
      • inspectChildDictionary

        public void inspectChildDictionary(PdfDictionary k,
                                           boolean inspectAttributes)
                                    throws java.io.IOException
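        If the child of a structure element is a dictionary, inspects that child and, where applicable, writes a tag for it.
        Parameters:
        k - the child dictionary to inspect
        inspectAttributes - whether the attributes of the structure element should also be inspected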
        Throws:
        java.io.IOException
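
        The inspectChild, inspectChildArray and inspectChildDictionary methods above describe a recursive walk over the structure tree. The following is a minimal sketch of that dispatch pattern, not the library's code: it uses java.util.List and java.util.Map as stand-ins for PdfArray and PdfDictionary, writes to a StringBuilder instead of the out field, and assumes the PDF structure element keys S (structure type), K (kids) and A (attributes). The class name StructureWalkSketch is hypothetical, and the one-argument overload delegating with inspectAttributes set to false is an assumption.

        ```java
        import java.util.List;
        import java.util.Map;

        // Hypothetical stand-in for illustration; real code works on PdfObject types.
        class StructureWalkSketch {

            // Dispatches on the runtime type of the child: array or dictionary.
            static void inspectChild(Object k, StringBuilder out) {
                if (k instanceof List) {
                    inspectChildArray((List<?>) k, out);
                } else if (k instanceof Map) {
                    inspectChildDictionary((Map<?, ?>) k, out);
                }
                // Any other child (including null) is ignored in this sketch.
            }

            // Loops over the array and inspects each element in turn.
            static void inspectChildArray(List<?> k, StringBuilder out) {
                for (Object element : k) {
                    inspectChild(element, out);
                }
            }

            // Assumption: the short overload skips attributes.
            static void inspectChildDictionary(Map<?, ?> k, StringBuilder out) {
                inspectChildDictionary(k, false, out);
            }

            // Writes an opening tag, recurses into the kids, writes the closing tag.
            // No XML escaping is done here; a real converter would need it.
            static void inspectChildDictionary(Map<?, ?> k, boolean inspectAttributes,
                                               StringBuilder out) {
                Object s = k.get("S");
                if (s == null) { // no structure type: descend without writing a tag
                    inspectChild(k.get("K"), out);
                    return;
                }
                out.append('<').append(s);
                Object a = k.get("A");
                if (inspectAttributes && a instanceof Map) {
                    for (Map.Entry<?, ?> e : ((Map<?, ?>) a).entrySet()) {
                        out.append(' ').append(e.getKey())
                           .append("=\"").append(e.getValue()).append('"');
                    }
                }
                out.append('>');
                inspectChild(k.get("K"), out);
                out.append("</").append(s).append('>');
            }

            public static void main(String[] args) {
                Map<String, Object> paragraph = Map.of("S", "P");
                Map<String, Object> section = Map.of(
                        "S", "Sect",
                        "A", Map.of("Lang", "en"),
                        "K", List.of(paragraph, paragraph));
                StringBuilder out = new StringBuilder();
                inspectChildDictionary(section, true, out);
                System.out.println(out); // <Sect Lang="en"><P></P><P></P></Sect>
            }
        }
        ```

        In this sketch the attribute flag applies only to the dictionary on which it is passed; nested children go through the one-argument overload.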
      • xmlName

        protected java.lang.String xmlName(PdfName name)
      • fixTagName

        private static java.lang.String fixTagName(java.lang.String tag)
      • parseTag

        public void parseTag(java.lang.String tag,
                             PdfObject object,
                             PdfDictionary page)
                      throws java.io.IOException
        Searches for a tag in a page.
        Parameters:
        tag - the name of the tag
        object - an identifier to find the marked content
        page - a page dictionary
        Throws:
        java.io.IOException