Class SimpleXMLParser


  • public final class SimpleXMLParser
    extends java.lang.Object
    A simple XML parser. Like the SAX parser, it is an event-based parser, but it offers much less functionality.

    The parser has the following capabilities:

    • It recognizes the encoding used
    • It recognizes all the elements' start tags and end tags
    • It lists attributes, where attribute values can be enclosed in single or double quotes
    • It recognizes the <![CDATA[ ... ]]> construct
    • It recognizes the standard entities: &amp;, &lt;, &gt;, &quot;, and &apos;, as well as numeric entities
    • It maps lines ending in \r\n and \r to \n on input, in accordance with the XML Specification, Section 2.11
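The line-ending rule in the last item can be illustrated with a minimal sketch. The class and method names below are hypothetical and are not part of SimpleXMLParser; the code only shows one way to map \r\n and a lone \r to \n.

```java
final class LineEndingSketch {

    /** Returns the input with "\r\n" and a lone "\r" both mapped to "\n". */
    static String normalize(String input) {
        StringBuilder out = new StringBuilder(input.length());
        boolean previousWasCr = false;
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (c == '\r') {
                out.append('\n');          // every CR yields exactly one LF
                previousWasCr = true;
            } else if (c == '\n') {
                if (!previousWasCr) {
                    out.append('\n');      // a lone LF is kept as is
                }                          // an LF right after a CR was already emitted
                previousWasCr = false;
            } else {
                out.append(c);
                previousWasCr = false;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String normalized = normalize("a\r\nb\rc\nd");
        System.out.println(normalized.equals("a\nb\nc\nd"));
    }
}
```

A single boolean is enough here because the only context the rule needs is whether the previous character was a carriage return.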

    • Field Summary

      Fields 
      Modifier and Type Field Description
      private static int ATTRIBUTE_EQUAL  
      private static int ATTRIBUTE_KEY  
      private static int ATTRIBUTE_VALUE  
      private java.lang.String attributekey
      the attribute key.
      private java.util.HashMap<java.lang.String,java.lang.String> attributes
      current attributes
      private java.lang.String attributevalue
      the attribute value.
      private static int CDATA  
      private int character
      The current character.
      private int columns
      the column where the current character occurs
      private SimpleXMLDocHandlerComment comment
      The handler to which we are going to forward comments.
      private static int COMMENT  
      private SimpleXMLDocHandler doc
      The handler to which we are going to forward document content
      private java.lang.StringBuffer entity
      current entity (whatever is encountered between & and ;)
      private static int ENTITY  
      private boolean eol
      was the last character equivalent to a newline?
      private static int EXAMIN_TAG  
      private boolean html
      Are we parsing HTML?
      private static int IN_CLOSETAG  
      private int lines
      the line we are currently reading
      private int nested
      Keeps track of the number of tags that are open.
      private NewLineHandler newLineHandler  
      private boolean nowhite
      A boolean indicating if the next character should be taken into account if it's a space character.
      private static int PI  
      private int previousCharacter
      The previous character.
      private static int QUOTE  
      private int quoteCharacter
      the quote character that was used to open the quote.
      private static int SINGLE_TAG  
      private java.util.Stack<java.lang.Integer> stack
      the state stack
      private int state
      the current state
      private java.lang.String tag
      current tagname
      private static int TAG_ENCOUNTERED  
      private static int TAG_EXAMINED  
      private java.lang.StringBuffer text
      current text (whatever is encountered between tags)
      private static int TEXT  
      private static int UNKNOWN
      possible states
    • Field Detail

      • stack

        private final java.util.Stack<java.lang.Integer> stack
        the state stack
      • character

        private int character
        The current character.
      • previousCharacter

        private int previousCharacter
        The previous character.
      • lines

        private int lines
        the line we are currently reading
      • columns

        private int columns
        the column where the current character occurs
      • eol

        private boolean eol
        was the last character equivalent to a newline?
      • nowhite

        private boolean nowhite
        A boolean indicating if the next character should be taken into account if it's a space character. When nowhite is false, the previous character wasn't whitespace.
        Since:
        2.1.5
      • state

        private int state
        the current state
      • html

        private final boolean html
        Are we parsing HTML?
      • text

        private final java.lang.StringBuffer text
        current text (whatever is encountered between tags)
      • entity

        private final java.lang.StringBuffer entity
        current entity (whatever is encountered between & and ;)
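As an illustration of what can be done with the text collected between & and ;, the following sketch decodes the five standard entities and numeric character references. It is an assumption about the general technique, not the parser's own code; EntitySketch, decode and fromCodePoint are hypothetical names.

```java
import java.util.Map;

final class EntitySketch {

    // The five entities predefined by XML.
    private static final Map<String, Character> NAMED = Map.of(
            "amp", '&', "lt", '<', "gt", '>', "quot", '"', "apos", '\'');

    /**
     * Decodes the text found between '&' and ';'.
     * Returns null when the entity is not recognized.
     */
    static String decode(String entity) {
        if (entity.startsWith("#x")) {
            return fromCodePoint(entity.substring(2), 16);   // hexadecimal reference
        }
        if (entity.startsWith("#")) {
            return fromCodePoint(entity.substring(1), 10);   // decimal reference
        }
        Character c = NAMED.get(entity);
        return c == null ? null : String.valueOf(c);
    }

    private static String fromCodePoint(String digits, int radix) {
        try {
            return new String(Character.toChars(Integer.parseInt(digits, radix)));
        } catch (IllegalArgumentException e) {
            // Covers NumberFormatException and invalid code points.
            return null;
        }
    }

    public static void main(String[] args) {
        System.out.println(decode("amp") + " " + decode("#65") + " " + decode("#x41"));
    }
}
```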
      • tag

        private java.lang.String tag
        current tagname
      • attributes

        private java.util.HashMap<java.lang.String,java.lang.String> attributes
        current attributes
      • doc

        private final SimpleXMLDocHandler doc
        The handler to which we are going to forward document content
      • nested

        private int nested
        Keeps track of the number of tags that are open.
      • quoteCharacter

        private int quoteCharacter
        the quote character that was used to open the quote.
      • attributekey

        private java.lang.String attributekey
        the attribute key.
      • attributevalue

        private java.lang.String attributevalue
        the attribute value.
    • Method Detail

      • go

        private void go(java.io.Reader r)
                 throws java.io.IOException
        Does the actual parsing. Perform this immediately after creating the parser object.
        Throws:
        java.io.IOException
      • restoreState

        private int restoreState()
        Gets a state from the stack
        Returns:
        the previous state
      • saveState

        private void saveState(int s)
        Adds a state to the stack.
        Parameters:
        s - a state to add to the stack
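The interplay of saveState and restoreState can be sketched as a small stack of integer states. This is a hypothetical model, not the class's actual code: the constant values, the enter and leave helpers, and the fallback to UNKNOWN on an empty stack are all assumptions made for illustration.

```java
import java.util.ArrayDeque;
import java.util.Deque;

final class StateStackSketch {

    // Hypothetical state values; the real constants are private to the parser.
    static final int UNKNOWN = 0;
    static final int TEXT = 1;
    static final int ENTITY = 2;

    private final Deque<Integer> stack = new ArrayDeque<>();
    private int state = UNKNOWN;

    /** Adds a state to the stack. */
    void saveState(int s) {
        stack.push(s);
    }

    /** Gets the most recently saved state, or UNKNOWN if none was saved (assumed fallback). */
    int restoreState() {
        return stack.isEmpty() ? UNKNOWN : stack.pop();
    }

    /** Remembers the current state and switches to a nested one. */
    int enter(int newState) {
        saveState(state);
        state = newState;
        return state;
    }

    /** Returns to the state that was active before the last enter. */
    int leave() {
        state = restoreState();
        return state;
    }

    public static void main(String[] args) {
        StateStackSketch s = new StateStackSketch();
        s.enter(TEXT);
        s.enter(ENTITY);                       // e.g. an '&' seen inside text
        System.out.println(s.leave() == TEXT); // back in text once the entity ends
    }
}
```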
      • flush

        private void flush()
        Flushes the text that is currently in the buffer. Depending on the current state, the text is ignored, added to the document as content, or added as a comment.
      • initTag

        private void initTag()
        Initializes the tag name and attributes.
      • doTag

        private void doTag()
        Sets the name of the tag.
      • processTag

        private void processTag(boolean start)
        Processes the tag.
        Parameters:
        start - if true we are dealing with a tag that has just been opened; if false we are closing a tag.
      • throwException

        private void throwException(java.lang.String s)
                             throws java.io.IOException
        Throws an exception
        Throws:
        java.io.IOException
      • parse

        public static void parse(SimpleXMLDocHandler doc,
                                 SimpleXMLDocHandlerComment comment,
                                 java.io.Reader r,
                                 boolean html)
                          throws java.io.IOException
        Parses the XML document firing the events to the handler.
        Parameters:
        doc - the document handler
        comment - the comment handler
        r - the document. The encoding is already resolved. The reader is not closed
        html - true if the input should be parsed as HTML
        Throws:
        java.io.IOException - on error
      • parse

        public static void parse(SimpleXMLDocHandler doc,
                                 java.io.InputStream in)
                          throws java.io.IOException
        Parses the XML document firing the events to the handler.
        Parameters:
        doc - the document handler
        in - the document. The encoding is deduced from the stream. The stream is not closed
        Throws:
        java.io.IOException - on error
      • getDeclaredEncoding

        private static java.lang.String getDeclaredEncoding(java.lang.String decl)
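This method has no description. Judging by its name and parameter only, it presumably extracts the encoding pseudo-attribute from an XML declaration. The sketch below shows one way to do that with a regular expression; it is an assumption, not the library's implementation, and DeclaredEncodingSketch and declaredEncoding are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class DeclaredEncodingSketch {

    // encoding = "name" or encoding = 'name', with matching quote characters.
    private static final Pattern ENCODING = Pattern.compile(
            "encoding\\s*=\\s*([\"'])([A-Za-z][A-Za-z0-9._-]*)\\1");

    /** Returns the declared encoding name, or null if the declaration has none. */
    static String declaredEncoding(String decl) {
        if (decl == null) {
            return null;
        }
        Matcher m = ENCODING.matcher(decl);
        return m.find() ? m.group(2) : null;
    }

    public static void main(String[] args) {
        System.out.println(declaredEncoding("<?xml version=\"1.0\" encoding=\"UTF-8\"?>"));
    }
}
```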
      • parse

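        public static void parse(SimpleXMLDocHandler doc,
                                 java.io.Reader r)
                          throws java.io.IOException
        Parses the XML document firing the events to the handler.
        Parameters:
        doc - the document handler
        r - the document. The encoding is already resolved. The reader is not closed
        Throws:
        java.io.IOException - on error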
      • escapeXML

        @Deprecated
        public static java.lang.String escapeXML(java.lang.String s,
                                                 boolean onlyASCII)
        Deprecated.
        moved to XMLUtil.escapeXML(String, boolean), left here for the sake of backwards compatibility
        Escapes a string with the appropriate XML codes.
        Parameters:
        s - the string to be escaped
        onlyASCII - if true, characters with codes above 127 are always escaped as &#nn;
        Returns:
        the escaped string
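The escaping described above can be sketched as follows. This is a simplified, hypothetical version written for illustration (EscapeSketch and escape are not library names); it does not claim to reproduce XMLUtil.escapeXML, and it makes no attempt to filter characters that are illegal in XML.

```java
final class EscapeSketch {

    /** Escapes the five XML special characters and, optionally, everything above 127. */
    static String escape(String s, boolean onlyAscii) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            i += Character.charCount(cp);      // step over surrogate pairs as one character
            switch (cp) {
                case '<':  sb.append("&lt;");   break;
                case '>':  sb.append("&gt;");   break;
                case '&':  sb.append("&amp;");  break;
                case '"':  sb.append("&quot;"); break;
                case '\'': sb.append("&apos;"); break;
                default:
                    if (onlyAscii && cp > 127) {
                        sb.append("&#").append(cp).append(';'); // decimal reference
                    } else {
                        sb.appendCodePoint(cp);
                    }
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(escape("a<b & \"c\"", false));
        System.out.println(escape("caf\u00e9", true));
    }
}
```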