SAX Parser

Overview of SAX

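SAX (Simple API for XML) is based on an event model. The SAX parser reads an XML document sequentially, from beginning to end. It fires events when it encounters the start or end of an element, a piece of text, or another component such as a processing instruction. Attributes are not reported as separate events; they are passed along with the start-element event.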

Advantages: SAX parsers are usually very fast. Unlike DOM parsers, they do not build an in-memory tree of the whole document, so they use less processing power and memory.

Disadvantages: SAX does not give access to the entire object model of an XML document, and it cannot be used to write documents.

When to use: for state-independent processing, where the handling of an element does not depend on the elements that came before it.

General steps to work with SAX parsers:

  1. Create an instance of SAXParser.
  2. Register a content handler class, which will process the XML document.
  3. Start parsing. The parser then invokes the handler's callback methods as it reads the document.
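The three steps above can be sketched as a small, self-contained program. It parses an in-memory string and records each callback in order, which makes the event model visible. The class names EventLogDemo and EventLogger are illustrative, and the handler overrides only a few of the available callbacks.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class EventLogDemo {

    // Records the callbacks the parser fires, in the order they arrive.
    static class EventLogger extends DefaultHandler {
        final List<String> events = new ArrayList<>();

        @Override
        public void startDocument() { events.add("startDocument"); }

        @Override
        public void endDocument() { events.add("endDocument"); }

        @Override
        public void startElement(String uri, String localName, String qName, Attributes attributes) {
            // Attributes arrive together with the start-element event.
            String id = attributes.getValue("id");
            events.add("startElement " + qName + (id != null ? " id=" + id : ""));
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            events.add("endElement " + qName);
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            String text = new String(ch, start, length).trim();
            if (!text.isEmpty()) {
                events.add("characters " + text);
            }
        }
    }

    static List<String> parse(String xml) throws Exception {
        // Step 1: create the parser.
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        // Step 2: create the handler that will receive the callbacks.
        EventLogger handler = new EventLogger();
        // Step 3: parse; the parser calls back into the handler as it reads.
        parser.parse(new InputSource(new StringReader(xml)), handler);
        return handler.events;
    }

    public static void main(String[] args) throws Exception {
        parse("<book id=\"1\"><title>SAX</title></book>").forEach(System.out::println);
    }
}
```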

Overview of SAX API

SAXParserFactory

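A SAXParserFactory object creates SAXParser instances. The static method SAXParserFactory.newInstance() chooses the factory implementation by checking the system property javax.xml.parsers.SAXParserFactory first. If that property is not set, it consults other configuration sources and finally falls back to the platform's default implementation.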
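The program below sketches this lookup. It sets the system property before calling newInstance() and checks which factory class comes back. It assumes the class name com.sun.org.apache.xerces.internal.jaxp.SAXParserFactoryImpl, which is the internal default factory in OpenJDK-based runtimes; other Java runtimes may use a different class. The helper restores the previous property value so the rest of the application is not affected.

```java
import javax.xml.parsers.SAXParserFactory;

class FactoryLookupDemo {

    // Assumed: the default SAX parser factory bundled with OpenJDK (an internal, JDK-specific class).
    static final String JDK_FACTORY = "com.sun.org.apache.xerces.internal.jaxp.SAXParserFactoryImpl";
    static final String KEY = "javax.xml.parsers.SAXParserFactory";

    static String factoryClassWithProperty(String factoryClassName) {
        String previous = System.getProperty(KEY);
        System.setProperty(KEY, factoryClassName);
        try {
            // newInstance() checks the system property first, so it instantiates factoryClassName.
            return SAXParserFactory.newInstance().getClass().getName();
        } finally {
            // Restore the original setting.
            if (previous == null) {
                System.clearProperty(KEY);
            } else {
                System.setProperty(KEY, previous);
            }
        }
    }

    public static void main(String[] args) {
        // Which factory is used when nothing is configured explicitly.
        System.out.println(SAXParserFactory.newInstance().getClass().getName());
        // Which factory is used when the system property names one.
        System.out.println(factoryClassWithProperty(JDK_FACTORY));
    }
}
```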

SAXParser

SAXParser is an abstract class that defines several overloaded parse() methods. Each one receives an XML data source and a DefaultHandler object. The parser processes the XML and calls the appropriate methods of the handler object.

DefaultHandler

A DefaultHandler implements the ContentHandler, ErrorHandler, DTDHandler, and EntityResolver interfaces.

ContentHandler

The startDocument() and endDocument() methods are called at the beginning and end of the document. The startElement() and endElement() methods are called when the parser finds a start tag or an end tag. The interface also defines the methods characters() and processingInstruction(). These are called when the parser encounters the text of an XML element or a processing instruction.
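In practice, characters() may deliver the text of a single element in several chunks. Handlers therefore usually accumulate the text in a buffer and read it in endElement(). The sketch below does that for leaf elements only (mixed content is not handled) and also records processing instructions. The class names and the sample document are illustrative.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;

import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class TextCollectorDemo {

    // Collects the text of leaf elements, keyed by element name; mixed content is not handled.
    static class TextCollector extends DefaultHandler {
        private final StringBuilder text = new StringBuilder();
        final Map<String, String> values = new LinkedHashMap<>();

        @Override
        public void startElement(String uri, String localName, String qName, Attributes attributes) {
            text.setLength(0); // start collecting afresh for this element
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            // One text node may arrive in several chunks, so append instead of assigning.
            text.append(ch, start, length);
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            String value = text.toString().trim();
            if (!value.isEmpty()) {
                values.put(qName, value);
            }
            text.setLength(0);
        }

        @Override
        public void processingInstruction(String target, String data) {
            values.put("?" + target, data); // e.g. <?app mode="fast"?> gives ?app -> mode="fast"
        }
    }

    static Map<String, String> collect(String xml) throws Exception {
        TextCollector handler = new TextCollector();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.values;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<?app mode=\"fast\"?><config><host>example.org</host><port>8080</port></config>";
        System.out.println(collect(xml)); // {?app=mode="fast", host=example.org, port=8080}
    }
}
```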

ErrorHandler

The error(), fatalError(), and warning() methods are called in response to parsing problems of different severity. DefaultHandler's implementation throws an exception for fatal errors and ignores everything else, including validation errors and warnings.
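Because validity errors are ignored by default, a validating parser combined with a plain DefaultHandler will finish parsing an invalid document without complaint. The following is a minimal sketch of a stricter handler. It assumes you want any error to stop parsing, so it overrides error() to rethrow the exception. The sample DTD and the StrictHandler class are illustrative.

```java
import java.io.IOException;
import java.io.StringReader;

import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

class StrictErrorHandlerDemo {

    // Invalid: the DTD requires <note> to contain a single <to>, but it contains <from>.
    static final String INVALID_XML =
            "<!DOCTYPE note [<!ELEMENT note (to)><!ELEMENT to (#PCDATA)>]>"
            + "<note><from>Bob</from></note>";

    static class StrictHandler extends DefaultHandler {
        @Override
        public void warning(SAXParseException e) {
            System.err.println("Warning at line " + e.getLineNumber() + ": " + e.getMessage());
        }

        @Override
        public void error(SAXParseException e) throws SAXException {
            // DefaultHandler ignores recoverable errors such as validity errors;
            // rethrowing turns them into a parse failure.
            throw e;
        }
        // fatalError() is inherited: DefaultHandler already throws for fatal errors.
    }

    // Returns true if the document was parsed without a SAXException.
    static boolean parses(String xml, DefaultHandler handler)
            throws ParserConfigurationException, IOException {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setValidating(true); // must be set before newSAXParser()
        try {
            factory.newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
            return true;
        } catch (SAXException e) {
            System.out.println("Parsing stopped: " + e.getMessage());
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(parses(INVALID_XML, new DefaultHandler())); // true: error ignored
        System.out.println(parses(INVALID_XML, new StrictHandler()));  // false: error rethrown
    }
}
```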

DTDHandler

Receives notification of notation declarations and unparsed entity declarations in a DTD, so that the application can recognize and act on them.
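For instance, a document might declare an image as an unparsed entity with an associated notation. The sketch below overrides the two DTDHandler methods, notationDecl() and unparsedEntityDecl(), to record those declarations. The notation name gif and the entity name logo are made-up sample values.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class DtdHandlerDemo {

    // A GIF image declared as an unparsed entity; the parser never reads logo.gif itself.
    static final String XML =
            "<!DOCTYPE doc ["
            + "<!NOTATION gif SYSTEM \"image/gif\">"
            + "<!ENTITY logo SYSTEM \"logo.gif\" NDATA gif>"
            + "]><doc/>";

    static class UnparsedEntityCollector extends DefaultHandler {
        final List<String> notations = new ArrayList<>();
        final Map<String, String> entityToNotation = new LinkedHashMap<>();

        @Override
        public void notationDecl(String name, String publicId, String systemId) {
            notations.add(name);
        }

        @Override
        public void unparsedEntityDecl(String name, String publicId, String systemId, String notationName) {
            // systemId may be reported as an absolute URI resolved by the parser.
            entityToNotation.put(name, notationName);
        }
    }

    static UnparsedEntityCollector collect(String xml) throws Exception {
        UnparsedEntityCollector handler = new UnparsedEntityCollector();
        // SAXParser.parse(..., DefaultHandler) also registers the handler as the DTDHandler.
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler;
    }

    public static void main(String[] args) throws Exception {
        UnparsedEntityCollector result = collect(XML);
        System.out.println("Notations: " + result.notations);
        System.out.println("Unparsed entities: " + result.entityToNotation);
    }
}
```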

EntityResolver

The resolveEntity() method is called when the parser needs to read an external entity, such as an external DTD, identified by a URI. In most cases, a URI is simply a URL, which specifies the location of a document. In some cases, however, the document may be identified by a URN: a public identifier, or name, that is unique in the web space. The public identifier may be specified in addition to the URL. The EntityResolver can then use the public identifier instead of the URL to find the document, for example, to access a local copy of the document if one exists.
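The following is a minimal sketch of the local-copy case. The document refers to its DTD by a public identifier and a URL. The resolver recognizes the public identifier and supplies the DTD from memory instead, so the URL is never fetched. The public identifier, the example.com URL, and the DTD content are hypothetical. A real application would typically load the local copy from a file or the class path.

```java
import java.io.StringReader;

import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class LocalDtdResolverDemo {

    // Hypothetical public identifier and URL; the URL is never contacted.
    static final String PUBLIC_ID = "-//Example//DTD Note//EN";
    static final String XML =
            "<!DOCTYPE note PUBLIC \"" + PUBLIC_ID + "\" \"http://example.com/dtd/note.dtd\">"
            + "<note>Written by &author;</note>";

    // Stands in for a local copy of the DTD; it also defines the entity used in the document.
    static final String LOCAL_DTD = "<!ELEMENT note (#PCDATA)><!ENTITY author \"Jane\">";

    static class LocalCopyResolver extends DefaultHandler {
        int resolvedLocally;
        final StringBuilder text = new StringBuilder();

        @Override
        public InputSource resolveEntity(String publicId, String systemId) throws SAXException {
            if (PUBLIC_ID.equals(publicId)) {
                resolvedLocally++;
                return new InputSource(new StringReader(LOCAL_DTD)); // use the local copy
            }
            return null; // null lets the parser open systemId itself
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            text.append(ch, start, length); // entity text may arrive as a separate chunk
        }
    }

    static LocalCopyResolver parse(String xml) throws Exception {
        LocalCopyResolver handler = new LocalCopyResolver();
        // SAXParser.parse(..., DefaultHandler) also registers the handler as the EntityResolver.
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler;
    }

    public static void main(String[] args) throws Exception {
        LocalCopyResolver result = parse(XML);
        System.out.println(result.resolvedLocally); // 1
        System.out.println(result.text);           // Written by Jane
    }
}
```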

Create an instance of SAXParser

Example 1

import javax.xml.parsers.FactoryConfigurationError;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;

public class MySaxParser1 {
    public static void main(String[] args) {
        try {
            //1. Create an instance of SAXParser.
            //   Configure the factory before calling newSAXParser():
            //   the settings affect only parsers created afterwards.
            SAXParserFactory factory = SAXParserFactory.newInstance();
            //Use a validating parser
            factory.setValidating(true);
            //Switch off namespace processing
            factory.setNamespaceAware(false);
            SAXParser saxParser = factory.newSAXParser();
            //2. Register a content handler class MyHandler (discussed later)
            //3. Parse the document
            saxParser.parse("content.xml", new MyHandler());
        } catch (ParserConfigurationException e) {
            System.out.println("Parser configuration exception.");
        } catch (FactoryConfigurationError e) {
            System.out.println("Factory configuration error.");
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}

In Example 1, the vendor-neutral factory class javax.xml.parsers.SAXParserFactory is used to create a javax.xml.parsers.SAXParser instance:

SAXParserFactory factory = SAXParserFactory.newInstance();
SAXParser saxParser = factory.newSAXParser();

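The SAXParserFactory can be configured to produce a validating parser that is not namespace-aware. These settings must be applied before newSAXParser() is called, because they only affect parsers created afterwards: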

factory.setValidating(true);
factory.setNamespaceAware(false);

The SAXParser class is a wrapper around the underlying SAX parser, so you do not have to deal with the parser's implementation-specific details:

saxParser.parse("content.xml", new MyHandler());

The parse() method registers MyHandler as the parser's content handler, and also as its error handler, DTD handler, and entity resolver, and then parses the document. The parse() method is overloaded to receive documents in different forms: a SAX InputSource, a Java InputStream, a File, or a URI string, each together with a DefaultHandler.
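The sketch below illustrates two of these overloads. It feeds the same document to parse() once as an InputStream and once as an InputSource wrapping a Reader, using a small handler that counts start tags. The ElementCounter class and the sample document are illustrative.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;

import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class ParseOverloadsDemo {

    // Counts start tags, so the results of the two overloads are easy to compare.
    static class ElementCounter extends DefaultHandler {
        int count;

        @Override
        public void startElement(String uri, String localName, String qName, Attributes attributes) {
            count++;
        }
    }

    // parse(InputStream, DefaultHandler): the parser detects the encoding from the bytes.
    static int countFromStream(String xml) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        ElementCounter handler = new ElementCounter();
        try (InputStream in = new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8))) {
            parser.parse(in, handler);
        }
        return handler.count;
    }

    // parse(InputSource, DefaultHandler): here the InputSource wraps a character stream.
    static int countFromInputSource(String xml) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        ElementCounter handler = new ElementCounter();
        parser.parse(new InputSource(new StringReader(xml)), handler);
        return handler.count;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<a><b/><c><d/></c></a>";
        System.out.println(countFromStream(xml));      // 4
        System.out.println(countFromInputSource(xml)); // 4
    }
}
```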

Example 2

Sometimes you need the SAX parser itself, an instance of org.xml.sax.XMLReader. SAXParser.getXMLReader() returns it and gives access to the usual SAX methods, such as registering the content handler separately:

import org.xml.sax.XMLReader;

import javax.xml.parsers.FactoryConfigurationError;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;

public class MySaxParser2 {
    public static void main(String[] args) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            SAXParser saxParser = factory.newSAXParser();
            XMLReader parser = saxParser.getXMLReader();
            //Register a content handler class MyHandler(will be discussed later)
            parser.setContentHandler(new MyHandler());
            parser.parse(new org.xml.sax.InputSource("content.xml"));
        } catch (ParserConfigurationException e) {
            System.out.println("Parser configuration exception.");
        } catch (FactoryConfigurationError e) {
            System.out.println("Factory configuration error.");
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}

Example 3

Finally, let's look at the simplest way to obtain a SAX parser. In this example, we don't use javax.xml.parsers.SAXParserFactory and javax.xml.parsers.SAXParser. Instead, we obtain an XMLReader directly from org.xml.sax.helpers.XMLReaderFactory.

import java.io.IOException;

import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.XMLReaderFactory;

public class MySaxParser3 {
    public static void main(String[] args) {
        try {
            //Note: XMLReaderFactory is deprecated since Java 9
            XMLReader parser = XMLReaderFactory.createXMLReader();
            //Register a content handler class MyHandler (discussed later)
            parser.setContentHandler(new MyHandler());
            parser.parse(new InputSource("content.xml"));
        } catch (SAXException e) {
            //Thrown if no XMLReader can be created or if parsing fails
            System.out.println("SAX exception: " + e.getMessage());
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

You might ask how this implementation (Example 3) differs from the previous two (Examples 1 and 2), and why we shouldn't always use Example 3, since its code is simpler. The SAXParserFactory hides the details of the vendor's parser factory from you. If you want to be able to replace the default JAXP parser with a different JAXP parser implementation, you should use the vendor-neutral SAXParserFactory. If the application will always use a SAX2 XMLReader, XMLReaderFactory can be used. Note, however, that XMLReaderFactory has been deprecated since Java 9. The recommended replacement is SAXParserFactory.newInstance().newSAXParser().getXMLReader(), as in Example 2.

The ContentHandler Interface and the DefaultHandler Class

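The ContentHandler is the primary listener interface in SAX. Its callback methods include setDocumentLocator(), startDocument(), endDocument(), startPrefixMapping(), endPrefixMapping(), startElement(), endElement(), characters(), ignorableWhitespace(), processingInstruction(), and skippedEntity().

Figure: the ContentHandler interface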

In most cases, developers need to implement only some of these methods. SAX provides an adapter class, org.xml.sax.helpers.DefaultHandler, which implements ContentHandler with empty methods. You can extend DefaultHandler, override only the methods you need, and ignore the rest.

Example 4 DefaultHandler SAX Parser Example

Let's look at MyHandler, the DefaultHandler subclass used in Examples 1-3:

import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class MyHandler extends DefaultHandler {
    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) throws SAXException {
        System.out.println(qName);
    }
}

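This content handler implements only one method, startElement(). Each time the SAX parser encounters the start tag of a new element, it calls this class's startElement() method. MyHandler simply prints each element's qualified name (qName), that is, the tag name including its namespace prefix, if any. With a parser that is not namespace-aware, as in Examples 1 and 2, qName is the reliable choice, because the localName argument may then be an empty string.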

Consider the following XML code snippet:

<?xml version="1.0" encoding="UTF-8"?>
<soap:Envelope
        xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/"
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xmlns:xsd="http://www.w3.org/2001/XMLSchema"
        xmlns:ex="http://www.ec.com/EX/">
    <soap:Body>
        <ex:exam>
            <name xsi:type="xsd:string">JAXP</name>
        </ex:exam>
    </soap:Body>
</soap:Envelope>

When MyHandler processes this XML document, it prints the following:

soap:Envelope
soap:Body
ex:exam
name
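If you need the namespace URI and the local name instead of the prefixed name, make the factory namespace-aware. The sketch below parses a condensed version of the same SOAP snippet, with the xsi and xsd declarations and the xsi:type attribute omitted. It records each element as {uri}localName next to its qName. The NameRecorder class is illustrative.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class NamespaceAwareDemo {

    // Condensed version of the SOAP snippet above.
    static final String SOAP_XML =
            "<soap:Envelope xmlns:soap=\"http://schemas.xmlsoap.org/soap/envelope/\""
            + " xmlns:ex=\"http://www.ec.com/EX/\">"
            + "<soap:Body><ex:exam><name>JAXP</name></ex:exam></soap:Body></soap:Envelope>";

    static class NameRecorder extends DefaultHandler {
        final List<String> names = new ArrayList<>();

        @Override
        public void startElement(String uri, String localName, String qName, Attributes attributes) {
            // uri is the empty string for elements that are in no namespace.
            names.add("{" + uri + "}" + localName + " (qName: " + qName + ")");
        }
    }

    static List<String> names(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true); // makes the parser fill in uri and localName
        NameRecorder handler = new NameRecorder();
        factory.newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
        return handler.names;
    }

    public static void main(String[] args) throws Exception {
        names(SOAP_XML).forEach(System.out::println);
    }
}
```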