In this post I will explain how to parse an XML document in Java using the SAX API.
SAX is an event-based Java API.
A SAX-compliant parser reads the XML document sequentially and generates an event whenever it encounters a piece of the document, such as the start or end of an element, character data, or a processing instruction.
Because it reads the document sequentially, the parser does not load the entire XML document into memory.
For every event generated, the corresponding predefined callback method is executed.
These methods are encapsulated in handler objects, and the handlers are registered with the parser before parsing starts.
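To make the callback sequence concrete, here is a minimal sketch that records the order in which a few events fire for a tiny in-memory document. The class name EventOrderSketch and its eventsFor method are made up for this illustration. It relies on DefaultHandler, the no-op base class from org.xml.sax.helpers, and on the JDK's SAXParserFactory, so only the callbacks of interest need to be overridden.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper, not part of the SAX API.
class EventOrderSketch {

    /** Parses the XML text and returns the callbacks in the order they fired. */
    static List<String> eventsFor(String xml) {
        List<String> events = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startDocument() {
                events.add("startDocument");
            }

            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                events.add("startElement:" + qName);
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                events.add("endElement:" + qName);
            }

            @Override
            public void endDocument() {
                events.add("endDocument");
            }
        };
        try {
            // The parser pushes events to the handler while it reads the input.
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            // Assumption for this sketch: any failure is reported as unchecked.
            throw new IllegalStateException("Parsing failed", e);
        }
        return events;
    }

    public static void main(String[] args) {
        eventsFor("<a><b/></a>").forEach(System.out::println);
    }
}
```

For the document <a><b/></a> the list contains startDocument, startElement:a, startElement:b, endElement:b, endElement:a and endDocument, in that order.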
We obtain a SAX-compliant parser as shown below:
XMLReader reader = XMLReaderFactory.createXMLReader();
createXMLReader tries to create an instance of a SAX-compliant parser. If it cannot create an instance, or cannot determine the class name of a SAX-compliant parser,
it throws a SAXException.
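Note that XMLReaderFactory has been deprecated since Java 9, and its documentation recommends SAXParserFactory instead. A minimal sketch of obtaining an XMLReader that way could look like the following. ReaderProvider and newReader are hypothetical names, and wrapping the checked exceptions in IllegalStateException is just one possible choice.

```java
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;

// Hypothetical helper, not part of the SAX or JAXP API.
class ReaderProvider {

    /** Returns a namespace-aware XMLReader obtained through JAXP. */
    static XMLReader newReader() {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            // Parsers from SAXParserFactory are not namespace aware by default.
            factory.setNamespaceAware(true);
            return factory.newSAXParser().getXMLReader();
        } catch (ParserConfigurationException | SAXException e) {
            // Assumption for this sketch: a missing parser is treated as fatal.
            throw new IllegalStateException("No SAX parser available", e);
        }
    }
}
```

Namespace awareness is switched on here so that the reader reports prefix mappings and local names, as the reader returned by createXMLReader does by default.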
Next we create a handler object by implementing one or more of the ContentHandler, DTDHandler, EntityResolver, and ErrorHandler interfaces.
Below is the Java code of a class that implements the ContentHandler interface:
package sax;

import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.Locator;
import org.xml.sax.SAXException;

public class ContentParser implements ContentHandler {

    private Locator locator;

    /**
     * Called whenever character data is encountered. The array ch contains the data,
     * start is the position of the first character in the array, and length is
     * the number of characters to read from that position.
     */
    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        System.out.print("characters [");
        // The data occupies ch[start] up to, but not including, ch[start + length].
        for (int i = start; i < start + length; i++) {
            System.out.print(ch[i]);
        }
        System.out.println("]");
    }

    /**
     * Called when the end of the document is reached.
     */
    @Override
    public void endDocument() throws SAXException {
        System.out.println("End Document" + " at : " + locator.getLineNumber() + ", " + locator.getColumnNumber());
    }

    /**
     * Called when the end tag of an element is reached.
     */
    @Override
    public void endElement(String uri, String localName, String qName) throws SAXException {
        System.out.println("End Element: " + localName + " at : " + locator.getLineNumber() + ", " + locator.getColumnNumber());
    }

    /**
     * Called when the scope of a namespace prefix mapping ends.
     */
    @Override
    public void endPrefixMapping(String prefix) throws SAXException {
        System.out.println("End Prefix: " + prefix + " at : " + locator.getLineNumber() + ", " + locator.getColumnNumber());
    }

    /**
     * Receives notification of ignorable whitespace in element content.
     * As in characters, the third parameter is a length, not an end index.
     */
    @Override
    public void ignorableWhitespace(char[] ch, int start, int length) throws SAXException {
        System.out.print("ignorable whitespace [");
        for (int i = start; i < start + length; i++) {
            System.out.print(ch[i]);
        }
        System.out.println("]");
    }

    /**
     * Called when a processing instruction is encountered.
     */
    @Override
    public void processingInstruction(String target, String data) throws SAXException {
        System.out.println("Processing instruction target: " + target + " and data: " + data + " at : " + locator.getLineNumber() + ", " + locator.getColumnNumber());
    }

    /**
     * Receives the org.xml.sax.Locator supplied by the parser. Its getLineNumber and getColumnNumber
     * methods can be used in the other callbacks to find out where in the document an event,
     * such as the start of an element, was reported.
     */
    @Override
    public void setDocumentLocator(Locator locator) {
        this.locator = locator;
        System.out.println("Locator set: " + locator);
    }

    /**
     * Called for every entity the parser skips.
     */
    @Override
    public void skippedEntity(String name) throws SAXException {
        System.out.println("Skipped Entity: " + name + " at : " + locator.getLineNumber() + ", " + locator.getColumnNumber());
    }

    /**
     * Called at the start of the document.
     */
    @Override
    public void startDocument() throws SAXException {
        System.out.println("Start Document" + " at : " + locator.getLineNumber() + ", " + locator.getColumnNumber());
    }

    /**
     * Called when the start tag of an element is encountered.
     */
    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) throws SAXException {
        System.out.println("Start Element: " + qName + " at : " + locator.getLineNumber() + ", " + locator.getColumnNumber());
    }

    /**
     * Called when the scope of a namespace prefix mapping begins.
     */
    @Override
    public void startPrefixMapping(String prefix, String uri) throws SAXException {
        System.out.println("Start Prefix: " + prefix + " at : " + locator.getLineNumber() + ", " + locator.getColumnNumber());
    }
}
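One thing the characters callback does not guarantee is that all the text of an element arrives in a single call: a parser is allowed to deliver it in several chunks. A common way to deal with this is to append each chunk to a buffer and read the buffer in endElement. The sketch below assumes simple leaf elements without mixed content. TextCollector and collect are names invented for this example, and it extends DefaultHandler (from org.xml.sax.helpers) so that the callbacks it does not need can be left out.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler that gathers the text of leaf elements.
class TextCollector extends DefaultHandler {

    private final StringBuilder buffer = new StringBuilder();
    private final List<String> texts = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        buffer.setLength(0); // start with an empty buffer for every element
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // May be called several times for one element, so append rather than assign.
        buffer.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        String text = buffer.toString().trim();
        if (!text.isEmpty()) {
            texts.add(qName + "=" + text);
        }
        buffer.setLength(0);
    }

    /** Parses the XML text and returns "name=text" for every non-empty leaf element. */
    static List<String> collect(String xml) {
        TextCollector collector = new TextCollector();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), collector);
        } catch (Exception e) {
            // Assumption for this sketch: any failure is reported as unchecked.
            throw new IllegalStateException("Parsing failed", e);
        }
        return collector.texts;
    }
}
```

Calling TextCollector.collect("<employee><id>1</id><firstname>Jason</firstname></employee>") returns the two entries id=1 and firstname=Jason.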
We create an instance of the above class and register it with the XMLReader before we start parsing the document, as shown below:
ContentParser contentParser = new ContentParser();
reader.setContentHandler(contentParser);
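The other handler types are registered in the same way. As an illustration, the following sketch registers an ErrorHandler through setErrorHandler and collects whatever the parser reports. CollectingErrorHandler and its check method are hypothetical names, and the reader is obtained through SAXParserFactory here only to keep the example self-contained.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;

// Hypothetical handler that records problems instead of printing them.
class CollectingErrorHandler implements ErrorHandler {

    final List<String> problems = new ArrayList<>();

    @Override
    public void warning(SAXParseException e) {
        record("warning", e);
    }

    @Override
    public void error(SAXParseException e) {
        record("error", e);
    }

    @Override
    public void fatalError(SAXParseException e) {
        record("fatal", e);
    }

    private void record(String level, SAXParseException e) {
        problems.add(level + " at line " + e.getLineNumber() + ": " + e.getMessage());
    }

    /** Parses the XML text and returns the problems reported to the error handler. */
    static List<String> check(String xml) {
        CollectingErrorHandler errors = new CollectingErrorHandler();
        try {
            XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            reader.setErrorHandler(errors);
            reader.parse(new InputSource(new StringReader(xml)));
        } catch (SAXParseException e) {
            // A parser may still stop with an exception after reporting a fatal error;
            // by then the problem has already been recorded by the handler.
        } catch (Exception e) {
            // Assumption for this sketch: other failures are reported as unchecked.
            throw new IllegalStateException("Could not run the parser", e);
        }
        return errors.problems;
    }
}
```

A well-formed document such as <a><b/></a> should give an empty list, while a document with a missing end tag, such as <a><b></a>, should give at least one entry.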
Next we start parsing by calling the parse method of the XMLReader interface. Below is the complete code of the main class:
Main Code
package sax;

import java.io.FileInputStream;
import java.io.IOException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.XMLReaderFactory;

public class SaxDemo1 {

    public static void main(String[] args) throws SAXException, IOException {
        ContentParser contentParser = new ContentParser();
        XMLReader reader = XMLReaderFactory.createXMLReader();
        reader.setContentHandler(contentParser);
        // try-with-resources closes the file whether parsing succeeds or fails.
        try (FileInputStream fis = new FileInputStream("example1.xml")) {
            InputSource is = new InputSource(fis);
            reader.parse(is);
        }
    }
}
Below is the XML document being parsed by the main code:
XML document
<?target instructions?>
<employee xmlns:e="http://www.example.org/employee">
	<e:id>1</e:id>
	<e:firstname>Jason</e:firstname>
	<e:lastname>Bourne</e:lastname>
	<e:position>Software Engineer</e:position>
	<e:location>Fairfax</e:location>
</employee>
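Output
The characters lines in this run show empty brackets because the run used the loop condition i < length in characters. Since start is an offset into the parser's buffer and is usually larger than length, the loop body never ran. With the condition i < start + length, the element text and the whitespace between elements appear between the brackets.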
Locator set: com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser$LocatorProxy@4e25154f
Start Document at : 1, 1
Processing instruction target: target and data: instructions at : 1, 24
Start Prefix: e at : 2, 53
Start Element: employee at : 2, 53
characters []
Start Element: e:id at : 3, 8
characters []
End Element: id at : 3, 16
characters []
Start Element: e:firstname at : 4, 15
characters []
End Element: firstname at : 4, 34
characters []
Start Element: e:lastname at : 5, 14
characters []
End Element: lastname at : 5, 33
characters []
Start Element: e:position at : 6, 14
characters []
End Element: position at : 6, 44
characters []
Start Element: e:location at : 7, 14
characters []
End Element: location at : 7, 34
characters []
End Element: employee at : 8, 12
End Prefix: e at : 8, 12
End Document at : -1, -1
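The output shows e:id for the start of an element but id for its end. That is because startElement in ContentParser prints qName, the name as written in the document including its prefix, while endElement prints localName, the name without the prefix. The following sketch prints all three name parts side by side for a namespace-aware parse. ElementNameSketch and describeElements are names made up for this example, and the result in the note below is what the parser bundled with the JDK reports.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper, not part of the SAX API.
class ElementNameSketch {

    /** Returns "{namespace URI}localName (qName)" for every start tag in the XML text. */
    static List<String> describeElements(String xml) {
        List<String> result = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                result.add("{" + uri + "}" + localName + " (" + qName + ")");
            }
        };
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            // Without this, uri and localName are not reliably filled in.
            factory.setNamespaceAware(true);
            factory.newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            // Assumption for this sketch: any failure is reported as unchecked.
            throw new IllegalStateException("Parsing failed", e);
        }
        return result;
    }
}
```

For a shortened version of the document above that contains only the employee element and its e:id child, the result is {}employee (employee) followed by {http://www.example.org/employee}id (e:id).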