Parsing an XML document using the DOM API

In this post I will explain how to parse an XML document using the DOM API.

Besides reading XML documents, the DOM API can also be used to create them.
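
As a quick illustration of the creating side, here is a minimal sketch. The class name DOMCreateSketch and the method buildXml are made up for this example, and the Transformer used to turn the tree into text comes from javax.xml.transform rather than from the DOM API itself.

```java
import java.io.StringWriter;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;

import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical class, for illustration only.
public class DOMCreateSketch {

    // Builds a one-element document in memory and returns it as a string.
    public static String buildXml() throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.newDocument();          // new, empty tree

        Element employee = doc.createElement("employee");
        employee.setAttribute("id", "1");
        employee.setTextContent("Jason");
        doc.appendChild(employee);                     // becomes the root element

        // Serialize the tree; the XML declaration is omitted to keep the output short.
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(buildXml());
    }
}
```

The rest of this post concentrates on the reading side.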

A DOM-compliant parser loads the entire document into memory before it can be read.

Once the entire XML document is loaded into memory, we can use the DOM API to read its contents.

When a document is read, a tree structure is created in memory. The tree has a root node and several child nodes, and a node that has no children of its own is called a leaf node.
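
To make the tree idea concrete, here is a small sketch that counts the leaf nodes of a parsed document. DOMTreeSketch, parse and countLeaves are names invented for this example, and the XML is read from a string only to keep the example self-contained.

```java
import java.io.StringReader;

import javax.xml.parsers.DocumentBuilderFactory;

import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical class, for illustration only.
public class DOMTreeSketch {

    // Parses XML held in a string instead of a file.
    public static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    // Counts the nodes under n (including n itself) that have no children.
    public static int countLeaves(Node n) {
        if (!n.hasChildNodes()) {
            return 1;
        }
        int leaves = 0;
        NodeList children = n.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            leaves += countLeaves(children.item(i));
        }
        return leaves;
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse("<a><b>x</b><c/></a>");
        // The leaves are the text node "x" and the empty element c.
        System.out.println(countLeaves(doc)); // prints 2
    }
}
```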

The tree also stores additional information about the elements, such as their attributes.

To parse a document we first need to obtain a parser, as shown below:


    DocumentBuilderFactory documentBuilderFactory = DocumentBuilderFactory.newInstance();
    DocumentBuilder documentBuilder = documentBuilderFactory.newDocumentBuilder();

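Here the DocumentBuilder instance is the parser. It is called DocumentBuilder because of its dual role: it can build a Document by parsing existing XML, and it can create a new, empty Document (via newDocument) that we fill in ourselves.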

We first create an instance of DocumentBuilderFactory by calling its static newInstance method.

Then we create an instance of DocumentBuilder by calling the newDocumentBuilder method on the documentBuilderFactory instance.

Then we call the parse method (as shown below) on the DocumentBuilder instance to read the XML document and create the tree structure.


    Document doc = documentBuilder.parse(new File("example2.xml"));

Here doc is the Document node, which sits at the root of the tree. The root element of the XML (employees in this example) is a child of doc.
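
The difference between the Document node and the root element is easy to miss, so here is a small sketch of it. DOMRootSketch and its parse helper are names invented for this example, and the XML is read from a string so the example does not depend on a file.

```java
import java.io.StringReader;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;

import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical class, for illustration only.
public class DOMRootSketch {

    // Parses XML held in a string instead of a file.
    public static Document parse(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        return builder.parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        // A comment placed before the root element is also a child of the Document node.
        Document doc = parse("<!-- staff list --><employees><employee/></employees>");

        System.out.println(doc.getNodeName());                                      // #document
        System.out.println(doc.getNodeType() == Node.DOCUMENT_NODE);                // true
        System.out.println(doc.getFirstChild().getNodeType() == Node.COMMENT_NODE); // true
        System.out.println(doc.getDocumentElement().getNodeName());                 // employees
    }
}
```

So getFirstChild on the Document returns the root element only when nothing (such as a comment) comes before it, whereas getDocumentElement always returns the root element.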

All the nodes in the tree implement the org.w3c.dom.Node interface, so we can read the XML document using the methods that interface provides.
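
As a small taste of those methods, the sketch below looks up the text of a named child element using only getFirstChild, getNextSibling, getNodeType, getNodeName and getTextContent. DOMNodeSketch and childText are names invented for this example, not part of the DOM API.

```java
import java.io.StringReader;

import javax.xml.parsers.DocumentBuilderFactory;

import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical class, for illustration only.
public class DOMNodeSketch {

    // Parses XML held in a string instead of a file.
    public static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    // Returns the text of the first direct child element called childName,
    // or null when parent has no such child.
    public static String childText(Node parent, String childName) {
        for (Node c = parent.getFirstChild(); c != null; c = c.getNextSibling()) {
            if (c.getNodeType() == Node.ELEMENT_NODE && c.getNodeName().equals(childName)) {
                return c.getTextContent();
            }
        }
        return null;
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse("<employee><id>1</id><firstname>Jason</firstname></employee>");
        Node employee = doc.getDocumentElement();
        System.out.println(childText(employee, "firstname")); // Jason
        System.out.println(childText(employee, "location"));  // null
    }
}
```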

Below is the complete code:

Main Code


    package dom;

    import java.io.File;
    import java.io.IOException;

    import javax.xml.parsers.DocumentBuilder;
    import javax.xml.parsers.DocumentBuilderFactory;
    import javax.xml.parsers.ParserConfigurationException;

    import org.w3c.dom.Document;
    import org.w3c.dom.NamedNodeMap;
    import org.w3c.dom.Node;
    import org.w3c.dom.NodeList;
    import org.xml.sax.SAXException;

    public class DOMDemo1 {
        public static void main(String[] args) throws ParserConfigurationException, IOException, SAXException {
            DocumentBuilderFactory documentBuilderFactory = DocumentBuilderFactory.newInstance();
            DocumentBuilder documentBuilder = documentBuilderFactory.newDocumentBuilder();
            Document doc = documentBuilder.parse(new File("example2.xml"));

            DOMDemo1 domDemo1 = new DOMDemo1();
            // Start at the root element. getDocumentElement() is used rather than
            // getFirstChild() because the first child of a Document can also be a
            // comment or a DOCTYPE when the file contains one before the root element.
            domDemo1.echo(doc.getDocumentElement());
        }

        // Prints the given node and, for elements, recursively prints
        // its attributes and child nodes.
        public void echo(Node n) {
            int type = n.getNodeType();

            switch (type) {
                case Node.ATTRIBUTE_NODE:
                    System.out.println("ATTR:" + n.getNodeName());
                    break;

                case Node.CDATA_SECTION_NODE:
                    System.out.println("CDATA:" + n.getNodeName());
                    break;

                case Node.COMMENT_NODE:
                    System.out.println("COMM:" + n.getNodeName());
                    break;

                case Node.DOCUMENT_NODE:
                    System.out.println("DOC:" + n.getNodeName());
                    break;

                case Node.ELEMENT_NODE:
                    System.out.println("ELEM:" + n.getNodeName());

                    // Attributes are not child nodes, so they are visited separately.
                    NamedNodeMap atts = n.getAttributes();
                    for (int i = 0; i < atts.getLength(); i++) {
                        Node att = atts.item(i);
                        echo(att);
                    }
                    if (n.hasChildNodes()) {
                        NodeList nodeList = n.getChildNodes();
                        for (int i = 0; i < nodeList.getLength(); i++) {
                            Node node = nodeList.item(i);
                            echo(node);
                        }
                    }
                    break;

                case Node.TEXT_NODE:
                    System.out.println("TEXT:" + n.getNodeName() + ":" + n.getNodeValue());
                    break;

                default:
                    System.out.println("UNSUPPORTED NODE: " + type);
                    break;
            }
        }
    }

In the echo method we first get the type of the node by calling the getNodeType method.

getNodeType returns a number (a short), which is compared against the constants declared in the Node interface. Based on the node type we perform the appropriate action.

The Node interface declares one constant for each type of node that can appear in the tree.
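
To show that these constants are plain numbers, the sketch below maps a few of them to readable names. NodeTypeSketch and typeName are names invented for this example, and only the node types handled in the echo method are covered.

```java
import org.w3c.dom.Node;

// Hypothetical class, for illustration only.
public class NodeTypeSketch {

    // Maps the numeric node type to a readable name.
    public static String typeName(short type) {
        switch (type) {
            case Node.ELEMENT_NODE:       return "ELEMENT";
            case Node.ATTRIBUTE_NODE:     return "ATTRIBUTE";
            case Node.TEXT_NODE:          return "TEXT";
            case Node.CDATA_SECTION_NODE: return "CDATA_SECTION";
            case Node.COMMENT_NODE:       return "COMMENT";
            case Node.DOCUMENT_NODE:      return "DOCUMENT";
            default:                      return "OTHER(" + type + ")";
        }
    }

    public static void main(String[] args) {
        System.out.println(Node.ELEMENT_NODE + " " + typeName(Node.ELEMENT_NODE)); // 1 ELEMENT
        System.out.println(Node.TEXT_NODE + " " + typeName(Node.TEXT_NODE));       // 3 TEXT
        // Processing instructions are not handled above, so they fall through to the default.
        System.out.println(typeName(Node.PROCESSING_INSTRUCTION_NODE));            // OTHER(7)
    }
}
```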

The XML document parsed for this example is:

XML document


<employees xmlns:e="http://www.example.org/employee">
    <e:employee>
        <e:id>1</e:id>
        <e:firstname>Jason</e:firstname>
        <e:lastname>Bourne</e:lastname>
        <e:position>Software Engineer</e:position>
        <e:location>Fairfax</e:location>
    </e:employee>
    <e:employee>
        <e:id>2</e:id>
        <e:firstname>Jason</e:firstname>
        <e:lastname>Bourne</e:lastname>
        <e:position>Software Engineer</e:position>
        <e:location>Fairfax</e:location>
    </e:employee>
</employees>
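
The element names in this document carry the e: prefix. A factory obtained from DocumentBuilderFactory.newInstance() is not namespace aware unless we ask for it, which is why the main code simply prints the prefixed name with getNodeName. The sketch below shows what changes when setNamespaceAware(true) is called. DOMNamespaceSketch and firstEmployee are names invented for this example, and a shortened copy of the document is parsed from a string.

```java
import java.io.StringReader;

import javax.xml.parsers.DocumentBuilderFactory;

import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical class, for illustration only.
public class DOMNamespaceSketch {

    // Shortened copy of the example document, written without whitespace
    // so that the first child of the root is the e:employee element.
    static final String XML =
            "<employees xmlns:e=\"http://www.example.org/employee\">"
            + "<e:employee><e:id>1</e:id></e:employee></employees>";

    // Parses XML with or without namespace support and returns the first e:employee.
    public static Element firstEmployee(boolean namespaceAware) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(namespaceAware);
        Document doc = factory.newDocumentBuilder().parse(new InputSource(new StringReader(XML)));
        return (Element) doc.getDocumentElement().getFirstChild();
    }

    public static void main(String[] args) throws Exception {
        Element plain = firstEmployee(false);
        System.out.println(plain.getNodeName());     // e:employee
        System.out.println(plain.getLocalName());    // null, no namespace information

        Element aware = firstEmployee(true);
        System.out.println(aware.getNodeName());     // e:employee
        System.out.println(aware.getLocalName());    // employee
        System.out.println(aware.getNamespaceURI()); // http://www.example.org/employee
    }
}
```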

Output

ELEM:employees
ATTR:xmlns:e
TEXT:#text:

ELEM:e:employee
TEXT:#text:

ELEM:e:id
TEXT:#text:1
TEXT:#text:

ELEM:e:firstname
TEXT:#text:Jason
TEXT:#text:

ELEM:e:lastname
TEXT:#text:Bourne
TEXT:#text:

ELEM:e:position
TEXT:#text:Software Engineer
TEXT:#text:

ELEM:e:location
TEXT:#text:Fairfax
TEXT:#text:

TEXT:#text:

ELEM:e:employee
TEXT:#text:

ELEM:e:id
TEXT:#text:2
TEXT:#text:

ELEM:e:firstname
TEXT:#text:Jason
TEXT:#text:

ELEM:e:lastname
TEXT:#text:Bourne
TEXT:#text:

ELEM:e:position
TEXT:#text:Software Engineer
TEXT:#text:

ELEM:e:location
TEXT:#text:Fairfax
TEXT:#text:

TEXT:#text:
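
The many TEXT lines with no visible value come from the line breaks and indentation between the tags: the parser keeps that whitespace as text nodes. If they are not wanted, one option is to skip text nodes that contain only whitespace. The sketch below does that. DOMWhitespaceSketch, isWhitespaceOnly and collect are names invented for this example, and the lines are collected into a list instead of being printed directly.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import javax.xml.parsers.DocumentBuilderFactory;

import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical class, for illustration only.
public class DOMWhitespaceSketch {

    // Parses XML held in a string instead of a file.
    public static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    // True for text nodes made up of nothing but spaces, tabs and line breaks.
    public static boolean isWhitespaceOnly(Node n) {
        return n.getNodeType() == Node.TEXT_NODE && n.getNodeValue().trim().isEmpty();
    }

    // Adds one line per element and per non-blank text node, in document order.
    public static void collect(Node n, List<String> out) {
        if (isWhitespaceOnly(n)) {
            return; // skip indentation and line breaks between tags
        }
        if (n.getNodeType() == Node.ELEMENT_NODE) {
            out.add("ELEM:" + n.getNodeName());
            NodeList children = n.getChildNodes();
            for (int i = 0; i < children.getLength(); i++) {
                collect(children.item(i), out);
            }
        } else if (n.getNodeType() == Node.TEXT_NODE) {
            out.add("TEXT:" + n.getNodeValue().trim());
        }
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse("<employees>\n  <employee>\n    <id>1</id>\n  </employee>\n</employees>");
        List<String> lines = new ArrayList<>();
        collect(doc.getDocumentElement(), lines);
        System.out.println(lines); // [ELEM:employees, ELEM:employee, ELEM:id, TEXT:1]
    }
}
```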
