Sunday, 11 October 2015

XML Parsing

Parsing: Splitting a full string into smaller chunks using special tokens is known as parsing. Two parsers are available for parsing an XML file:

1.    SAX Parser.
2.    DOM parser.

To use both parser APIs as shown here, this JAR file is required on the classpath: Xerces.jar

SAX (Simple API for XML)

The SAX parser is a read-only parser used to read data from an XML document. It cannot update, delete, or generate XML. It is based on an event-driven model: while parsing the XML, it generates the following types of events:

Document started.
Element started.
Character data found.
Element ended.
Document ended, etc.

The SAX parser reads the XML data in sequential order.

Whenever it finds a start tag or an end tag in the XML, it calls the corresponding event handler method.

Basic Example:

[Figure: Directory Structure in Eclipse]


SAXParserTest.java

package com.sky.parse.sax;

import java.io.IOException;

import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLReaderFactory;

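public class SAXParserTest {

      public static void main(String[] args) {
            try {
                  // Create the Xerces SAX parser by class name (needs Xerces on the classpath).
                  XMLReader reader = XMLReaderFactory.createXMLReader("org.apache.xerces.parsers.SAXParser");

                  // Register the handler that receives the parsing events.
                  reader.setContentHandler(new SkyHandler());

                  // Parse the file; the path is resolved against the working directory.
                  reader.parse("src/sky.xml");

                  // Matches the final line of the OUTPUT shown below.
                  System.out.println("DONE");
            } catch (Exception e) {
                  e.printStackTrace();
            }
      }
}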

class SkyHandler extends DefaultHandler{
     
      @Override
      public void startDocument() throws SAXException {
            System.out.println("SkyHandler.startDocument()");
      }
     
     
      @Override
      public void startElement(String uri, String localName, String qName,
                  Attributes attributes) throws SAXException {
            System.out.println("SkyHandler.startElement()");
            System.out.println(uri +"\t" + localName +"\t" + qName);
            for (int i = 0; i < attributes.getLength(); i++) {
                  System.out.println(attributes.getLocalName(i) + "\t" + attributes.getValue(i));
            }
      }
     
      @Override
      public void characters(char[] ch, int start, int length)
                  throws SAXException {
            System.out.println("SkyHandler.characters()");
            System.out.println(new String(ch, start, length));
      }
     
      @Override
      public void endElement(String uri, String localName, String qName)
                  throws SAXException {
            System.out.println("SkyHandler.endElement()");
            System.out.println(uri +"\t" + localName +"\t" + qName);
      }
     
      @Override
      public void endDocument() throws SAXException {
            System.out.println("SkyHandler.endDocument()");
      }

}


OUTPUT:

SkyHandler.startDocument()
SkyHandler.startElement()
http://www.sky.com/sky  sky   sky
schemaLocation    http://www.sky.com/sky http://www.sky.com/sky/sky3.xsd http://www.sky.com/sky/emp http://www.sky.com/sky/emp/sky2.xsd http://www.sky.com/sky/dept http://www.sky.com/sky/dept/sky.xsd
SkyHandler.characters()



SkyHandler.startElement()
http://www.sky.com/sky/dept   hai   dept:hai
SkyHandler.endElement()
http://www.sky.com/sky/dept   hai   dept:hai
SkyHandler.characters()


SkyHandler.startElement()
http://www.sky.com/sky/emp    hello emp:hello
SkyHandler.endElement()
http://www.sky.com/sky/emp    hello emp:hello
SkyHandler.characters()



SkyHandler.endElement()
http://www.sky.com/sky  sky   sky
SkyHandler.endDocument()

DONE
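
The same event sequence can be tried without the Xerces JAR or a file on disk. The following is only a minimal sketch, assuming the JAXP parser bundled with the JDK (SAXParserFactory) and a small in-memory document; the class name SaxEventOrderSketch and the helper collectEvents are made up for this illustration and are not part of the example above.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical class name, used only for this sketch.
public class SaxEventOrderSketch {

    // Parses the given XML text and returns the SAX events in the order they fired.
    static List<String> collectEvents(String xml) throws Exception {
        List<String> events = new ArrayList<>();
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        SAXParser parser = factory.newSAXParser();

        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startDocument() {
                events.add("startDocument");
            }

            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                events.add("start:" + qName);
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                // Skip whitespace-only chunks so the event list stays short.
                String text = new String(ch, start, length).trim();
                if (!text.isEmpty()) {
                    events.add("text:" + text);
                }
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                events.add("end:" + qName);
            }

            @Override
            public void endDocument() {
                events.add("endDocument");
            }
        };

        parser.parse(new InputSource(new StringReader(xml)), handler);
        return events;
    }

    public static void main(String[] args) throws Exception {
        // Events are printed one per line, in the order the parser reported them.
        collectEvents("<sky><name>Sumit Kumar</name></sky>").forEach(System.out::println);
    }
}
```

The handler only records what the parser reports; it cannot go back to an earlier element, which is what reading in sequential order means in practice.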



DOM (Document Object Model)

The DOM parser is a read-write parser used both to read data from an XML document and to write data into one. It can read, update, delete, and generate XML. While reading the XML file, the DOM parser constructs a node tree, also called the DOM tree, containing all the elements of the XML.

The kinds of nodes in a DOM tree include:
Document Node.
Element Node.
Attribute Node.
Text Node (character data).

Example:
<sky>
<Employee empid="101">
<name>Sumit Kumar</name>
</Employee>
</sky>

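DOM Tree (whitespace-only text nodes omitted):

Document
  Element: sky
    Element: Employee
      Attribute: empid = "101"
      Element: name
        Text: Sumit Kumar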

The DOM parser allows the XML data to be accessed in random order: once the tree is built, any node can be reached directly.
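
Random access can be sketched as follows. This is an illustration only, assuming the DocumentBuilderFactory that ships with the JDK and the example document from this section held in a string; the class name DomRandomAccessSketch and the helper parse are invented for the sketch.

```java
import java.io.StringReader;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;

import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical class name, used only for this sketch.
public class DomRandomAccessSketch {

    // Builds a DOM tree from XML text held in memory.
    static Document parse(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        return builder.parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse(
                "<sky><Employee empid=\"101\"><name>Sumit Kumar</name></Employee></sky>");

        // Jump straight to the innermost element first.
        Element name = (Element) doc.getElementsByTagName("name").item(0);
        System.out.println(name.getTextContent());          // Sumit Kumar

        // Then move back up to its parent and read an attribute.
        Element employee = (Element) name.getParentNode();
        System.out.println(employee.getAttribute("empid")); // 101

        // Finally visit the root, which appears first in the document.
        System.out.println(doc.getDocumentElement().getTagName()); // sky
    }
}
```

The nodes are visited in the reverse of their document order, which a SAX handler could not do without storing the data itself.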

Once parsing with the DOM parser is complete, the entire XML data is loaded into main memory.

Hence, when using the DOM parser, make sure the XML file is not too large; otherwise it can lead to java.lang.OutOfMemoryError: Java heap space.
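
Because the whole tree is held in memory, it can also be changed and written back out, which is the read-write behaviour described at the start of this section. The sketch below assumes the JDK's built-in DocumentBuilderFactory and TransformerFactory; the class name DomUpdateSketch, the helper updateAndSerialize, and the added dept element are all made up for illustration.

```java
import java.io.StringReader;
import java.io.StringWriter;

import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;

import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical class name, used only for this sketch.
public class DomUpdateSketch {

    // Parses the XML, changes the tree in memory, and returns the serialized result.
    static String updateAndSerialize(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));

        Element employee = (Element) doc.getElementsByTagName("Employee").item(0);

        // Update: change an attribute value.
        employee.setAttribute("empid", "102");

        // Generate: create a new element (the name "dept" is an assumption of this sketch).
        Element dept = doc.createElement("dept");
        dept.setTextContent("IT");
        employee.appendChild(dept);

        // Delete: remove the existing <name> element.
        employee.removeChild(doc.getElementsByTagName("name").item(0));

        // Write the modified tree back out as text, without the XML declaration.
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(updateAndSerialize(
                "<sky><Employee empid=\"101\"><name>Sumit Kumar</name></Employee></sky>"));
    }
}
```

None of these three operations is available through a SAX handler, which only receives events while the document is being read.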

Difference between SAX and DOM parser

S.No | SAX | DOM
1 | Read-only parser. | Read-write parser.
2 | Reads data sequentially. | Allows random access to the data.
3 | Uses less memory because it holds only one element's information in memory at a time. | Uses more memory because it loads the whole XML data into memory at once.
4 | Follows an event-driven model. | Follows an object-driven (tree-based) model.
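
The contrast in rows 2 to 4 can be seen by solving the same small task both ways. The following is a rough sketch, assuming the JDK's built-in JAXP factories; the class name SaxVsDomSketch and both helper methods are hypothetical and exist only for this comparison.

```java
import java.io.StringReader;

import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;

import org.w3c.dom.Document;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical class name, used only for this sketch.
public class SaxVsDomSketch {

    // SAX: count elements as the events stream past; no tree is kept.
    static int countWithSax(String xml) throws Exception {
        int[] count = {0};
        SAXParserFactory.newInstance().newSAXParser().parse(
                new InputSource(new StringReader(xml)),
                new DefaultHandler() {
                    @Override
                    public void startElement(String uri, String localName,
                                             String qName, Attributes atts) {
                        count[0]++;
                    }
                });
        return count[0];
    }

    // DOM: build the whole tree in memory first, then query it.
    static int countWithDom(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        // "*" matches every element in the document, including the root.
        return doc.getElementsByTagName("*").getLength();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<sky><Employee empid=\"101\"><name>Sumit Kumar</name></Employee></sky>";
        System.out.println(countWithSax(xml)); // 3
        System.out.println(countWithDom(xml)); // 3
    }
}
```

Both methods give the same answer; the difference is that the SAX version keeps only a counter, while the DOM version holds every node of the document until the Document object is released.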




