XML stands for eXtensible Markup Language. XML is a grammatical system for constructing custom markup languages that describe business data, mathematical data, chemical data, etc. XML is used to loosely couple disparate applications or systems that communicate via JMS, Web services, etc. XML uses the same building blocks that HTML does: elements, attributes and values.
Why is XML important?
Scalable: Since XML is not a binary format, you can create and edit files with any text editor, and it is also easy to debug. XML can be used to store anything from small amounts of data, such as configuration files (web.xml, application.xml, struts-config.xml, etc.), to large, company-wide data stored in a database.
Fast Access: XML documents benefit from their hierarchical structure. Hierarchical structures are generally faster to access because you can drill down to the section you are interested in.
Easy to identify and use: XML not only holds the data but also tells you what kind of data it is. The markup tags identify and group the information so that different applications can pick out the information relevant to them.
Stylability: XML is style-free, so whenever different styles of output are required, the same XML can be used with different stylesheets (XSL) to produce output in XHTML, PDF, plain text, another XML format, etc.
Other benefits: linkability, in-line usability, and a universally accepted standard with free or inexpensive tools.
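To illustrate the stylability point, here is a minimal sketch using the JDK's built-in JAXP transformation API (javax.xml.transform). The class name, the sample <books> document and the stylesheet are hypothetical. This stylesheet renders each title as a line of plain text; a different stylesheet applied to the same XML would produce a different output format.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class XslStyling {
    // Hypothetical stylesheet: one book title per line of plain text
    static final String TEXT_XSL =
        "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
      + "<xsl:output method=\"text\"/>"
      + "<xsl:template match=\"/\">"
      + "<xsl:for-each select=\"books/book\">"
      + "<xsl:value-of select=\"title\"/><xsl:text>&#10;</xsl:text>"
      + "</xsl:for-each>"
      + "</xsl:template></xsl:stylesheet>";

    // Applies an XSLT stylesheet to an XML document; both are supplied as strings
    static String transform(String xml, String xslt) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(xslt)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<books><book><title>Lord of the Rings</title></book>"
                + "<book><title>The Hobbit</title></book></books>";
        // Same XML, styled as plain text by TEXT_XSL
        System.out.print(transform(xml, TEXT_XSL));
    }
}
```

The XML itself never changes; only the stylesheet passed to transform() decides what the output looks like.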
When would you not use XML?
XML is verbose and can be 4-6 times larger than an equivalent CSV or tab-delimited file. If your network lacks bandwidth and/or your content is very large, and network throughput is vital to the application, you may consider using a CSV or tab-delimited format instead of XML.
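A rough way to see the verbosity is to encode the same records both ways and compare byte counts. The sketch below assumes hypothetical column names and sample rows and does no escaping (values are assumed to contain no commas or XML special characters). The exact ratio depends heavily on tag-name length and on the data, so it will not necessarily land in the 4-6 range.

```java
import java.nio.charset.StandardCharsets;

final class FormatSize {
    // Hypothetical column names, reused as CSV header and as XML tag names
    static final String[] COLUMNS = {"symbol", "quantity", "price"};

    // Header line, then one comma-separated line per row
    static String toCsv(String[][] rows) {
        StringBuilder sb = new StringBuilder(String.join(",", COLUMNS)).append('\n');
        for (String[] row : rows) {
            sb.append(String.join(",", row)).append('\n');
        }
        return sb.toString();
    }

    // Every value wrapped in a start tag and an end tag named after its column
    static String toXml(String[][] rows) {
        StringBuilder sb = new StringBuilder("<trades>\n");
        for (String[] row : rows) {
            sb.append("  <trade>");
            for (int i = 0; i < COLUMNS.length; i++) {
                sb.append('<').append(COLUMNS[i]).append('>')
                  .append(row[i])
                  .append("</").append(COLUMNS[i]).append('>');
            }
            sb.append("</trade>\n");
        }
        return sb.append("</trades>\n").toString();
    }

    // Size on the wire, assuming UTF-8 encoding
    static int utf8Size(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        String[][] rows = {{"IBM", "100", "125.50"}, {"SUNW", "2500", "4.10"}};
        System.out.printf("CSV: %d bytes, XML: %d bytes%n",
                utf8Size(toCsv(rows)), utf8Size(toXml(rows)));
    }
}
```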
What is the difference between a SAX parser and a DOM parser?
A SAX (Simple API for XML) parser does not create any internal structure. Instead, it treats the components of the input document (start tags, character data, end tags, etc.) as events (i.e., it is event driven) and tells the client what it reads as it reads through the input document.
A SAX parser only ever serves the client application pieces of the document, one at a time.
A SAX parser is much more space efficient for a large input document, because it creates no internal structure. What's more, it runs faster and is easier to learn than a DOM parser because its API is really simple. But in terms of functionality it provides fewer features, which means that users have to take care of more themselves, such as creating their own data structures.
Use a SAX parser when:
The input document is too big for the available memory.
Only part of the document needs to be read and you create your own data structures.
You need to keep memory use down: with SAX you use much less memory and perform much less dynamic memory allocation.
SAX parser examples: Xerces, Crimson, etc. Use JAXP (Java API for XML Processing), which enables applications to parse and transform XML documents independently of the particular XML parser. Code can be developed with one SAX parser in mind and later switched to another SAX parser without changing the application code.
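A minimal SAX sketch through JAXP, assuming a hypothetical <books>/<book>/<title> document: the handler reacts to start-element, character and end-element events and builds only the small list it needs, so no tree of the whole document is kept in memory. SAXParserFactory.newInstance() returns whatever SAX implementation is configured, which is what lets the parser be swapped without touching this code.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

final class SaxTitles {
    // Collects the text of every <title> element using SAX events only
    static List<String> titles(String xml) {
        List<String> result = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            private StringBuilder text; // non-null only while inside a <title>

            @Override
            public void startElement(String uri, String localName, String qName, Attributes attrs) {
                if ("title".equals(qName)) {
                    text = new StringBuilder();
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                if (text != null) {
                    text.append(ch, start, length); // may be called several times per element
                }
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                if ("title".equals(qName)) {
                    result.add(text.toString());
                    text = null;
                }
            }
        };
        try {
            // JAXP picks the configured SAX implementation; no parser is named here
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            parser.parse(new InputSource(new StringReader(xml)), handler);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(titles("<books><book><title>Lord of the Rings</title></book></books>"));
    }
}
```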
A DOM (Document Object Model) parser creates a tree structure in memory from an input document and then waits for requests from client.
A DOM parser always serves the client application with the entire document no matter how much is actually needed by the client.
A DOM parser is rich in functionality. It creates a DOM tree in memory, lets you access any part of the document repeatedly, and lets you modify the DOM tree. But it is space inefficient when the document is huge, and it takes a little longer to learn how to work with it.
Use a DOM parser when:
Your application has to access various parts of the document, and building your own structure would be just as complicated as the DOM tree.
Your application has to change the tree very frequently and the data has to be kept for a significant amount of time.
DOM parser examples: XercesDOM, SunDOM, OracleDOM, etc.
As with SAX, use JAXP (Java API for XML Processing), which enables applications to parse and transform XML documents independently of the particular XML parser. Code can be developed with one DOM parser in mind and later switched to another DOM parser without changing the application code.
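A matching DOM sketch through JAXP, again assuming a hypothetical <books> document: the whole document is parsed into an in-memory tree, the tree is modified in place by appending a new <price> element to every <book>, and the result is written back out with an identity Transformer. The element names and the addPrice helper are illustrative only.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

final class DomEdit {
    // Parses the entire document into an in-memory DOM tree
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    // Hypothetical helper: modifies the tree in place by appending <price> to every <book>
    static void addPrice(Document doc, String price) {
        NodeList books = doc.getElementsByTagName("book");
        for (int i = 0; i < books.getLength(); i++) {
            Element priceElement = doc.createElement("price");
            priceElement.setTextContent(price);
            books.item(i).appendChild(priceElement);
        }
    }

    // Writes the (possibly modified) tree back out, without an XML declaration
    static String serialize(Document doc) {
        try {
            Transformer identity = TransformerFactory.newInstance().newTransformer();
            identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            identity.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Because the tree stays in memory, any part of it (for example doc.getElementsByTagName("title")) can be revisited as often as needed after the edit.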
Which is better: storing data as elements or as attributes?
A question that XML/DTD designers often face is whether to model and encode certain information using an element or an attribute. The answer is not clear-cut, but the general guideline is:
Using an element: <book><title>Lord of the Rings</title>...</book> : If you consider the information in question to be part of the essential material that is being expressed or communicated in the XML, put it in an element.
Using an attribute: <book title="Lord of the Rings"/> : If you consider the information to be peripheral or incidental to the main communication, or purely intended to help applications process the main communication, use attributes.
The principle is that data goes in elements and metadata goes in attributes. Elements are also more convenient for text containing special characters such as "<" and "&", because element content can wrap them in a CDATA section, whereas attribute values must escape each one (for example as &lt; and &amp;). The most important reason to use elements is their extensibility: it is far easier to add child elements to reflect complex content than to break an attribute into pieces. You can use attributes along with elements to refine an element with extra information. Attributes are less verbose, but using attributes instead of child elements to optimize document size is a short-term strategy that can have long-term consequences.
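A short DOM sketch contrasting the two styles for the <book> example, with hypothetical class and method names. It also shows the special-character point: the element form can carry "<" and "&" inside a CDATA section, while the attribute form has to escape them as entities. Both read back as the same string.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

final class ElementVsAttribute {
    // Parses an XML string and returns its root element
    static Element root(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    // Element style: the title is child content of <book> and could later gain child elements
    static String titleFromElement(Element book) {
        return book.getElementsByTagName("title").item(0).getTextContent();
    }

    // Attribute style: the title is a flat string property of <book>
    static String titleFromAttribute(Element book) {
        return book.getAttribute("title");
    }

    public static void main(String[] args) {
        // CDATA section in element content vs. escaped entities in an attribute value
        Element asElement = root("<book><title><![CDATA[Q&A <Java>]]></title></book>");
        Element asAttribute = root("<book title=\"Q&amp;A &lt;Java&gt;\"/>");
        System.out.println(titleFromElement(asElement));
        System.out.println(titleFromAttribute(asAttribute));
    }
}
```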