XML Structure

 

XML Well-Formed

 

Tag Creation

Earlier, we mentioned that XML works through markups. A simple markup is made of a tag name enclosed between the left angle bracket "<" and the right angle bracket ">". Just creating a markup is not particularly significant. You must give it meaning. To do this, you can type a number, a date, or a string on the right side of the right angle bracket ">". The text on the right side of ">" is referred to as the item's text. It is also called a value.

After specifying the value of the markup, you must close it. This rule is not enforced in HTML, but it must be respected in XML for the document to be "well-formed". To close a tag, use the same formula as for creating a tag, with the left angle bracket "<", the tag name, and the right angle bracket ">", except that, between < and the tag name, you must type a forward slash. The formula to use is:

<tag>some value</tag>

The item on the left side of the "some value" string, in this case <tag>, is called the opening tag or start-tag. The item on the right side of the "some value" string, in this case </tag>, is called the closing tag or end-tag. Like <tag>, </tag> is also called a markup.

With XML, you create your own tags with custom names. This means that a typical XML file is made of various items. Here is an example:

<title>The Distinguished Gentleman</title>
	<director>Jonathan Lynn</director><length>112 Minutes</length>

Tag Names

When creating your tags, there are various rules you must observe with regard to their names. Unlike HTML, XML is very restrictive with its rules. For example, unlike HTML but like C/C++/C#, XML is case-sensitive. This means that CASE, Case, and case are three different words. Therefore, from now on, you must pay close attention to what you write inside the < and the > delimiters.

Besides case sensitivity, there are some rules you must observe when naming the tags of your markups:

  • The name of a tag must be in one word, no space in the name
  • The name must start with an alphabetic letter or an underscore - Examples are <Country> or <_salary>
  • The first letter or underscore that starts a name can be followed by:
    • Letters - Example: <OperatingSystem>
    • Digits - Example: <L153>
    • Hyphens - Example: <TV-Rating>
    • Underscores - Example: <Chief_Accountant>
  • The name of a tag should not start with xml, XML, or any other combination of X (uppercase or lowercase), followed by M (uppercase or lowercase), and followed by L (uppercase or lowercase): such names are reserved by the XML specification
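The naming rules above can be checked in code. The following is a minimal sketch, assuming that the XmlConvert.VerifyNCName() method of the System.Xml namespace is an acceptable test for the character rules. The TagNameCheck class and its IsValidTagName() method are hypothetical names used only for illustration:

```csharp
using System;
using System.Xml;

// Hypothetical helper, used only for illustration
public static class TagNameCheck
{
    // Returns true if the name follows the naming rules listed above
    public static bool IsValidTagName(string name)
    {
        try
        {
            // Throws an exception if the name is empty, contains a space,
            // starts with a digit, or uses a character that is not allowed
            XmlConvert.VerifyNCName(name);
        }
        catch (XmlException)
        {
            return false;
        }
        catch (ArgumentException)
        {
            return false;
        }

        // VerifyNCName() does not reject the reserved "xml" prefix,
        // so that rule is tested separately, regardless of the case
        return !name.StartsWith("xml", StringComparison.OrdinalIgnoreCase);
    }
}
```

With this sketch, a call such as TagNameCheck.IsValidTagName("TV-Rating") should return true, while a name that contains a space, such as "hourly salary", should be rejected.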

In our lessons, here are the rules we will apply:

  • Sometimes a name will be made of lowercase only
  • Most of the time, a name will start in uppercase; sometimes it will start in lowercase
  • When a name is a combination of words, such as [hourly salary], we will start each part in uppercase. Examples will be HourlySalary or DateOfBirth

In future sections, we will learn that, with some markups, you can include non-readable characters between the angle brackets. In fact, you will need to pay close attention to the symbols you type in a markup. We will also see how some characters have special meaning.

The Root

Every XML document must have one particular tag that either is the only tag in the file or acts as the parent of all the other tags of the same document. This tag is called the root. Here is an example of a file that has only one tag:

<rectangle>A rectangle is a shape with 4 sides and 4 straight angles</rectangle>

This would produce:

[Screenshot: XML in a Browser]

If there is more than one tag in the XML file, one of them must serve as the parent or root. Otherwise, you would receive an error. Based on this rule, the following XML code is not well-formed:

<rectangle>A rectangle is a shape with 4 sides and 4 straight angles</rectangle>
<square>A square is a rectangle whose 4 sides are equal</square>

This would produce:

[Screenshot: An ill-formed XML file in a Browser]

To correct this type of error, you can change one of the existing tags to act as the root. In the following example, the <rectangle> tag acts as the parent:

<rectangle>A rectangle is a shape with 4 sides and 4 straight angles
<square>A square is a rectangle whose 4 sides are equal</square></rectangle>

This would produce:

[Screenshot: Good Nested Tags]

Alternatively, you can create a tag that acts as the parent for the other tags. In the following example, the <geometry> tag acts as the parent of the <rectangle> and of the <square> tags:

<geometry><rectangle>A rectangle is a shape with 4 sides and 4 straight angles
</rectangle><square>A square is a rectangle whose 4 sides are equal</square></geometry>

This would produce:

[Screenshot: Preview]

As mentioned already, a good XML file should start with an XML declaration:

<?xml version="1.0" encoding="utf-8"?><geometry><rectangle>A rectangle 
is a shape with 4 sides and 4 straight angles</rectangle><square>A 
square is a rectangle whose 4 sides are equal</square></geometry>

To give you access to the root of an XML file, the XmlDocument class is equipped with the DocumentElement property.
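Here is a short sketch of how the DocumentElement property could be used. It assumes the XML code is available as a string; the RootDemo class and its GetRootName() method are hypothetical names chosen for this illustration:

```csharp
using System;
using System.Xml;

// Hypothetical helper, used only for illustration
public static class RootDemo
{
    // Returns the name of the root tag of the XML code
    public static string GetRootName(string xml)
    {
        XmlDocument docGeometry = new XmlDocument();
        docGeometry.LoadXml(xml);

        // DocumentElement gives access to the root of the document
        XmlElement root = docGeometry.DocumentElement;
        return root.Name;
    }
}
```

If this method receives the <geometry> example above, it should return the string geometry.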

Practical Learning: Creating the Root Tag

  1. In the students.xml file, click under the top line and type <students>
     
    [Screenshot: XML Code]
  2. Press Enter
  3. Save the file

The Structure of an XML Tag

 

Empty Tags

We mentioned that, unlike HTML, every XML tag must be closed. We also saw that the value of a tag is specified on the right side of the right angle bracket of the start tag. In some cases, you will create a tag that doesn't have a value or, perhaps for some reason, you don't provide a value to it. Here is an example:

<dinner></dinner>

This type of tag is called an empty tag. Since there is no value in it, you don't need to provide a separate end tag, but the tag must still be closed. Although the above writing is allowed, an alternative is to close the start tag itself. To do this, between the tag name and the right angle bracket, type a forward slash, optionally preceded by an empty space. Based on this, the above line can be written as follows:

<dinner />

Both produce the same result or accomplish the same role.
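As a quick way to see that both writings describe a tag without a value, here is a minimal sketch. The EmptyTagDemo class and its HasNoValue() method are hypothetical names used only for this illustration:

```csharp
using System;
using System.Xml;

// Hypothetical helper, used only for illustration
public static class EmptyTagDemo
{
    // Returns true if the root tag of the XML code holds neither a value nor a nested tag
    public static bool HasNoValue(string xml)
    {
        XmlDocument docMeal = new XmlDocument();
        docMeal.LoadXml(xml);

        return !docMeal.DocumentElement.HasChildNodes;
    }
}
```

Calling EmptyTagDemo.HasNoValue("<dinner></dinner>") and EmptyTagDemo.HasNoValue("<dinner />") should both return true.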

White Spaces

In the above example, we typed various items on the same line. If you are creating a long XML document, although creating various items on the same line is acceptable, this technique can make it (very) difficult to read. One way you can solve this problem is to separate tags with empty spaces. Here is an example:

<title>The Distinguished Gentleman</title> 
	<director>Jonathan Lynn</director>
		<length>112 Minutes</length>

Yet a better solution consists of typing each item on its own line. This would make the document easier to read. Here is an example:

<title>The Distinguished Gentleman</title>
<director>Jonathan Lynn</director>
<length>112 Minutes</length>

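All these are possible and acceptable because empty spaces and ends of lines placed only between tags do not change the structure of the document, and the XmlDocument class ignores them by default. White spaces typed inside a value, such as the spaces between the words of a title, are kept. Therefore, to make your code easier to read, you can use empty spaces, carriage-return-line-feed combinations, or tabs inserted in various sections. All these are referred to as white spaces.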
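The effect of white spaces can be observed with a short sketch. It relies on the XmlDocument.PreserveWhitespace property keeping its default value of false; the WhiteSpaceDemo class, its CountChildTags() method, and the <video> parent tag used when calling it are assumptions made for this illustration:

```csharp
using System;
using System.Xml;

// Hypothetical helper, used only for illustration
public static class WhiteSpaceDemo
{
    // Returns the number of nodes found directly under the root tag
    public static int CountChildTags(string xml)
    {
        XmlDocument docVideo = new XmlDocument();

        // PreserveWhitespace is false by default:
        // white spaces that only separate tags are not kept as nodes
        docVideo.LoadXml(xml);

        return docVideo.DocumentElement.ChildNodes.Count;
    }
}
```

Whether the three tags are passed on one line, as in "<video><title>The Distinguished Gentleman</title><director>Jonathan Lynn</director><length>112 Minutes</length></video>", or separated with tabs and ends of lines, the method should return the same count.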

Nesting Tags

Most XML files contain more than one tag. We saw that a tag must have a starting point and a tag must be closed. One tag can be included in another tag: this is referred to as nesting. A tag that is created inside of another tag is said to be nested. A tag that contains another tag is said to be nesting. Consider the following example:

<Smile>Please smile to the camera</Smile>
<English>Welcome to our XML Class</English>
<French>Bienvenue à notre Classe XML</French>

In this example, you may want the English tag to be nested in the Smile tag. To nest one tag inside of another, you must type the nested tag before the end-tag of the nesting tag. For example, if you want to nest the English tag in the Smile tag, you must type the whole English tag before the </Smile> end tag. Here is an example:

<Smile>Please smile to the camera<English>Welcome to our XML Class</English></Smile>

To make this code easier to read, you can use white spaces as follows:

<Smile>Please smile to the camera
<English>Welcome to our XML Class</English>
</Smile>

When a tag is nested, it must also be closed before its nesting tag is closed. Based on this rule, the following code is not well-formed:

<Smile>Please smile to the camera
<English>Welcome to our XML Class
</Smile>
</English>

The rule broken here is that the English tag, which is nested in the Smile tag, is closed outside the Smile tag instead of inside it.
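To find out whether some XML code respects the nesting rule, you can let the parser check it. The following is a minimal sketch, relying on the fact that the XmlDocument.LoadXml() method throws an XmlException when the code is not well-formed; the NestingCheck class and its IsWellFormed() method are hypothetical names:

```csharp
using System;
using System.Xml;

// Hypothetical helper, used only for illustration
public static class NestingCheck
{
    // Returns true if the XML code can be loaded, false if the parser rejects it
    public static bool IsWellFormed(string xml)
    {
        try
        {
            XmlDocument docTest = new XmlDocument();
            docTest.LoadXml(xml);
            return true;
        }
        catch (XmlException)
        {
            // Thrown, for example, when an end tag doesn't match the last open tag
            return false;
        }
    }
}
```

With this sketch, the code in which </English> comes before </Smile> should be accepted, while the code in which </Smile> comes first should be rejected.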

Once you have decided on the structure of your XML file, we saw that you can create it in memory using the XmlDocument.LoadXml() method. For example, the following XML code:

<?xml version="1.0" encoding="utf-8"?>
<musiccollection>
	<album>
		<shelfnumber>FJ-7264</shelfnumber>
		<title>Symphony-Bantu</title>
		<artist>Vincent Nguini</artist>
		<copyrightyear>1994</copyrightyear>
		<publisher>Mesa Records</publisher>
	</album>
	<album>
		<shelfnumber>MR-2947</shelfnumber>
		<title>None</title>
		<artist>Debbie Gibson</artist>
		<copyrightyear>1990</copyrightyear>
		<publisher>Atlantic</publisher>
	</album>
</musiccollection>

can be created in memory as follows:

using System;
using System.Collections.Generic;
using System.ComponentModel;
using System.Data;
using System.Drawing;
using System.Text;
using System.Windows.Forms;
using System.Xml;

namespace MusicCollection
{
    public partial class Exercise : Form
    {
        public Exercise()
        {
            InitializeComponent();
        }

        private void btnDocument_Click(object sender, EventArgs e)
        {
            XmlDocument docMusic = new XmlDocument();
            
            docMusic.LoadXml("<?xml version=\"1.0\" encoding=\"utf-8\"?>" +
                             "<musiccollection><album>" +
                             "<shelfnumber>FJ-7264</shelfnumber>" +
                             "<title>Symphony-Bantu</title>" +
                             "<artist>Vincent Nguini</artist>" +
                             "<copyrightyear>1994</copyrightyear>" +
                             "<publisher>Mesa Records</publisher></album>" +
                             "<album><shelfnumber>MR-2947</shelfnumber>" +
                             "<title>None</title><artist>Debbie Gibson</artist>" +
                             "<copyrightyear>1990</copyrightyear>" +
                             "<publisher>Atlantic</publisher>" +
                             "</album></musiccollection>");
        }
    }
}

Notice that the whole XML code can be created as one line of text and the code would be valid.

Practical Learning: Creating XML

  1. To apply the concept of nesting XML tags, change the students.xml file as follows:
    <?xml version="1.0" encoding="utf-8"?>
    <students>
    	<student>
    		<firstname>Benjamin</firstname>
    		<lastname>Carson</lastname>
    		<dateofbirth>04/10/1995</dateofbirth>
    		<gender>2</gender>
    	</student>
    	<student>
    		<firstname>Gertrude</firstname>
    		<lastname>Simms</lastname>
    		<dateofbirth>8/22/1993</dateofbirth>
    		<gender>1</gender>
    	</student>
    	<student>
    		<firstname>Paul</firstname>
    		<lastname>Sandt</lastname>
    		<dateofbirth>12/24/1997</dateofbirth>
    		<gender>3</gender>
    	</student>
    	<student>
    		<firstname>Chrissie</firstname>
    		<lastname>Burchs</lastname>
    		<dateofbirth>02/06/1993</dateofbirth>
    		<gender>1</gender>
    	</student>
    </students>
  2. Save the file

An XML Node

 

Introduction to XML Nodes

Consider the following example of an XML file named Videos.xml:

<?xml version="1.0" encoding="utf-8" ?>
<Videos>
	<Video>
		<Title>The Distinguished Gentleman</Title>
		<Director>Jonathan Lynn</Director>
		<Length>112 Minutes</Length>
		<Format>DVD</Format>
		<Rating>R</Rating>
	</Video>
	<Video>
		<Title>Her Alibi</Title>
		<Director>Bruce Beresford</Director>
		<Length>94 Mins</Length>
		<Format>DVD</Format>
		<Rating>PG-13</Rating>
	</Video>
	<Video>
		<Title>Chalte Chalte</Title>
		<Director>Aziz Mirza</Director>
		<Length>145 Mins</Length>
		<Format>DVD</Format>
		<Rating>N/R</Rating>
	</Video>
</Videos>

An XML file appears as an upside-down tree: it has a root (in this case <Videos>), it can have branches (in this case <Video>), and it can have leaves (an example in this case is <Title>). As we have seen so far, all of these objects are created using the same technique: a tag with a name (such as <Title>) and an optional value. Based on their similarities, each of these objects is called a node.

To support the nodes of an XML file, the .NET Framework provides the XmlNode class, which is the ancestor to all types of nodes. XmlNode is an abstract class: you cannot create an XmlNode object with the new operator. Based on this, to get a node, you must retrieve it from an existing object that can produce one, such as an XmlDocument object.
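As an illustration of retrieving nodes from an existing object, here is a minimal sketch that starts from the root of a document and visits its branches and leaves. The NodeDemo class and its GetTitles() method are hypothetical names, and the sketch assumes the XML code follows the structure of the Videos.xml example:

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

// Hypothetical helper, used only for illustration
public static class NodeDemo
{
    // Returns the value of every <Title> node found under each <Video> node
    public static List<string> GetTitles(string xml)
    {
        List<string> titles = new List<string>();

        XmlDocument docVideos = new XmlDocument();
        docVideos.LoadXml(xml);

        // The root node is retrieved from an existing object: the document
        XmlNode root = docVideos.DocumentElement;

        // Each branch (<Video>) and each leaf (<Title>, <Director>, etc) is also an XmlNode
        foreach (XmlNode video in root.ChildNodes)
        {
            foreach (XmlNode leaf in video.ChildNodes)
            {
                if (leaf.Name == "Title")
                    titles.Add(leaf.InnerText);
            }
        }

        return titles;
    }
}
```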

Introduction to Node Types

To make XML as complete and as efficient as possible, it can contain various types of nodes. The categories or possible types of nodes are identified by an enumeration named XmlNodeType. If you use an XmlTextReader object to scan a file, each call to Read() moves to the next node, and the class has a property named NodeType that allows you to identify the node that was just read. NodeType is a read-only property of type XmlNodeType and it is declared as follows:

public override XmlNodeType NodeType { get; }

Therefore, when calling the XmlTextReader.Read() method, you can continuously check the value of the XmlTextReader.NodeType property to find out what type of node was just read, and then you can take an appropriate action.
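The following sketch shows one way this loop could be written. To stay self-contained, it reads the XML code from a string through a StringReader object instead of a file; the NodeTypeDemo class and its ReadNodeTypes() method are hypothetical names used only for this illustration:

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

// Hypothetical helper, used only for illustration
public static class NodeTypeDemo
{
    // Returns the type of each node, in the order the reader finds them
    public static List<XmlNodeType> ReadNodeTypes(string xml)
    {
        List<XmlNodeType> types = new List<XmlNodeType>();

        using (XmlTextReader rdrVideos = new XmlTextReader(new StringReader(xml)))
        {
            // Read() returns false when there is no more node to read
            while (rdrVideos.Read())
            {
                // NodeType identifies the node that was just read
                types.Add(rdrVideos.NodeType);
            }
        }

        return types;
    }
}
```

Instead of simply storing each type, you could compare NodeType to a member of the enumeration, such as XmlNodeType.Element or XmlNodeType.Text, and take the appropriate action for that type of node.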

 

Copyright © 2007-2013, FunctionX