1: Introduction to XPath

XPath Fundamentals

Introduction to XPath

If you have used XML in Microsoft Windows, you may know that, to support XML in its operating systems, Microsoft created a library named MSXML. To make the operations even easier, the MSXML library was integrated in the .NET Framework. You may also know how challenging it sometimes is to locate an element in an XML document. To assist you with operations related to locating a node, the W3C created a language named XPath.

Practical Learning: Introducing XPath Applications

Start Microsoft Visual Studio 2022
In the Visual Studio 2022 dialog box, click Create a New Project
In the Create a New Project wizard page, in the languages combo box, select C#
In the list of the projects templates, click Windows Forms App
Click Next
In the Configure Your New Project dialog box, change the project Name to Exercise1
Accept or change the project Location
Click Next
In the Framework combo box, select the highest version available
Click Create
Double-click an unoccupied area of the form to generate its Load event
To execute, on the main menu, click Debug -> Start Without Debugging
Close the form and return to your programming environment

Author Note

If you want, using the form of the application you created, you can apply the descriptions in the following sections to experiment with, and get, the same results.

Introduction to XPath and .NET

Microsoft supports XPath in various ways, as a language in its own right, in the MSXML library, and in the .NET Framework. In this series of lessons, we will use or address XPath as it is defined by W3C and as it is, or can be, used in the .NET Framework or in a .NET Framework-based application.

To support XPath, the .NET Framework provides a namespace named System.Xml.XPath. That namespace contains various classes. As you should know already, the .NET Framework primarily supports the XML standards through the System.Xml namespace that contains such valuable classes as XmlDocument, XmlElement, XmlNode, and XmlNodeList (and many other important XML-based classes). The XmlDocument class gives you access to the whole contents of an XML document, starting with the root node, which is represented by the DocumentElement property.

As you may know already, to open an XML file, you can pass its path to the Load() method of the XmlDocument class. Here is an example:

using System.Xml;

namespace VideoCollection1
{
    public partial class Exercise : Form
    {
        public Exercise()
        {
            InitializeComponent();
        }

        private void btnVideos_Click(object sender, EventArgs e)
        {
            XmlDocument xdVideos = new XmlDocument();

            xdVideos.Load("../../../Videos.xml");
        }
    }
}

If the path is invalid or the file cannot be found, the application throws an exception at run time (the C# compiler does not check the path). For a missing file, this is a FileNotFoundException exception:

[Figure: FileNotFoundException error message]
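If you want to handle that situation instead of letting the application stop, you can wrap the call to Load() in a try block. The following is only a minimal console-style sketch written with top-level statements rather than inside the form. The file name is hypothetical and is assumed not to exist in the current folder, and the status messages are made up for the illustration:

```csharp
using System;
using System.IO;
using System.Xml;

XmlDocument xdVideos = new XmlDocument();
string status;

try
{
    // Hypothetical file name, assumed not to exist in the current folder
    xdVideos.Load("no-such-file-for-this-sketch.xml");
    status = "The document was loaded";
}
catch (FileNotFoundException)
{
    // The folder exists but the file does not
    status = "The file was not found";
}
catch (DirectoryNotFoundException)
{
    // A folder named in the path does not exist
    status = "The folder in the path was not found";
}
catch (XmlException)
{
    // The file exists but its content is not well-formed XML
    status = "The file was found but is not well-formed XML";
}

Console.WriteLine(status);
```

In a Windows Forms application, you would probably display the message in a message box instead of writing it to the console.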

Otherwise, that is, if the file could be found, you can use the document as you see fit. For example, you can access its root through the DocumentElement property:

using System.Xml;

namespace VideoCollection1
{
    public partial class Exercise : Form
    {
        public Exercise()
        {
            InitializeComponent();
        }

        private void btnVideos_Click(object sender, EventArgs e)
        {
            XmlDocument xdVideos = new XmlDocument();

            xdVideos.Load("../../../Videos.xml");

            XmlElement xeVideo = xdVideos.DocumentElement;
        }
    }
}

From that root, you can "scan" (or navigate) the XML document for any reason. The XmlDocument.DocumentElement property is of type XmlElement, which is derived from XmlLinkedNode, itself a child of the XmlNode class.

An XPath Expression

Introduction

As you have used operating systems before, you are familiar with the way the address of a file is formulated. In Microsoft Windows, an example of the location of a folder is:

H:\C# Programming\Lessons\

This gives you access to the Lessons sub-folder that is a child of the C# Programming folder created in the H drive. An example of the location of a file is:

H:\C# Programming\Lessons\Lesson01.htm

This gives you access to the Lesson01.htm file located in the Lessons sub-folder that is a child of the C# Programming folder created in the H drive. You can pass such a location to a method of a class, for example if you are performing file processing using one of the classes of the System.IO namespace.

A file location is considered an expression. Such an expression contains words (for example Lessons) and operators (such as . or : or \). In the same way, the XPath language uses expressions to specify a path. Internally, there is a program called the XPath parser, or just the parser. That parser receives the expression and analyzes it. As mentioned already, in our lessons, we will use XPath in C# applications. Therefore, when the parser has finished doing its job, it hands its result back to your running C# application (an XPath expression is processed at run time, not by the C# compiler).

To give you the ability to use XPath, the XmlNode class is equipped with a method named SelectNodes. This method is overloaded with two versions. The syntax of one of the versions is:

public XmlNodeList SelectNodes(string xpath);

As you can see, the XmlNode.SelectNodes() method takes an XPath expression passed as a string. This can be illustrated as follows:

[Figure: XPath]

If the XmlNode.SelectNodes() method succeeds in what it is supposed to do, it returns the list of nodes that match the expression, as an XmlNodeList collection. From each node of that list, you can then get a name, a value, or a string. This can be done as follows:

using System.Xml;

namespace VideoCollection1
{
    public partial class Exercise : Form
    {
        public Exercise()
        {
            InitializeComponent();
        }

        private void btnVideos_Click(object sender, EventArgs e)
        {
            XmlDocument xdVideos = new XmlDocument();

            xdVideos.Load("../../../Videos.xml");    // the path to the XML file

            XmlElement xeVideo = xdVideos.DocumentElement;
            XmlNodeList xnlVideos = xeVideo.SelectNodes("/videos/video");    // any XPath expression
        }
    }
}

Remember that the XmlNodeList class starts as follows:

public abstract class XmlNodeList : IEnumerable, IDisposable

Since XPath is a language, it has rules that an expression must follow. An expression that follows the rules is referred to as well-formed. Because we will use XPath in C#, two sets of rules apply: the rules of the C# language (the expression is passed as a string) and the rules of the XPath language. The primary rule of XPath is that the expression you formulate must be valid (be well-formed). If the expression violates a rule, its parser concludes that the expression is not valid and stops any processing. The C# compiler cannot detect this problem because it sees the expression only as a string. Instead, when the application runs, the XmlNode.SelectNodes() method throws an exception named XPathException. Here is an example of the error displayed when that happens:

[Figure: error message for an invalid XPath expression]

The XPathException class is defined in the System.Xml.XPath namespace. If you want, you can catch that exception and take appropriate measures (of course, this class has a Message property).
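Catching that exception can be sketched as follows. This is a console-style illustration written with top-level statements, and the small inline document stands in for Videos.xml. The expression //video[ is deliberately invalid because its opening square bracket is never closed:

```csharp
using System;
using System.Xml;
using System.Xml.XPath;

XmlDocument xdVideos = new XmlDocument();
// A small inline document used only for this sketch
xdVideos.LoadXml("<videos><video><title>Her Alibi</title></video></videos>");

string message = "";

try
{
    // The [ is never closed, so the XPath parser rejects the expression
    var xnlVideos = xdVideos.DocumentElement!.SelectNodes("//video[");
}
catch (XPathException ex)
{
    // The Message property describes what the parser did not accept
    message = ex.Message;
}

Console.WriteLine(message.Length > 0 ? "Invalid expression: " + message
                                     : "The expression was accepted");
```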

If the XPath expression is valid, the parser hands the job to another program referred to as an interpreter. Its job is to produce the result requested by the XPath expression. The interpreter starts "scanning" the document from the node on which the method was called (in our examples, the XmlDocument.DocumentElement object). We will see various types of operations that can be requested. For example, you may ask the interpreter to look for a certain element. Another operation may consist of comparing two values. The interpreter checks the document from beginning to end. If it doesn't find any element that responds to the expression, no result is produced. In this case, the XmlNode.SelectNodes() method returns an empty list, that is, an XmlNodeList collection whose Count property is 0.
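To see what the method returns when nothing matches, you can check the returned list directly. In this console-style sketch, the inline document stands in for Videos.xml and producer is a hypothetical element name chosen because it does not appear in the document:

```csharp
using System;
using System.Xml;

XmlDocument xdVideos = new XmlDocument();
// A small inline document used only for this sketch
xdVideos.LoadXml("<videos><video><title>Her Alibi</title></video></videos>");

// <producer> is a hypothetical element that does not exist in the document
var xnlProducers = xdVideos.DocumentElement!.SelectNodes("/videos/video/producer");

bool isNull = xnlProducers == null;
int count = xnlProducers?.Count ?? -1;

Console.WriteLine($"Null: {isNull}, Count: {count}");   // Null: False, Count: 0
```

Because the list is empty and not null, a foreach loop over it simply does not execute its body.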

Starting from the beginning of the document, if the interpreter finds a node that responds to the XPath expression, it adds it to its list of results and continues checking the nodes in the document. When it reaches the end of the document, it gets its final list and hands it back to your application as an XmlNodeList collection, which is the value returned by the XmlNode.SelectNodes() method. You can then use that list as you see fit:

using System.Xml;

namespace VideoCollection1
{
    public partial class Exercise : Form
    {
        public Exercise()
        {
            InitializeComponent();
        }

        private void btnVideos_Click(object sender, EventArgs e)
        {
            XmlDocument xdVideos = new XmlDocument();

            xdVideos.Load("../../../Videos.xml");    // the path to the XML file

            XmlElement xeVideo = xdVideos.DocumentElement;
            XmlNodeList xnlVideos = xeVideo.SelectNodes("/videos/video");    // any XPath expression

            // You can now use the XmlNodeList value as you see fit
        }
    }
}

Since the XmlNodeList class holds a collection, you can use a for or a foreach loop to "scan" it. Consider the following XML document whose file is named Videos.xml:

<?xml version="1.0" encoding="utf-8"?>
<videos>
  <video>
    <title>Her Alibi</title>
    <director>Bruce Beresford</director>
    <length>94</length>
    <format>DVD</format>
    <rating>PG-13</rating>
  </video>
  <video>
    <title>The War of the Roses</title>
    <director>Danny DeVito</director>
    <cast-members>
      <actor>Michael Douglas</actor>
      <actor>Kathleen Turner</actor>
      <actor>Danny DeVito</actor>
    </cast-members>
  </video>
  <video>
    <title>The Distinguished Gentleman</title>
    <director>Jonathan Lynn</director>
    <cast-members>
      <actor>Eddie Murphy</actor>
      <actor>Lane Smith</actor>
      <actor>Sheryl Lee Ralph</actor>
      <actor>Joe Don Baker</actor>
      <actor>Victoria Rowell</actor>
    </cast-members>
    <cast-members>
      <actor>Charles S. Dutton</actor>
      <actor>Grant Shaud</actor>
      <actor>Kevin McCarthy</actor>
      <actor>Victor Rivers</actor>
      <actor>Chi McBride</actor>
      <actor>Noble Willingham</actor>
    </cast-members>
    <length>112</length>
    <format>DVD</format>
    <rating>R</rating>
    <year-released>1992</year-released>
    <categories>
      <genre>Comedy</genre>
      <genre>Politics</genre>
      <keywords>
        <keyword>satire</keyword>
        <keyword>government</keyword>
        <keyword>con artist</keyword>
        <keyword>lobbyist</keyword>
        <keyword>election</keyword>
      </keywords>
    </categories>
  </video>
  <video>
    <title>Duplex</title>
    <director>Danny DeVito</director>
    <cast-members>
      <narrator>Danny DeVito</narrator>
    </cast-members>
  </video>
  <video>
    <title>The Day After Tomorrow</title>
    <director>Roland Emmerich</director>
    <length>124</length>
    <categories>
      <genre>Drama</genre>
      <genre>Environment</genre>
      <genre>Science Fiction</genre>
    </categories>
    <format>BD</format>
    <rating>PG-13</rating>
    <keywords>
      <keyword>climate</keyword>
      <keyword>global warming</keyword>
      <keyword>disaster</keyword>
      <keyword>new york</keyword>
    </keywords>
  </video>
  <video>
    <title>Other People&#039;s Money</title>
    <director>Alan Brunstein</director>
    <year-released>1991</year-released>
    <cast-members>
      <actor>Danny DeVito</actor>
      <actor>Gregory Peck</actor>
      <actor>Penelope Ann Miller</actor>
    </cast-members>
    <cast-members>
      <actor>Dean Jones</actor>
      <actor>Piper Laurie</actor>
    </cast-members>
    <categories>
      <genre>Comedy</genre>
      <keywords>
        <keyword>satire</keyword>
        <keyword>female stocking</keyword>
        <keyword>seduction</keyword>
      </keywords>
      <genre>Business</genre>
      <keywords>
        <keyword>capitalism</keyword>
        <keyword>corporate take-over</keyword>
        <keyword>factory</keyword>
        <keyword>speech</keyword>
        <keyword>public speaking</keyword>
      </keywords>
      <genre>Drama</genre>
      <keywords>
        <keyword>play</keyword>
        <keyword>hostile take-over</keyword>
        <keyword>corporate raider</keyword>
      </keywords>
    </categories>
  </video>
</videos>
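The two ways of "scanning" an XmlNodeList collection, with a for loop and with a foreach loop, can be sketched as follows. This is a console-style illustration written with top-level statements. It uses a shortened inline copy of the document above (two videos, titles only) so that it doesn't depend on a file:

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

XmlDocument xdVideos = new XmlDocument();
// A shortened copy of Videos.xml, used only for this sketch
xdVideos.LoadXml(
    "<videos>" +
    "<video><title>Her Alibi</title></video>" +
    "<video><title>Duplex</title></video>" +
    "</videos>");

XmlNodeList xnlTitles = xdVideos.DocumentElement!.SelectNodes("/videos/video/title")!;

// Option 1: a for loop that uses the Count property and the indexer
var byIndex = new List<string>();
for (int i = 0; i < xnlTitles.Count; i++)
{
    byIndex.Add(xnlTitles[i]!.InnerText);
}

// Option 2: a foreach loop, possible because XmlNodeList implements IEnumerable
var byEnumeration = new List<string>();
foreach (XmlNode xnTitle in xnlTitles)
{
    byEnumeration.Add(xnTitle.InnerText);
}

Console.WriteLine(string.Join(", ", byIndex));          // Her Alibi, Duplex
Console.WriteLine(string.Join(", ", byEnumeration));    // Her Alibi, Duplex
```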

The Root

As you may know already, every XML document starts with a root node. If you don't know the name of the root (this could happen if you are using an XML file created by someone else and you are not familiar with the document's structure), you can pass / as the argument to the XmlElement.SelectNodes(string xpath) method. That expression selects the document itself, which contains the root. Here is an example:

using System.Xml;

namespace VideoCollection1
{
    public partial class Exercise : Form
    {
        public Exercise()
        {
            InitializeComponent();
        }

        private void btnVideos_Click(object sender, EventArgs e)
        {
            XmlDocument xdVideos = new XmlDocument();

            xdVideos.Load("../../../Videos.xml");

            XmlElement xeVideo = xdVideos.DocumentElement;
            XmlNodeList xnlVideos = xeVideo.SelectNodes("/");
        }
    }
}

Both the XPath language and the .NET Framework provide various means to present the result of an XPath expression. As we will see in the next sections, the XPath language provides various operators such as the forward slash / to specify the path, or where to start considering the path, to a node. On the other hand, the XmlNode class is equipped with the InnerText, the InnerXml, and the OuterXml properties that produce various results as we will see.

As you should know already, the XmlNode.InnerText property produces the values of a node and of all its child nodes, combined and without any tag. The XmlNode.InnerXml property produces the XML code (tags and values) of the child nodes of an element, without the tags of the element itself. The XmlNode.OuterXml property produces a node and its XML format, including the child nodes, if any, of the element. Here is an example that uses the XmlNode.OuterXml property:

using System.Xml;

namespace VideoCollection1
{
    public partial class Exercise : Form
    {
        public Exercise()
        {
            InitializeComponent();
        }

        private void btnVideos_Click(object sender, EventArgs e)
        {
            XmlDocument xdVideos = new XmlDocument();

            xdVideos.Load("../../../Videos.xml");

            XmlNodeList xnlVideos = xdVideos.DocumentElement!.SelectNodes("/")!;

            foreach(XmlNode xnVideo in xnlVideos)
            {
                MessageBox.Show(xnVideo.OuterXml,
                                "Video Collection",
                                MessageBoxButtons.OK,
                                MessageBoxIcon.Information);
            }
        }
    }
}

This would produce:

[Figure: XPath Root]
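The difference among the three properties is easier to see when they are applied to the same node. The following console-style sketch, written with top-level statements, uses a one-video inline document instead of the Videos.xml file:

```csharp
using System;
using System.Xml;

XmlDocument xdVideos = new XmlDocument();
// A one-video document used only for this sketch
xdVideos.LoadXml(
    "<videos>" +
    "<video><title>Her Alibi</title><length>94</length></video>" +
    "</videos>");

// Get the first (and only) <video> element
XmlNode xnVideo = xdVideos.DocumentElement!.SelectNodes("/videos/video")![0]!;

string innerText = xnVideo.InnerText;   // values only, combined
string innerXml  = xnVideo.InnerXml;    // XML code of the child nodes
string outerXml  = xnVideo.OuterXml;    // XML code of the node and its child nodes

Console.WriteLine(innerText);   // Her Alibi94
Console.WriteLine(innerXml);    // <title>Her Alibi</title><length>94</length>
Console.WriteLine(outerXml);    // <video><title>Her Alibi</title><length>94</length></video>
```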

An alternative way to access the root node is to pass /* as a string to XmlElement.SelectNodes(). Here is an example:

XmlNodeList xnlVideos = xeVideo.SelectNodes("/*");

One more alternative is to pass the name of the root preceded by /. Here is an example:

XmlNodeList xnlVideos = xeVideo.SelectNodes("/videos");

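The /* and the /videos expressions both produce the root element with all its contents. The / expression produces the document itself, which is why its OuterXml value also includes the XML declaration. Notice that, in each case, the whole document is treated as one object (the resulting list contains only one item).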
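If you want to check what each of the three expressions selects, you can examine the NodeType and the Name properties of the node it returns. This is a console-style sketch with top-level statements. The inline document is a shortened stand-in for Videos.xml:

```csharp
using System;
using System.Xml;

XmlDocument xdVideos = new XmlDocument();
// A shortened document used only for this sketch
xdVideos.LoadXml(
    "<?xml version=\"1.0\"?>" +
    "<videos><video><title>Her Alibi</title></video></videos>");

XmlElement xeVideos = xdVideos.DocumentElement!;

// Each expression returns a list that contains exactly one node
XmlNode xnSlash = xeVideos.SelectNodes("/")![0]!;
XmlNode xnStar  = xeVideos.SelectNodes("/*")![0]!;
XmlNode xnNamed = xeVideos.SelectNodes("/videos")![0]!;

Console.WriteLine(xnSlash.NodeType);                        // Document
Console.WriteLine(xnStar.NodeType + " " + xnStar.Name);     // Element videos
Console.WriteLine(xnNamed.NodeType + " " + xnNamed.Name);   // Element videos

// Only the document node includes the XML declaration in its OuterXml value
bool slashHasDeclaration = xnSlash.OuterXml.StartsWith("<?xml");
bool starHasDeclaration  = xnStar.OuterXml.StartsWith("<?xml");
Console.WriteLine(slashHasDeclaration + " " + starHasDeclaration);   // True False
```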

Accessing the Child Nodes of the Root

In an XML document, the root can have 0 or more child nodes. To access the nodes that are direct children of the root, pass the expression as /RootName/ChildOfRootName. Here is an example:

using System.Xml;

namespace VideoCollection1
{
    public partial class Exercise : Form
    {
        public Exercise()
        {
            InitializeComponent();
        }

        private void btnVideos_Click(object sender, EventArgs e)
        {
            XmlDocument xdVideos = new XmlDocument();

            xdVideos.Load("../../../Videos.xml");

            XmlElement xeVideo = xdVideos.DocumentElement!;
            XmlNodeList xnlVideos = xeVideo.SelectNodes("/videos/video")!;

            Action ShowVideos = () =>
            {
                foreach(XmlNode xnVideo in xnlVideos)
                {
                    MessageBox.Show("Video\n" + xnVideo.OuterXml,
                                    "Video Collection",
                                    MessageBoxButtons.OK,
                                    MessageBoxIcon.Information);
                }
            };

            ShowVideos();
        }
    }
}

This would produce:

[Figure: Accessing the Child Nodes of the Root]

Notice that this time, each child node of the root is considered an item of the resulting collection.

Accessing Other Children

To access a child node whose location is clear, start from the root and list the ancestry, separating the items by /. Here is an example:

using System.Xml;

namespace VideoCollection1
{
    public partial class Exercise : Form
    {
        public Exercise()
        {
            InitializeComponent();
        }

        private void btnDirectors_Click(object sender, EventArgs e)
        {
            XmlDocument xdVideos = new XmlDocument();

            xdVideos.Load("../../../Videos.xml");

            XmlElement xeVideo = xdVideos.DocumentElement!;
            XmlNodeList xnlVideos = xeVideo!.SelectNodes("/videos/video/director")!;

            foreach (XmlNode xnVideo in xnlVideos)
            {
                lbxDirectors.Items.Add(xnVideo.InnerText);
            }
        }
    }
}

This would produce:

[Figure: Accessing Other Children]

You can use this technique to locate a node. You can check the results to find a particular node you are looking for.
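Checking the results for a particular node can be sketched as follows. This console-style illustration, written with top-level statements, lists the directors of a shortened inline copy of Videos.xml and then looks for one of them. The name being searched is just an example:

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

XmlDocument xdVideos = new XmlDocument();
// A shortened copy of Videos.xml, used only for this sketch
xdVideos.LoadXml(
    "<videos>" +
    "<video><title>Her Alibi</title><director>Bruce Beresford</director></video>" +
    "<video><title>The War of the Roses</title><director>Danny DeVito</director></video>" +
    "<video><title>Duplex</title><director>Danny DeVito</director></video>" +
    "</videos>");

// List the ancestry from the root down to the <director> elements
XmlNodeList xnlDirectors = xdVideos.DocumentElement!.SelectNodes("/videos/video/director")!;

var directors = new List<string>();
int byDeVito = 0;

foreach (XmlNode xnDirector in xnlDirectors)
{
    directors.Add(xnDirector.InnerText);

    // Check each result to find the nodes you are looking for
    if (xnDirector.InnerText == "Danny DeVito")
        byDeVito++;
}

Console.WriteLine(string.Join(", ", directors));   // Bruce Beresford, Danny DeVito, Danny DeVito
Console.WriteLine(byDeVito);                        // 2
```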

Accessing the Child Nodes of an Element

A child is a node that appears immediately below a node in the ancestry lineage. To access the child nodes of an element, follow its name with /*. With this operator, the result depends on the immediate child node of the element:

If the child is a simple node made of a name and value, only the value of that node is included in the produced result
If the child is a node that itself has child nodes, the result depends on the property you use:
    If you use code that produces only the values (such as the XmlNode.InnerText property), the result would include the values of the child nodes all treated as one combined object. An example would be DramaEnvironmentScience Fiction
    If you use code that produces the nodes as objects (such as the XmlNode.InnerXml property), the result would include the whole XML code of its child nodes as a combined object. An example would be <genre>Drama</genre><genre>Environment</genre><genre>Science Fiction</genre>

Here is an example:

using System.Xml;

namespace VideoCollection1
{
    public partial class Exercise : Form
    {
        public Exercise()
        {
            InitializeComponent();
        }

        private void btnVideos_Click(object sender, EventArgs e)
        {
            XmlDocument xdVideos = new XmlDocument();

            xdVideos.Load("../../../Videos.xml");

            XmlElement xeVideo = xdVideos.DocumentElement!;
            XmlNodeList xnlVideos = xeVideo.SelectNodes("//video")!;

            string[] videos = new string[xnlVideos.Count];

            for (int i = 0; i < xnlVideos.Count; i++)
            {
                videos[i] = xnlVideos[i]!.OuterXml;
            }

            rtbVideos.Lines = videos;
        }
    }
}

This would produce:

As stated already, if the last element of the expression you passed includes a child node that itself has child nodes, all those child nodes would be combined and produced as one object. If you want to get the individual nodes, pass their name after that of the element. Here is an example:

using System.Xml;

namespace VideoCollection1
{
    public partial class Exercise : Form
    {
        public Exercise()
        {
            InitializeComponent();
        }    

        private void btnCategories_Click(object sender, EventArgs e)
        {
            XmlDocument xdVideos = new XmlDocument();

            xdVideos.Load("../../../Videos.xml");

            XmlNodeList xnlVideos = xdVideos.DocumentElement!.SelectNodes("//video/categories/*")!;

            foreach (XmlNode xnVideo in xnlVideos)
            {
                lbxCategories.Items.Add(xnVideo.InnerText);
            }
        }
    }
}

This would produce:

[Figure: Accessing the Child Nodes of an Element]
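The combined and the individual results described in this section can be compared in a short console-style sketch. It is written with top-level statements and uses a one-video inline document based on The Day After Tomorrow entry of Videos.xml:

```csharp
using System;
using System.Xml;

XmlDocument xdVideos = new XmlDocument();
// A one-video document used only for this sketch
xdVideos.LoadXml(
    "<videos><video>" +
    "<title>The Day After Tomorrow</title>" +
    "<categories>" +
    "<genre>Drama</genre><genre>Environment</genre><genre>Science Fiction</genre>" +
    "</categories>" +
    "</video></videos>");

XmlElement xeVideos = xdVideos.DocumentElement!;

// /* after video: one item per child of <video>, here <title> and <categories>
XmlNodeList xnlChildren = xeVideos.SelectNodes("/videos/video/*")!;
XmlNode xnCategories = xnlChildren[1]!;

Console.WriteLine(xnlChildren.Count);        // 2
Console.WriteLine(xnCategories.InnerText);   // DramaEnvironmentScience Fiction
Console.WriteLine(xnCategories.InnerXml);    // <genre>Drama</genre><genre>Environment</genre><genre>Science Fiction</genre>

// /* after categories: each <genre> becomes an individual item
XmlNodeList xnlGenres = xeVideos.SelectNodes("/videos/video/categories/*")!;

Console.WriteLine(xnlGenres.Count);          // 3
Console.WriteLine(xnlGenres[2]!.InnerText);  // Science Fiction
```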

Accessing the Grand-Children of a Node

A grandchild is a node that appears immediately below a child node in the ancestry lineage. To access the grandchildren of a node, separate its name and the name of the grandchild with /*/. Here is an example:

XmlNodeList xnlVideos = xeVideo.SelectNodes("/videos/*/director");

This example is based on the root and it produces the same result as seen above. Otherwise, if you are starting from another level, make sure you specify it. Here is an example:

XmlNodeList xnlVideos = xeVideo.SelectNodes("/videos/video/*/actor");
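Both expressions can be tried in a console-style sketch. It is written with top-level statements and uses a shortened inline copy of two entries of Videos.xml. Notice that the second expression only asks for actor grandchildren, so the narrator element is not part of its result:

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

XmlDocument xdVideos = new XmlDocument();
// A shortened copy of Videos.xml, used only for this sketch
xdVideos.LoadXml(
    "<videos>" +
    "<video><title>The War of the Roses</title><director>Danny DeVito</director>" +
    "<cast-members><actor>Michael Douglas</actor><actor>Kathleen Turner</actor></cast-members></video>" +
    "<video><title>Duplex</title><director>Danny DeVito</director>" +
    "<cast-members><narrator>Danny DeVito</narrator></cast-members></video>" +
    "</videos>");

XmlElement xeVideos = xdVideos.DocumentElement!;

// Grandchildren of the root: any child of <videos>, then <director>
XmlNodeList xnlDirectors = xeVideos.SelectNodes("/videos/*/director")!;

// Grandchildren of <video>: any child of <video>, then <actor>
XmlNodeList xnlActors = xeVideos.SelectNodes("/videos/video/*/actor")!;

var actors = new List<string>();
foreach (XmlNode xnActor in xnlActors)
{
    actors.Add(xnActor.InnerText);
}

Console.WriteLine(xnlDirectors.Count);          // 2
Console.WriteLine(string.Join(", ", actors));   // Michael Douglas, Kathleen Turner
```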

Accessing Specific Nodes

If you have many elements that share the same name in your XML document, to access those elements, pass their name preceded by //. Here is an example:

using System.Xml;

namespace VideoCollection1
{
    public partial class Exercise : Form
    {
        public Exercise()
        {
            InitializeComponent();
        }

        private void btnDirectors_Click(object sender, EventArgs e)
        {
            XmlDocument xdVideos = new XmlDocument();

            xdVideos.Load("../../../Videos.xml");

            XmlElement xeVideo = xdVideos.DocumentElement!;
            XmlNodeList xnlVideos = xeVideo.SelectNodes("//director")!;

            foreach (XmlNode xnVideo in xnlVideos)
            {
                lbxDirectors.Items.Add(xnVideo.InnerText);
            }
        }
    }
}

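This would produce the same result as above. You can also precede // with a period. In that case, the search starts from the node on which SelectNodes() is called instead of the top of the document. Because that node is the root in our example, the result is the same: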

XmlNodeList xnlVideos = xeVideo.SelectNodes(".//director");
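The period makes a difference when SelectNodes() is called on a node other than the root. The following console-style sketch, written with top-level statements, calls both forms on the first video element of a shortened inline copy of Videos.xml:

```csharp
using System;
using System.Xml;

XmlDocument xdVideos = new XmlDocument();
// A shortened copy of Videos.xml, used only for this sketch
xdVideos.LoadXml(
    "<videos>" +
    "<video><title>Her Alibi</title><director>Bruce Beresford</director></video>" +
    "<video><title>Duplex</title><director>Danny DeVito</director></video>" +
    "</videos>");

// Use the first <video> element, not the root, as the starting node
XmlNode xnFirstVideo = xdVideos.DocumentElement!.SelectNodes("/videos/video")![0]!;

// Without the period, the search covers the whole document
XmlNodeList xnlAll = xnFirstVideo.SelectNodes("//director")!;

// With the period, the search covers only the descendants of the first video
XmlNodeList xnlLocal = xnFirstVideo.SelectNodes(".//director")!;

Console.WriteLine(xnlAll.Count);              // 2
Console.WriteLine(xnlLocal.Count);            // 1
Console.WriteLine(xnlLocal[0]!.InnerText);    // Bruce Beresford
```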

Consider the following example:

using System.Xml;

namespace VideoCollection1
{
    public partial class Exercise : Form
    {
        public Exercise()
        {
            InitializeComponent();
        }

        private void btnVideos_Click(object sender, EventArgs e)
        {
            XmlDocument xdVideos = new XmlDocument();

            xdVideos.Load("../../../Videos.xml");

            XmlNodeList xnlVideos = xdVideos.DocumentElement!.SelectNodes("//cast-members")!;

            foreach (XmlNode xnVideo in xnlVideos)
            {
                MessageBox.Show(xnVideo.OuterXml,
                                "Video Collection",
                                MessageBoxButtons.OK, MessageBoxIcon.Information);
            }
        }
    }
}

This would produce:

[Figure: Accessing Specific Nodes]

Notice that the results in each section, those belonging to the same parent node, are treated as one object. If you pass the common name of the nodes that are at the end of their ancestry, they would be treated individually. Consider the following example:

using System.Xml;

namespace VideoCollection1
{
    public partial class Exercise : Form
    {
        public Exercise()
        {
            InitializeComponent();
        }

        private void btnVideos_Click(object sender, EventArgs e)
        {
            XmlDocument xdVideos = new XmlDocument();

            xdVideos.Load("../../../Videos.xml");

            XmlNodeList xnlVideos = xdVideos.DocumentElement!.SelectNodes("//actor")!;

            foreach (XmlNode xnVideo in xnlVideos)
            {
                lbxActors.Items.Add(xnVideo.InnerText);
            }
        }
    }
}

This would produce:

[Figure: Accessing Specific Nodes]

In the same way, you can use the // operator to specify where to start considering a path to a child or grand-child node. Here is an example:

using System.Xml;

namespace VideoCollection1
{
    public partial class Exercise : Form
    {
        public Exercise()
        {
            InitializeComponent();
        }

        private void Exercise_Load(object sender, EventArgs e)
        {
            XmlDocument xdVideos = new XmlDocument();

            xdVideos.Load("../../../Videos.xml");

            XmlNodeList xnlVideos = xdVideos.DocumentElement!.SelectNodes("//video/*/actor")!;

            foreach (XmlNode xnVideo in xnlVideos)
            {
                lbxActors.Items.Add(xnVideo.InnerText);
            }
        }
    }
}

In the same way, you can combine operators (separators) to get to a node. For example, to get the child of a node X that itself is a grandchild, simply follow that X node with / and the name of the child. Here is an example:

using System.Xml;

namespace VideoCollection1
{
    public partial class Exercise : Form
    {
        public Exercise()
        {
            InitializeComponent();
        }

        private void Exercise_Load(object sender, EventArgs e)
        {
            XmlDocument xdVideos = new XmlDocument();

            xdVideos.Load("../../../Videos.xml");

            XmlNodeList xnlVideos = xdVideos.DocumentElement!.SelectNodes("//videos/*/cast-members/actor")!;

            foreach (XmlNode xnVideo in xnlVideos)
            {
                lbxActors.Items.Add(xnVideo.InnerText);
            }
        }
    }
}

Practical Learning: Ending the Lesson

Close your programming environment


Copyright © 2014-2024, FunctionX, Monday 24 June 2024, 10:38