Page 85 - Building Digital Libraries

CHAPTER 5

XPath

XPath is a language for addressing parts of an XML document, designed to
be used with XSLT and XPointer. It defines a syntax by which XML data can
be extracted and acted upon. Conceptually, an XML document is a tree, with
each element a separate node on that tree, and XPath defines a method for
accessing the individual nodes. For example, consider the following XML
snippet:

<?xml version="1.0" encoding="utf-8" ?>
<book>
  <item>
    <title>Pride and Prejudice</title>
    <author>Jane Austen</author>
    <publication_date>1813</publication_date>
    <language>eng</language>
    <format>text</format>
  </item>
  <item>
    <title>Pride and Prejudice</title>
    <author>Jane Austen</author>
    <author type="screenwriter">Deborah Moggach</author>
    <publication_date>2015</publication_date>
    <language>eng</language>
    <format>film</format>
  </item>
  <item>
    <title>Pride and Prejudice</title>
    <author>Jane Austen</author>
    <publication_date>2017</publication_date>
    <language>eng</language>
    <format>text</format>
  </item>
</book>
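
To make the tree model concrete, here is a minimal sketch in Python that addresses nodes in a trimmed copy of the snippet above. It assumes the standard library's xml.etree.ElementTree module, which supports only a limited subset of XPath and evaluates paths relative to the element it is called on (here the root book element). The names SAMPLE and second_item_fields are hypothetical and chosen only for illustration.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the chapter's sample (XML declaration and some fields
# omitted to keep the sketch short).
SAMPLE = """<book>
  <item>
    <title>Pride and Prejudice</title>
    <publication_date>1813</publication_date>
    <format>text</format>
  </item>
  <item>
    <title>Pride and Prejudice</title>
    <publication_date>2015</publication_date>
    <format>film</format>
  </item>
  <item>
    <title>Pride and Prejudice</title>
    <publication_date>2017</publication_date>
    <format>text</format>
  </item>
</book>"""


def second_item_fields(book):
    """Return (publication_date, format) of the second <item> under book."""
    # The positional predicate [2] is 1-based: it selects the second item.
    date = book.find("item[2]/publication_date").text
    fmt = book.find("item[2]/format").text
    return date, fmt


root = ET.fromstring(SAMPLE)  # root is the <book> element
print(second_item_fields(root))

# For contrast, a Python list of the same nodes is 0-based, so the
# second item sits at index 1.
items = root.findall("item")
print(items[1].find("format").text)
```

Because ElementTree resolves paths from the current element, the leading book step is left out here; a full XPath processor would accept an absolute path that starts at the document root.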
XPath statements provide a way to access an individual node within an XML
file by naming its location in relation to the root element. In this case,
a process looking to extract the publication_date and format from the
second item tag group would create an XPath statement that navigates the
document nodes. In this example, however, the node item is not unique: it
appears multiple times at the same level within the XML document. XPath
accommodates this by allowing access to the item group as elements of an
array. XPath arrays, however, differ from traditional array structures in
that XPath indexing starts at 1, while an array in Perl, C, or C# would
start at zero. Accessing the second node from our above example would use
the following statement: /book/item[2]/publication_date, which illustrates
how the data in the second item node would be addressed. When coupled with
XSLT, XPath gives an individual or a process the ability to loop or extract
