Page 181 - Building Digital Libraries
CHAPTER 7
While XSLT was designed as a stylesheet language for XML (much
like CSS was designed for HTML), XQuery was created as a full-featured
programming language and was designed for the search, retrieval, and
manipulation of large sets of XML data. Functionally, XSLT and XQuery
have a number of overlapping features. And this isn’t by accident. While the
XSLT and XQuery specifications are developed by different groups, those
groups work cooperatively and share oversight of technologies such as
XPath. This means that most operations that can be accomplished using one
processing technology can likely be accomplished using the other.
Given that XQuery and XSLT have many overlapping features, one
might assume that these technologies share a similar design, but in this, one
would be mistaken. XQuery is an expression-based language, and shares
many of the same design concepts with SQL. Its purpose is to provide a
query-based language that can process large XML datasets and databases,
overcoming one of the major weaknesses of the XSLT specification.
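
To make the SQL comparison concrete, the following is a minimal sketch rather than anything taken from the XQuery specification. The catalog, its element names, and the function titles_since are all invented for illustration. The comment shows roughly how the query might be phrased as an XQuery FLWOR expression (for, let, where, order by, return), and the Python below it is only an analogue that mirrors the same select-filter-sort shape using the standard library.

```python
import xml.etree.ElementTree as ET

# Hypothetical catalog, invented for this illustration.
CATALOG = """
<catalog>
  <book year="2001"><title>Metadata Basics</title></book>
  <book year="1995"><title>Card Catalogs</title></book>
  <book year="2010"><title>Linked Data</title></book>
</catalog>
"""

# An XQuery FLWOR expression for the same task would read roughly:
#   for $b in /catalog/book
#   where xs:integer($b/@year) ge 2000
#   order by $b/title
#   return string($b/title)


def titles_since(xml_text, year):
    """Return the sorted titles of books published in or after `year`."""
    root = ET.fromstring(xml_text)
    return sorted(
        book.findtext("title")            # "return" clause
        for book in root.findall("book")  # "for" clause
        if int(book.get("year")) >= year  # "where" clause
    )


print(titles_since(CATALOG, 2000))
```

As in a SQL SELECT statement, the whole query is a single expression that yields a sequence of results, instead of a set of templates matched against the document as in XSLT.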
Within the context of digital libraries, however, XQuery has found only
limited use. In surveying the most popular library metadata formats, none
currently provides XQuery-based processing documents. All transformations
are currently provided as XSLT documents using version 1.0 or 2.0 of
the specification. In fact, as of this writing, probably the most high-profile
use of XQuery within the digital library community was the use of XQuery
to demonstrate the conversion of legacy MARC data to BIBFRAME 1.0,
developed by the Library of Congress. However, the XQuery approach was
abandoned in favor of XSLT when the Library of Congress released the 2.0
version of the BIBFRAME specification.
Metadata Crosswalking
So what is metadata crosswalking? The crosswalking of metadata is a process
in which an XML document is transformed from one schema to another.
The crosswalking process utilizes XSLT or XQuery documents to facilitate
the movement of metadata between multiple formats. This process requires
a number of decisions about how metadata elements in one schema relate
to those in another. Metadata crosswalks are developed by examining
the similarities and differences between the schemas. Are there one-to-one
relationships, that is, do elements share the same meanings, or will
the data need to be interpreted? Will the conversions be lossless (unlikely)
and if not, what level of data loss will be acceptable? These decisions are
actually some of the most important in the crosswalking process, since they
will ultimately affect the quality of the final product.
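
The decisions described above can be sketched in a few lines of code. The example below is an illustration only: it uses Python's standard library in place of the XSLT or XQuery documents discussed in this chapter, and the source record, its element names, and the CROSSWALK table are hypothetical. Only the Dublin Core namespace and element names are real. It shows a one-to-one mapping (title), a mapping that loses granularity (author and illustrator both become Dublin Core names that drop or blur the original role), and an element with no target that is lost outright.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)

# Hypothetical crosswalk table: source element -> Dublin Core element.
CROSSWALK = {
    "title": "title",              # one-to-one: same meaning in both schemas
    "author": "creator",           # close match, requires interpretation
    "editor": "contributor",       # lossy: the specific role is not kept
    "illustrator": "contributor",  # lossy: the specific role is not kept
    "published": "date",
}

# Hypothetical source record, invented for this illustration.
SAMPLE = """
<item>
  <title>Building a Collection</title>
  <author>Doe, Jane</author>
  <illustrator>Roe, Richard</illustrator>
  <shelfmark>QA76.9</shelfmark>
</item>
"""


def crosswalk(source_xml):
    """Map a source record to Dublin Core; also report unmapped elements."""
    source = ET.fromstring(source_xml)
    target = ET.Element("record")
    dropped = []
    for child in source:
        dc_name = CROSSWALK.get(child.tag)
        if dc_name is None:
            # No equivalent in the target schema: this data is lost.
            dropped.append(child.tag)
            continue
        out = ET.SubElement(target, f"{{{DC_NS}}}{dc_name}")
        out.text = (child.text or "").strip()
    return target, dropped


record, dropped = crosswalk(SAMPLE)
print(ET.tostring(record, encoding="unicode"))
print("Dropped elements:", dropped)
```

Returning the list of dropped elements alongside the converted record is one way to make the level of data loss visible, so that it can be judged acceptable or not before the crosswalk is put into use.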
So why build metadata crosswalks? If the crosswalking of metadata
will result in the loss of metadata granularity, why not just create all
metadata in the desired format to begin with? Well, metadata crosswalking is
done for a variety of reasons, though few are as important as remote data
interoperability. Information systems today require the ability to ingest
various types of metadata from remote sources. Since most organizations