Page 89 - Building Digital Libraries
CHAPTER 5
and structure of a given XML document. Conceptually,
DOM breaks down an XML document as nodes in a tree.
Each tag represents a different “branch,” and attributes its
“leaves,” if you will. Within the DOM model, the entire
XML document is loaded into memory to construct the
DOM tree. As a result, the Document Object Model
represents a very inefficient method for accessing large
XML documents. While it makes data access easier and
more convenient through the DOM interface, it comes
at a big cost. Most XML processing done with XPath,
XQuery, or XSLT relies on a DOM-style in-memory tree, meaning
that most XML transformations are done using the DOM
architecture.
2. SAX—The Simple API for XML was initially devel-
oped as a Java-only processing method. Within the
SAX model, the XML file is read sequentially, with
events being fired as the parser enters and leaves elements
within the document. Because the document is read
sequentially, SAX is much less memory-intensive: only
small chunks of an XML document are loaded at any
one time. However, unlike DOM, the parser cannot reach
arbitrary elements within the XML tree, only
the specific data that is being read at that moment.
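The contrast between the two models can be sketched in a few lines of code. The example below is a minimal illustration, not a recommended implementation. It uses Python's standard xml.dom.minidom and xml.sax modules. The sample record and the names dom_subjects, sax_subjects, and SubjectHandler are invented for this illustration. Both functions extract the same subject values: the DOM version builds the full tree first and then queries it, while the SAX version reacts to events as elements open and close.

```python
import xml.dom.minidom
import xml.sax

# Hypothetical sample record, invented for illustration only.
SAMPLE = (
    '<record id="r1">'
    '<title lang="en">Building Digital Libraries</title>'
    '<subject>Metadata</subject>'
    '<subject>XML</subject>'
    '</record>'
)


def dom_subjects(xml_text):
    """DOM: load the whole document into a tree, then query the tree."""
    doc = xml.dom.minidom.parseString(xml_text)
    # Every node stays reachable because the full tree is held in memory.
    nodes = doc.getElementsByTagName("subject")
    return [node.firstChild.data for node in nodes]


class SubjectHandler(xml.sax.ContentHandler):
    """SAX: receive events as the parser enters and leaves elements."""

    def __init__(self):
        super().__init__()
        self.subjects = []
        self._buffer = None  # None while the parser is outside <subject>

    def startElement(self, name, attrs):
        if name == "subject":
            self._buffer = []

    def characters(self, content):
        # Text may arrive in several chunks, so collect the pieces.
        if self._buffer is not None:
            self._buffer.append(content)

    def endElement(self, name):
        if name == "subject" and self._buffer is not None:
            self.subjects.append("".join(self._buffer))
            self._buffer = None


def sax_subjects(xml_text):
    """Run the SAX parser over the text and return the collected subjects."""
    handler = SubjectHandler()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.subjects
```

Note that the SAX handler has to keep its own state (the buffer) to know where it is in the document, which is the practical price of never holding the whole tree in memory.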
Because of DOM’s flexibility and convenience, many XML processing tech-
nologies are built around the DOM application model, while coding libraries
and tools that break XML down into programmable “objects” tend to use a
more SAX-based model for reading and interacting with data. In chapter 7,
we will take a closer look at XSLT and how it is used to perform metadata
crosswalking between one metadata framework and another.
Early on, a number of libraries looked to the work being done by the
W3C and the XML Working Group as a method for bridging the access
gap between print and digital resources. Many of these projects continue to
play a major role in how XML-based metadata schemas are utilized today.
• The Lane Medical Library at Stanford University initiated
a project in 1998 with the purpose of exploring metadata
schemas that could better represent digital objects. Through
the Medlane project, the Stanford group developed one of
the first MARC-to-XML crosswalks and developed XML-
MARC, one of the first MARC-to-XML mapping software
tools. Using this tool, the Lane Medical Library was able to test
the feasibility of migrating large existing MARC databases
into XML. In many respects, the work done at Stanford
would be a precursor to the work that the Library of
Congress would eventually undertake—creating an official
MARCXML crosswalk as well as the eventual development
of the MODS metadata framework.