Page 89 - Building Digital Libraries

CHAPTER 5


                                                             and structure of a given XML document. Conceptually,
                                                             DOM breaks down an XML document as nodes in a tree.
                                                             Each tag represents a different “branch,” with its attributes
                                                             as the “leaves,” if you will. Within the DOM model, the entire
                                                             XML document is loaded into memory to construct the
                                                             DOM tree. As a result, the Document Object Model is a
                                                             memory-intensive way to access large XML documents:
                                                             memory use grows with the size of the document. While
                                                             the DOM interface makes data access easier and more
                                                             convenient, that convenience comes at a real cost in
                                                             memory. Much of the XML processing done with XPath,
                                                             XQuery, or XSLT relies on a DOM-style in-memory tree,
                                                             meaning that most XML transformations are done using
                                                             the DOM architecture.
                                                          2.  SAX—The Simple API for XML was initially devel-
                                                             oped as a Java-only processing method. Within the
                                                             SAX model, the XML file is read sequentially, with
                                                             events being fired as the parser enters and leaves
                                                             elements within the document. Because the document is
                                                             read sequentially, SAX is much less memory-intensive:
                                                             only small chunks of an XML document are loaded at any
                                                             one time. However, unlike DOM, a SAX parser cannot
                                                             reach arbitrary elements within the XML tree; it sees
                                                             only the specific data being read at that moment.
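The difference between the two models can be illustrated with a minimal sketch using Python's standard-library `xml.dom.minidom` and `xml.sax` modules. The sample record, the function names, and the handler class are hypothetical, invented only for illustration. Both functions extract the same subject terms, but the DOM version builds the whole tree first, while the SAX version only reacts to events as they stream past.

```python
import xml.dom.minidom
import xml.sax

# Hypothetical sample record, invented for illustration only.
SAMPLE = (
    '<record id="r1">'
    '<title lang="en">Building Digital Libraries</title>'
    '<subject>Metadata</subject>'
    '<subject>XML</subject>'
    '</record>'
)


def subjects_with_dom(xml_text):
    """DOM: load the entire document into a tree, then navigate it freely."""
    doc = xml.dom.minidom.parseString(xml_text)
    nodes = doc.getElementsByTagName("subject")
    return [node.firstChild.data for node in nodes]


class SubjectHandler(xml.sax.ContentHandler):
    """SAX: respond to events fired as the parser enters and leaves elements."""

    def __init__(self):
        super().__init__()
        self.subjects = []
        self._buffer = None  # holds text only while inside a <subject> element

    def startElement(self, name, attrs):
        if name == "subject":
            self._buffer = []

    def characters(self, content):
        # Text may arrive in several chunks, so collect and join later.
        if self._buffer is not None:
            self._buffer.append(content)

    def endElement(self, name):
        if name == "subject":
            self.subjects.append("".join(self._buffer))
            self._buffer = None


def subjects_with_sax(xml_text):
    """SAX: read sequentially; only the current event's data is available."""
    handler = SubjectHandler()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.subjects


if __name__ == "__main__":
    print(subjects_with_dom(SAMPLE))
    print(subjects_with_sax(SAMPLE))
```

Note that the SAX handler has to keep its own state (the `_buffer` attribute) to know where it is in the document, whereas the DOM version can ask the tree for any element at any time. That bookkeeping is the price of the lower memory footprint.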
                                                   Because of DOM’s flexibility and convenience, many XML processing tech-
                                                   nologies are built around the DOM application model, while coding libraries
                                                   and tools that break XML down into programmable “objects” tend to use a
                                                   more SAX-based model for reading and interacting with data. In chapter 7,
                                                   we will take a closer look at XSLT and how it is used to perform metadata
                                                   crosswalking between one metadata framework and another.
                                                      Early on, a number of libraries looked to the work being done by the
                                                   W3C and the XML Working Group as a method for bridging the access
                                                   gap between print and digital resources. Many of these projects continue to
                                                   play a major role in how XML-based metadata schemas are utilized today.
                                                          •	 The Lane Medical Library at Stanford University initiated
                                                             a project in 1998 with the purpose of exploring metadata
                                                             schemas that could better represent digital objects. Through
                                                             the Medlane project, the Stanford group produced one of
                                                             the first MARC-to-XML crosswalks and developed XML-
                                                             MARC, one of the first MARC-to-XML mapping software
                                                             tools. Using this tool, the Lane Medical Library was able to
                                                             test the feasibility of migrating large existing MARC
                                                             databases into XML. In many respects, the work done at Stanford
                                                             would be a precursor to the work that the Library of
                                                             Congress would eventually undertake—creating an official
                                                             MARCXML crosswalk as well as the eventual development
                                                             of the MODS metadata framework.