Page 114 - Building Digital Libraries
General-Purpose Technologies Useful for Digital Repositories
frozen at version 1.0 for XSLT, and XQuery support remains completely
absent. These limitations have led developers to look to other standards
for exchanging data between systems, and in most cases the standard
they adopt is JSON.
JSON (JavaScript Object Notation) is a data exchange format for passing
structured and unstructured data between applications.[12] Of course, one
can express these same relationships in XML, which might make one wonder
why developers are gravitating to JSON. The answer lies in the support
that most current-generation programming languages provide for
JSON-based content. When working with XML
documents, developers must work with the data using one of two process-
ing methods: DOM (Document Object Model) processing or SAX (Simple
API for XML) processing. Each of these models has its benefits and draw-
backs. DOM processing, for example, allows developers to interact with
XML documents as objects; that is, as elements that can have properties
and attributes. DOM also enables developers to query documents using
XPath, or to access the data through complex data structures like
dictionaries or hashes. The challenge with DOM processing, however, is
that in order to achieve this functionality, the entire document must be
loaded into memory and parsed before any of it can be used. This overhead
is costly, and even a small well-formedness error will cause the parser
to reject the entire document, rendering the data unusable. For very
large XML documents, this model simply isn’t an efficient means of
processing data. What’s more, most programming languages provide limited
DOM processing support, farming out that functionality to external
XML-parsing libraries. This means that XML support can vary greatly
from one programming language to another. SAX processing, on the
other hand, is designed for large document processing, but to achieve this
level of performance, a number of tradeoffs are made. The most significant
is that XML documents are read element by element. This means that
documents cannot be queried via XPath, and that the developer must write
all of the processing logic. It also means that the document’s
hierarchy is not retained, since SAX processing is event-based; that
is, operations occur when specific tags are encountered.
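
The difference between the two models can be sketched in a few lines of
Python, using only the standard library's xml.dom.minidom and xml.sax
modules. This is a minimal illustration, not a recommended implementation:
the sample record, the function names, and the handler class are all
invented here for the purpose of the example.

```python
import xml.sax
from xml.dom import minidom
from xml.parsers.expat import ExpatError

# Hypothetical sample record, invented for illustration only.
SAMPLE = "<record><title>Sample Title</title><creator>Sample Author</creator></record>"


def titles_with_dom(xml_text):
    """DOM: parse the whole document into an in-memory tree, then query it."""
    document = minidom.parseString(xml_text)
    return [node.firstChild.data for node in document.getElementsByTagName("title")]


class TitleHandler(xml.sax.ContentHandler):
    """SAX: react to events as tags are encountered; no tree is ever built."""

    def __init__(self):
        super().__init__()
        self.titles = []
        self._in_title = False
        self._buffer = []

    def startElement(self, name, attrs):
        if name == "title":
            self._in_title = True
            self._buffer = []

    def characters(self, content):
        # Text may arrive in several chunks, so collect it until the tag closes.
        if self._in_title:
            self._buffer.append(content)

    def endElement(self, name):
        if name == "title":
            self.titles.append("".join(self._buffer))
            self._in_title = False


def titles_with_sax(xml_text):
    """Stream the document through the handler and return what it collected."""
    handler = TitleHandler()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.titles


def dom_rejects(xml_text):
    """Return True if the DOM parser refuses the document outright."""
    try:
        minidom.parseString(xml_text)
    except ExpatError:
        return True
    return False
```

Both titles_with_dom(SAMPLE) and titles_with_sax(SAMPLE) return the same
one-item list, but the SAX version has to track its own position in the
document, which is the bookkeeping the DOM tree would otherwise provide.
Passing a record with a mismatched closing tag to dom_rejects shows the
other side of the tradeoff: the DOM parser discards the entire document
rather than returning the parts that were readable.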
So how is JSON different? The JSON format was created to ensure that
developers could pass complex, hierarchical data to the client (web
browser) and have a standards-based, high-performance method of
processing the information. But it’s become so much more. For many
languages widely used in web development, like Python, Ruby, and PHP,
the JSON format has become the primary currency by which data is sent
and consumed. These languages natively consume JSON data, and in
consuming it, they create rich data objects that can then be acted upon
within the application. This makes JSON an ideal format for carrying
many of the rich metadata types that one will find in a digital library,
and given the native support for JSON in most of these languages, JSON
output can be generated automatically through a couple of simple
commands. This has led to the reimagining of many library XML formats as
JSON serializations. MARC is a notable example. Take the following
record output in MARCXML: