Page 114 - Building Digital Libraries
P. 114

General-Purpose Technologies Useful for Digital Repositories


                 frozen at version 1.0 for XSLT, and XQuery support remains completely
                 absent. These limitations have caused developers to look at other standards
                 for communication data packets between systems, and in most cases, the
                 standard that is being utilized is JSON.
                     JSON  (JavaScript Object Notation) is a data exchange format for
                          12
                 passing structured and unstructured data between applications. Of course,
                 one can express these same relationships in XML—which might make one
                 wonder why developers are gravitating to JSON; and the answer would
                 be grounded in the support that most current-generation programming
                 languages provide for JSON-based content. When working with XML
                 documents, developers must work with the data using one of two process-
                 ing methods: DOM (Document Object Model) processing or SAX (Simple
                 API for XML) processing. Each of these models has its benefits and draw-
                 backs. DOM processing, for example, allows developers to interact with
                 XML documents as objects; that is, as elements that can have properties
                 and attributes. DOM also enables developers to approach the processing of
                 documents using XPath or accessing the data using complex data structures
                 like dictionaries or hashes. The challenge with DOM processing, however,
                 is that in order to achieve this functionality, the entire document must be
                 loaded and validated by the XML parser. This overhead is costly, and small
                 errors in the XML document will corrupt the entire data packet, rendering it
                 unusable. For very large XML documents, this model simply isn’t an efficient
                 means of processing data. And what’s more, most programming languages
                 provide limited DOM processing support, farming out that functionality
                 to external XML-parsing libraries. This means that XML support within
                 specific programming languages can vary greatly. SAX processing, on the
                 other hand, is designed for large document processing, but to achieve this
                 level of performance, a number of tradeoffs are made. The most significant
                 is that XML documents are read element by element. This means that
                 documents cannot be processed via XPath—and that all processing of the
                 document must be done by the developer. This also means that the docu-
                 ments hierarchy is not retained, since SAX processing is event-based; that
                 is, operations occur when specific tags are encountered.
                     So how is JSON different? The JSON file format was created to ensure
                 that developers could pass complex, hierarchical data to the client (web
                 browser), and have a standard-based, high-performance method of process-
                 ing the information. But it’s become so much more. For many web-based
                 languages like Python, Ruby, and PHP, the JSON format has become the
                 primary currency by which data is sent and consumed. These languages
                 natively consume JSON data, and in consuming it, they create rich data
                 objects that can then be acted upon within the application. This richness
                 makes JSON an ideal language to consume many of the rich metadata types
                 that one will find in a digital library, and given the native support for JSON
                 in most web-based programming languages, can be automatically gener-
                 ated through a couple of simple commands. This has led to the reimaging
                 of many library XML formats as JSON serializations. MARC is a notable
                 example. Take the following record output in MARCXML:
                                                                                                                      99
   109   110   111   112   113   114   115   116   117   118   119