Page 121 - Building Digital Libraries
CHAPTER 5
ruby-marc (https://github.com/ruby-marc/ruby-marc/): ruby-marc
is a pure Ruby library developed to support the creation,
manipulation, and processing of MARC21 and MARCXML data.
Saxon (https://sourceforge.net/projects/saxon/): Saxon is a high-
performance XSLT/XQuery-processing toolkit that can be
run as a library or stand-alone application. Many would argue
that it is the gold standard of XML/XSLT/XQuery processing
tools, since the creator, Michael Kay, plays a key role on the
advisory committee overseeing the development of the XSLT
and XQuery standards.
Nokogiri (www.nokogiri.org/): Nokogiri is a high-performance Ruby
library created to provide XML functionality to the language.
While Ruby does provide a core set of XML functionality,
the overall performance of its built-in tools makes them
nearly unusable for data manipulation purposes. Nokogiri
fills this niche and is used throughout the digital library
community, particularly in projects like Samvera.
Catmandu (http://librecat.org/): Catmandu is a set of
command-line and Perl tools that provide a wide range of data
manipulation functionality for dealing with many of the data
formats found in libraries.
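To make the phrase "processing of MARC21 and MARCXML data" concrete, the following is a minimal sketch in Python of the kind of work the libraries above automate: pulling the title statement (field 245, subfields a and b) out of a MARCXML document. It uses only Python's standard library rather than any of the tools listed here, and the sample record, the `SAMPLE` constant, and the `extract_titles` helper are all invented for illustration. A dedicated MARC library would also handle binary MARC21, character encodings, and record validation, none of which this sketch attempts.

```python
import xml.etree.ElementTree as ET

# Namespace used by the Library of Congress MARCXML ("MARC21 slim") schema.
MARC_NS = {"marc": "http://www.loc.gov/MARC21/slim"}

# A made-up record, included only so the sketch is self-contained.
SAMPLE = """<collection xmlns="http://www.loc.gov/MARC21/slim">
  <record>
    <leader>00000nam a2200000 a 4500</leader>
    <controlfield tag="001">example-0001</controlfield>
    <datafield tag="245" ind1="1" ind2="0">
      <subfield code="a">An example title :</subfield>
      <subfield code="b">a made-up record.</subfield>
    </datafield>
  </record>
</collection>"""


def extract_titles(marcxml):
    """Return one title string per record, built from 245 $a and $b."""
    root = ET.fromstring(marcxml)
    titles = []
    for record in root.findall("marc:record", MARC_NS):
        subfields = record.findall(
            "marc:datafield[@tag='245']/marc:subfield", MARC_NS
        )
        # Keep only the title proper ($a) and remainder of title ($b).
        parts = [
            sf.text.strip()
            for sf in subfields
            if sf.get("code") in ("a", "b") and sf.text
        ]
        titles.append(" ".join(parts))
    return titles


titles = extract_titles(SAMPLE)
print(titles)
```

Even this small task requires namespace handling and knowledge of MARC field tags, which is a large part of why purpose-built libraries are preferred for anything beyond a quick inspection.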
Software Tools
While formal tool development for the cultural heritage community is
fairly sparse, a handful of tools belong in nearly every metadata or
digital library manager’s toolkit.
OpenRefine (http://openrefine.org/): It’s hard to describe just
how powerful OpenRefine can be as a data manipulation
tool. When a user first comes across it, OpenRefine looks like
a spreadsheet program on steroids. But in reality, it’s much
more than that. OpenRefine excels in providing structure
and meaning to unstructured data. It includes its own macro
language and robust regular expression support, and it has a
wide range of plug-ins and flavors that add support
for data reconciliation with linked data services.
Oxygen XML Editor (www.oxygenxml.com/): There are a lot of
XML editors available for download and use, but none are
better than Oxygen. Unlike the other tools on this list, Oxygen
isn’t free to use; it is a proprietary application that carries a
relatively high price tag, but its ability to create, test, and
model data and data transformations makes it worth noting.
YAZ (www.indexdata.com/yaz): The YAZ toolkit is ubiquitous;
it powers nearly all of the available Z39.50 servers on the
Internet and is available as programming libraries for nearly