Page 121 - Building Digital Libraries

CHAPTER 5


                                                          rubymarc (https://github.com/ruby-marc/ruby-marc/): rubymarc
                                                              is a pure-Ruby library developed to support the creation,
                                                              manipulation, and processing of MARC21 and MARCXML data.
                                                          Saxon (https://sourceforge.net/projects/saxon/): Saxon is a high-
                                                              performance XSLT/XQuery-processing toolkit that can be
                                                              run as a library or stand-alone application. Many would argue
                                                              that it is the gold standard of XML/XSLT/XQuery processing
                                                              tools, since the creator, Michael Kay, plays a key role on the
                                                              advisory committee overseeing the development of the XSLT
                                                              and XQuery standards.
                                                          nokogiri (www.nokogiri.org/): nokogiri is a high-performance Ruby
                                                              library created to provide XML functionality to the language.
                                                              While Ruby does provide a core set of XML functionality,
                                                              the performance of those built-in tools makes them nearly
                                                              unusable for data manipulation purposes. Nokogiri fills
                                                              this niche and is used throughout the digital library
                                                              community, particularly in projects such as Samvera.
                                                          Catmandu (http://librecat.org/): Catmandu is a set of command-
                                                              line and Perl tools that provide a wide range of data
                                                              manipulation functionality for dealing with many of the data
                                                              formats found in libraries.
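
To make "processing MARCXML data" concrete, here is a minimal sketch of pulling a title out of a MARCXML record. It assumes Python and its standard-library XML parser rather than any of the libraries listed above, and the sample record and the `subfields` helper are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace used by MARCXML ("MARC21 slim") records.
MARC_NS = {"marc": "http://www.loc.gov/MARC21/slim"}

# Hypothetical sample record, invented for this sketch.
SAMPLE = """<record xmlns="http://www.loc.gov/MARC21/slim">
  <leader>00000nam a2200000 a 4500</leader>
  <controlfield tag="001">demo-0001</controlfield>
  <datafield tag="245" ind1="1" ind2="0">
    <subfield code="a">Example title :</subfield>
    <subfield code="b">an illustrative subtitle.</subfield>
  </datafield>
</record>"""


def subfields(record, tag, code):
    """Return the text of every subfield `code` inside datafields with `tag`."""
    path = f"marc:datafield[@tag='{tag}']/marc:subfield[@code='{code}']"
    return [el.text for el in record.findall(path, MARC_NS)]


record = ET.fromstring(SAMPLE)

# Field 245 holds the title statement: subfield a is the title proper,
# subfield b the remainder of the title.
title = " ".join(subfields(record, "245", "a") + subfields(record, "245", "b"))
print(title)
```

Dedicated MARC libraries such as those listed above wrap this kind of tag and subfield lookup in a higher-level API and also handle binary MARC21 records, which this sketch does not attempt.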



                                                   Software Tools

                                                   Although formal tool development for the cultural heritage community is
                                                   fairly sparse, a handful of tools are part of nearly every metadata or
                                                   digital library manager’s toolkit.

                                                          OpenRefine (http://openrefine.org/): It’s hard to describe just
                                                              how powerful OpenRefine can be as a data manipulation
                                                              tool. When a user first comes across it, OpenRefine looks like
                                                              a spreadsheet program on steroids. But in reality, it’s much
                                                              more than that. OpenRefine excels in providing structure
                                                              and meaning to unstructured data. It includes its own macro
                                                              language and robust regular expression support, and it has a
                                                              wide range of plug-ins and flavors that add support
                                                              for data reconciliation with linked data services.
                                                          Oxygen XML Editor (www.oxygenxml.com/): There are a lot of
                                                              XML editors available for download and use, but none are
                                                              better than Oxygen. Unlike the other tools on this list, Oxygen
                                                              isn’t free to use; it is a proprietary application that carries a
                                                              relatively high price tag, but its ability to create, test, and
                                                              model data and data transformations makes it worth noting.
                                                          Yaz (www.indexdata.com/yaz): The Yaz toolkit is ubiquitous;
                                                              it powers nearly all of the available Z39.50 servers on the
                                                              Internet and is available as programming libraries for nearly