Page 75 - Building Digital Libraries

CHAPTER 4



Material type: Electronic Text
Electronic text includes all text-based digital content, including traditional monographic content, serial content, musical scores, theses and dissertations, etc.
Preferred formats:
  1. XML-based markup with included DTD/Schema files, such as TEI or EPUB
  2. PDF/UA
  3. PDF/A
  4. Plain text
Description: XML-based markup would be the best format for storing text-based content for long-term preservation. However, it is also the most difficult format to support in many current-generation digital library systems. Most systems do not yet provide good support for XML-marked-up content beyond using markup to create and store metadata, largely because few good readers exist that can parse the marked-up content and render it for the user.
In the absence of XML-based markup, PDF/UA, PDF/A, or plain text would be the most acceptable preservation formats. PDF/UA is preferred, since it requires all text to be mappable to Unicode.

Material type: Imagery
Digital imagery includes photography taken through digital capture or high-resolution scans of analog content.
Preferred formats:
  1. TIFF[7]
  2. JPEG2000[8]
  3. PNG[9]
  4. BMP[10]
Description: A wide range of digital image formats is potentially available, including many not in this list, such as JPEG, GIF, PSD (Photoshop), and CRW (Camera Raw). These are not included among the preferred formats because of questions related to open patents or the proprietary nature of a particular image format. While they cannot always be avoided, proprietary formats should generally not be selected as supported master preservation formats, because they have a limited number of readers and likely limited format migration pathways.

Material type: Audio Files
Audio files include audio captured through digital audio capture or the conversion of analog-based audio files.
Preferred formats:
  1. WAV[11] (at highest level of capture)
  2. MP3[12]
Description: Like digital image formats, audio and video formats come in a wide range of types that could potentially be used for preservation. For audio captured through a digital recorder, the software or capture device will likely offer a raw, uncompressed digital format, which may or may not be proprietary. For digital preservation purposes, an uncompressed WAV file created at the highest native resolution may provide the best format for long-term support and access.[13]

Material type: Video
Digital video, either captured natively or through the digitization of film-based media.
Preferred formats:
  1. DPX[14]
  2. AVI[15]
Description: For digitized video, AVI[16] is currently the most commonly used container format for preserving video content. It is well supported and allows the creation of uncompressed video media. Note, however, that formats such as MXF[17] and Matroska[18] are increasingly used alongside or in place of the AVI wrapper to provide a more robust container (with better metadata embedding) for digital video capture. Recommendations for natively digitized content can be murkier; the National Archives and the Library of Congress both point to the use of DPX.
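
One practical way to apply the table at intake is to check each file's leading bytes (its "magic number") rather than trusting its file extension. The Python sketch below assumes the well-known signatures for TIFF, JPEG 2000 (JP2 container), PNG, BMP, WAV, AVI, MP3, and DPX; the names `sniff`, `rank`, and `PREFERENCE` are hypothetical. Text formats are left out on purpose: PDF/UA and PDF/A share the same `%PDF-` signature and differ only in internal conformance metadata, which a header check cannot see.

```python
from __future__ import annotations

# Hypothetical preference lists transcribed from the table above
# (position 1 = most preferred). Text formats are deliberately left out.
PREFERENCE = {
    "Imagery": ["TIFF", "JPEG2000", "PNG", "BMP"],
    "Audio": ["WAV", "MP3"],
    "Video": ["DPX", "AVI"],
}


def sniff(header: bytes) -> str | None:
    """Guess a format from a file's first bytes using well-known signatures."""
    if header[:4] in (b"II*\x00", b"MM\x00*"):  # little- or big-endian TIFF
        return "TIFF"
    if header[:12] == b"\x00\x00\x00\x0cjP  \r\n\x87\n":  # JP2 signature box
        return "JPEG2000"
    if header[:8] == b"\x89PNG\r\n\x1a\n":
        return "PNG"
    if header[:4] == b"RIFF" and header[8:12] == b"WAVE":
        return "WAV"
    if header[:4] == b"RIFF" and header[8:12] == b"AVI ":
        return "AVI"
    if header[:4] in (b"SDPX", b"XPDS"):  # big- or little-endian DPX
        return "DPX"
    if header[:3] == b"ID3":  # MP3 with a leading ID3v2 tag
        return "MP3"
    if len(header) >= 2 and header[0] == 0xFF and (header[1] & 0xE0) == 0xE0:
        return "MP3"  # bare MPEG audio frame sync; approximate (also matches AAC/ADTS)
    if header[:2] == b"BM":  # weak two-byte check, so it is tried last
        return "BMP"
    return None


def rank(header: bytes) -> tuple[str, str, int] | None:
    """Return (material type, format, preference rank), or None if not a preferred format."""
    fmt = sniff(header)
    if fmt is None:
        return None
    for material, formats in PREFERENCE.items():
        if fmt in formats:
            return material, fmt, formats.index(fmt) + 1
    return None
```

Reading only the first 16 bytes of a file is enough for every check here. A JPEG header, for instance, yields `None`, flagging the file as outside the preferred list rather than rejecting it outright; how to handle such files remains a policy decision.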


Of course, digital library managers are likely to run across a wide range of data types that are not covered in the table above. These range from proprietary data found in CAD drawings, to datasets in JSON or delimited formats, to databases like MS Access or SQLite, to web pages in WARC[19] format. Digital library managers will need to determine their organization’s capacity for the long-term preservation of various digital content, as well as whether there are any limits to the types of content the organization can accept. And while it is unlikely that an organization or library can provide the highest level of preservation curation for all content, digital library managers should strive to provide at least Level 4 preservation activities for all content managed within an organization’s digital library.
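
Deciding which of these other data types an organization will accept can be made concrete as an intake rule. The sketch below is hypothetical throughout: the tier names in `POLICY` ("preserve", "migrate", "review") are illustrative rather than any standard, and the content checks are simplified. It recognizes SQLite databases and MS Access (Jet/ACE) files by their header strings, WARC files by their `WARC/1.` version line (peeking inside gzip compression, since WARCs are often stored as `.warc.gz`), JSON by attempting to parse it, and delimited text with a crude consistent-delimiter test. Anything it cannot identify, such as a proprietary CAD file, is routed to manual review.

```python
from __future__ import annotations

import json
import zlib

# Hypothetical intake tiers; each organization would define its own.
POLICY = {
    "SQLite": "preserve",
    "WARC": "preserve",
    "JSON": "preserve",
    "Delimited text": "preserve",
    "MS Access": "migrate",  # e.g., export to an open format before ingest
}


def looks_delimited(text: str, delimiters: str = ",\t;|") -> bool:
    """Crude test: every non-blank line has the same nonzero count of one delimiter.

    Quoting is ignored, so quoted fields that contain delimiters can defeat it.
    """
    lines = [ln for ln in text.splitlines() if ln.strip()]
    if len(lines) < 2:
        return False
    for d in delimiters:
        counts = {ln.count(d) for ln in lines}
        if len(counts) == 1 and counts.pop() > 0:
            return True
    return False


def identify(data: bytes) -> str | None:
    """Guess a format from file content; return None if unrecognized."""
    if data.startswith(b"\x1f\x8b"):
        # gzip member (WARCs are often stored as .warc.gz): peek at the decompressed start.
        try:
            data = zlib.decompressobj(16 + zlib.MAX_WBITS).decompress(data, max_length=64)
        except zlib.error:
            return None
    if data.startswith(b"SQLite format 3\x00"):
        return "SQLite"
    if data.startswith(b"WARC/1."):
        return "WARC"
    if data[4:19] in (b"Standard Jet DB", b"Standard ACE DB"):
        return "MS Access"
    try:
        text = data.decode("utf-8")
    except UnicodeDecodeError:
        return None
    try:
        json.loads(text)
        return "JSON"
    except json.JSONDecodeError:
        pass
    return "Delimited text" if looks_delimited(text) else None


def intake_decision(data: bytes) -> str:
    """Map content to a policy tier; anything unrecognized (e.g., a CAD file) goes to review."""
    fmt = identify(data)
    return POLICY.get(fmt, "review") if fmt else "review"
```

The signature checks need only the first few bytes of a file, but the JSON and delimited-text tests as written parse whatever bytes they are given, so a production version would likely sample large files rather than read them whole.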



Cloud-Based Digital Preservation Services

A question that often gets asked when developing digital libraries is whether to use a cloud-based or a local infrastructure. While there are good reasons why an organization may choose one infrastructure over another, there are