Page 75 - Building Digital Libraries
CHAPTER 4
MATERIAL TYPE: Electronic Text
Electronic text includes all text-based digital content, including traditional monographic content, serial content, musical scores, theses and dissertations, etc.
PREFERRED FORMATS:
1. XML-based markup with included DTD/Schema files like TEI or EPUB
2. PDF/UA
3. PDF/A
4. Plain-Text
DESCRIPTION:
XML-based markup would be the best format for storing text-based content for long-term preservation. However, these formats are also the most difficult to support in many current-generation digital library systems. Most digital library systems currently don’t provide good support for XML-marked-up content, outside of supporting markup for metadata creation and storage. This lack of support is generally related to the lack of good readers that can parse and render the marked-up content to the user.
In the absence of XML-based markup, PDF/UA, PDF/A, or plain-text would be the most acceptable preservation formats. PDF/UA is preferred, since it requires all text to be mapped to Unicode.

MATERIAL TYPE: Imagery
Digital imagery includes photography taken through digital capture or high-resolution scans of analog content.
PREFERRED FORMATS:
1. TIFF [7]
2. JPEG2000 [8]
3. PNG [9]
4. BMP [10]
DESCRIPTION:
A wide range of digital image formats are potentially available, including many not in this list. Some of those include JPEG, GIF, PSD (Photoshop), and CRW (Camera Raw formats); however, these are not included in the list of preferred formats due to questions related to open patents or the proprietary nature of a particular image format. While they cannot always be avoided, proprietary formats should generally be avoided when selecting supported master preservation formats due to the limited numbers of readers and likely limited format migration pathways.

MATERIAL TYPE: Audio Files
Audio files include audio captured through digital audio capture or the conversion of analog-based audio files.
PREFERRED FORMATS:
1. WAV [11] (at highest level of capture)
2. MP3 [12]
DESCRIPTION:
Like digital image formats, audio and video formats have a wide range of format types that could potentially be used for preservation purposes. For audio captured through a digital recorder, the software or capture device will likely make available a raw, uncompressed digital format. This format may or may not be proprietary. For the purposes of digital preservation, an uncompressed WAV file created at the highest native resolution may provide the best format for long-term support and access. [13]

MATERIAL TYPE: Video
Digital video, either captured natively or through the digitization of film-based media.
PREFERRED FORMATS:
1. DPX [14]
2. AVI [15]
DESCRIPTION:
For digitized video, AVI is currently the most commonly used container format for preserving video content. [16] This format is well supported, and supports the creation of uncompressed video media. It should be noted, however, that a wide range of formats like MXF [17] and Matroska [18] are being paired with the AVI wrapper to provide a more robust format (better metadata embedding) for digital audio capture.
Recommendations for natively digitized content can be a bit murkier. Organizations like the National Archives and the Library of Congress both point to the use of DPX.
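The preference order in the table above can be sketched as a small lookup combined with a check of a file's leading bytes. This is only an illustration, not a substitute for a dedicated format identification tool: the function names `sniff_format` and `preference_rank` are hypothetical, the byte signatures are the commonly documented ones for each format, and electronic text is left out because XML and plain text have no fixed signature and a PDF header alone cannot distinguish PDF/UA from PDF/A.

```python
from typing import Optional

# Preference order per material type, copied from the table above
# (earlier in the list = more preferred). Electronic text is omitted
# because its formats cannot be told apart by a fixed byte signature.
PREFERRED = {
    "imagery": ["TIFF", "JPEG2000", "PNG", "BMP"],
    "audio": ["WAV", "MP3"],
    "video": ["DPX", "AVI"],
}


def sniff_format(header: bytes) -> Optional[str]:
    """Guess a format from the first bytes of a file, or None if unknown."""
    if header[:4] in (b"II*\x00", b"MM\x00*"):  # little/big-endian TIFF
        return "TIFF"
    if header.startswith(b"\x00\x00\x00\x0cjP  \r\n\x87\n"):  # JP2 signature box
        return "JPEG2000"
    if header.startswith(b"\x89PNG\r\n\x1a\n"):
        return "PNG"
    if header[:4] == b"RIFF" and header[8:12] == b"WAVE":
        return "WAV"
    if header[:4] == b"RIFF" and header[8:12] == b"AVI ":
        return "AVI"
    if header[:4] in (b"SDPX", b"XPDS"):  # big/little-endian DPX
        return "DPX"
    if header.startswith(b"ID3"):  # only MP3 files that begin with an ID3v2 tag
        return "MP3"
    if header.startswith(b"BM"):  # weak two-byte signature, so checked last
        return "BMP"
    return None


def preference_rank(material_type: str, fmt: Optional[str]) -> Optional[int]:
    """Return the 1-based rank of fmt in the table, or None if it is not listed."""
    formats = PREFERRED.get(material_type, [])
    return formats.index(fmt) + 1 if fmt in formats else None
```

In practice the header would come from something like `open(path, "rb").read(16)`. A result of `None` from either function simply means the file falls outside the table and needs a separate decision.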
Of course, digital library managers are likely to run across a wide
range of data types that are not covered in the table above. These range
from proprietary data found in CAD drawings, to datasets in JSON or
delimited formats, to databases like MS Access or SQLite, to web pages in
WARC format. [19] Digital library managers will need to determine what their
organizational capacity will be with regard to the long-term preservation
of various digital content, as well as determine if there are any limits to the
type of content that an organization can accept. And while it is unlikely
that an organization or library can provide the highest level of preservation
curation for all content, digital library managers should strive to provide
at least Level 4 preservation activities for all content managed within an
organization’s digital library.
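One way to start assessing that capacity is to inventory which file types actually arrive and how many fall outside the table. The sketch below is a rough illustration under stated assumptions: the `inventory` function and the `TABLE_EXTENSIONS` set are hypothetical, and file extensions are only a coarse proxy for format, so an organization would substitute its own list and a more reliable identification method.

```python
from collections import Counter
from pathlib import PurePath

# Hypothetical extensions standing in for the formats listed in the table;
# this mapping is an assumption, not part of the original text.
TABLE_EXTENSIONS = {
    ".xml", ".pdf", ".txt",            # electronic text
    ".tif", ".tiff", ".jp2", ".png", ".bmp",  # imagery
    ".wav", ".mp3",                    # audio
    ".dpx", ".avi",                    # video
}


def inventory(paths):
    """Count files by lowercase extension, split by whether the table covers them."""
    covered, uncovered = Counter(), Counter()
    for p in paths:
        ext = PurePath(p).suffix.lower() or "(none)"
        target = covered if ext in TABLE_EXTENSIONS else uncovered
        target[ext] += 1
    return covered, uncovered
```

Running this over a deposit that includes CAD drawings, JSON datasets, or WARC files would surface them in the second counter, giving managers a concrete list of content types for which a preservation commitment still has to be decided.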
Cloud-Based Digital Preservation Services
A question that often gets asked when developing digital libraries is whether
to use a cloud-based or a local infrastructure. While there are good reasons
why an organization may choose one infrastructure over another, there are