Page 48 - Building Digital Libraries
Acquiring, Processing, Classifying, and Describing Digital Content
Object Requirements
All of these approaches have substantial advantages and disadvantages.
Limiting the types of resources that can be added to the repository greatly
simplifies workflow, storage, access, and long-term preservation. However,
limiting the types of resources to be acquired has the effect of excluding
items on the basis of administrative convenience for the library, rather than
on the basis of the value of the resource. On the other hand, libraries have
always required that certain physical prerequisites be met as a condition of
materials being added to the collection. Most libraries don’t accept materials
that are falling apart or that have some other problem that makes them
preservable only at very high cost, if at all. Few libraries accept unreadable
and obsolete tape, cylinder, punch card, or disk media. It is common for
libraries to accept only materials that can be made available over the long
term, and there is no compelling reason to abandon this long-standing
practice simply because a resource is accessed by computer.
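As a rough illustration of limiting acceptance by format, the sketch below sorts submitted filenames against an accept-list. The extensions in the list and the function names are hypothetical, chosen only for the example; an actual repository would set its own policy and would likely identify formats by inspecting file contents rather than trusting extensions.

```python
from pathlib import Path

# Hypothetical accept-list for illustration; not a recommended policy.
ACCEPTED_EXTENSIONS = {".pdf", ".txt", ".tif", ".tiff", ".wav"}


def is_accepted(filename: str) -> bool:
    """Return True if the file's extension (case-insensitive) is on the accept-list."""
    return Path(filename).suffix.lower() in ACCEPTED_EXTENSIONS


def triage(filenames):
    """Split submissions into (accepted, rejected) lists, preserving input order."""
    accepted, rejected = [], []
    for name in filenames:
        (accepted if is_accepted(name) else rejected).append(name)
    return accepted, rejected
```

A rejected list of this kind makes the trade-off described above visible: items are excluded because of their format, not because of their value.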
Transform
As a strategy, reformatting materials limits the types of resources that can
be added to a repository and offers most of the same advantages that simply
accepting only certain formats does. It is important to remember that digital
objects are inherently abstractions. When collecting, preserving, and
disseminating these objects, it is literally impossible to share the original
object as it is stored as bits on the original media. Instead, we make copies
of those bits at some level of abstraction. In some cases, this is simple. For
example, when word-processed files are involved, usually the textual content
(ideally with its layout and formatting) must be kept, but the bitstream itself
is irrelevant. This distinction is important because word-processing technology
constantly changes. Word processors only read a limited number of
versions, and software to interpret formats that were once dominant such
as WordStar and WordPerfect can be difficult to obtain. Fortunately, most
libraries concerned with archiving documents save them in a much more
stable format such as Unicode plain text, Portable Document Format (PDF), or
one of the other common file types used by the National Archives and Records
Administration and described at
https://www.archives.gov/preservation/products/definitions/filetypes.html.
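The point that the textual content matters while the bitstream does not can be sketched in a few lines. The example below assumes the simplest possible case, a plain-text file in a known legacy encoding, and re-encodes it as UTF-8 while recording a checksum of both the original and the derivative. The function name and the shape of the returned record are hypothetical; converting real word-processor formats requires dedicated conversion software and is not shown here.

```python
import hashlib


def normalize_to_utf8(raw: bytes, source_encoding: str) -> dict:
    """Decode legacy bytes and re-encode them as UTF-8, keeping checksums for both."""
    # Raises UnicodeDecodeError if the bytes are not valid in source_encoding.
    text = raw.decode(source_encoding)
    utf8_bytes = text.encode("utf-8")
    return {
        "source_encoding": source_encoding,
        "source_sha256": hashlib.sha256(raw).hexdigest(),
        "utf8_bytes": utf8_bytes,
        "utf8_sha256": hashlib.sha256(utf8_bytes).hexdigest(),
    }


# Usage: the same text, stored as different bits before and after conversion.
legacy = "café".encode("cp1252")
record = normalize_to_utf8(legacy, "cp1252")
```

In this usage the two checksums differ even though the decoded text is identical, which is why a repository that transforms content typically tracks the original and the derivative as separate bitstreams.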
Transforming inevitably results in a loss of information and functionality,
and converting some resources is difficult. Simple objects consisting of
individual images, audio, and video files are often straightforward from an
ingestion and processing point of view. However, even textual documents
can contain embedded images, spreadsheets that contain formulas which
reference other documents, and other objects that are difficult to store and
preserve. Other types of objects—especially those associated with specific
platforms—often present complex challenges.