Page 48 - Building Digital Libraries

Acquiring, Processing, Classifying, and Describing Digital Content


                 Object Requirements

                 All of these approaches have substantial advantages and disadvantages.
                 Limiting the types of resources that can be added to the repository greatly
                 simplifies workflow, storage, access, and long-term preservation. However,
                 limiting the types of resources to be acquired has the effect of excluding
                 items on the basis of administrative convenience for the library, rather than
                 on the basis of the value of the resource. On the other hand, libraries have
                 always required that certain physical prerequisites be met as a condition of
                 materials being added to the collection. Most libraries do not accept materials
                 that are falling apart or are otherwise so damaged that they could be
                 preserved only at very high cost, if at all. Few libraries accept unreadable
                 and obsolete tape, cylinder, punch card, or disk media. It is common for
                 libraries to accept only materials that can be made available over the long
                 term, and there is no compelling reason to abandon this long-standing
                 practice simply because a resource is accessed by computer.
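
                 A minimal sketch of how such an acceptance policy might be checked at
                 ingest. The extension list and the function names are hypothetical and
                 purely illustrative; they are not drawn from the text or from any
                 particular library's policy, and a real workflow would identify formats
                 by inspecting file contents rather than trusting file names.

```python
from pathlib import Path

# Hypothetical acceptance policy: these extensions are illustrative only.
ACCEPTED_EXTENSIONS = {".pdf", ".txt", ".tif", ".tiff", ".wav"}


def is_accepted(filename: str) -> bool:
    """Return True if the file's extension is on the acceptance list."""
    return Path(filename).suffix.lower() in ACCEPTED_EXTENSIONS


def triage(filenames):
    """Split submitted file names into (accepted, rejected) lists."""
    accepted = [name for name in filenames if is_accepted(name)]
    rejected = [name for name in filenames if not is_accepted(name)]
    return accepted, rejected


if __name__ == "__main__":
    ok, refused = triage(["thesis.PDF", "minutes.wpd", "interview.wav"])
    print("accepted:", ok)
    print("rejected:", refused)
```

                 Items on the rejected list are candidates either for refusal or for
                 reformatting into an accepted type before ingest.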



                 Transform

                 As a strategy, reformatting materials limits the types of resources that can
                 be added to a repository and offers most of the same advantages that simply
                 accepting only certain formats does. It is important to remember that digital
                 objects are inherently abstractions. When collecting, preserving, and
                 disseminating these objects, it is impossible to share the original object
                 itself, that is, the bits as stored on the original media. Instead, we make
                 copies of those bits at some level of abstraction. In some cases, this is
                 simple. For example,
                 when word-processed files are involved, the textual content (ideally with
                 its layout and formatting) must usually be kept, but the bitstream itself is
                 irrelevant. This distinction is important because word-processing
                 technology constantly changes. Word processors can read only a limited
                 number of format versions, and software to interpret formats that were once
                 dominant, such as WordStar and WordPerfect, can be difficult to obtain.
                 Fortunately, most
                 libraries concerned with archiving documents save them in a much more
                 stable format such as Unicode plain text, Portable Document Format (PDF),
                 or one of the other common file types used by the National Archives and
                 Records Administration and described at
                 https://www.archives.gov/preservation/products/definitions/filetypes.html.
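
                 The point that the textual content matters while the bitstream does not
                 can be illustrated with a small sketch. It assumes, purely for
                 illustration, that the legacy file is plain text in the old DOS code page
                 437; the function name migrate_to_utf8 is hypothetical. Real
                 word-processor formats such as WordStar or WordPerfect need dedicated
                 converters, and this sketch does nothing to preserve layout or formatting.

```python
# The same textual content, stored as two different bitstreams.
text = "Café résumé"

legacy_bytes = text.encode("cp437")   # stand-in for an obsolete encoding
stable_bytes = text.encode("utf-8")   # Unicode (UTF-8) plain text


def migrate_to_utf8(data: bytes, source_encoding: str) -> bytes:
    """Re-encode text from a legacy encoding to UTF-8.

    The bytes change; the decoded characters do not.
    """
    return data.decode(source_encoding).encode("utf-8")


migrated = migrate_to_utf8(legacy_bytes, "cp437")

if __name__ == "__main__":
    print(legacy_bytes != stable_bytes)       # the bitstreams differ
    print(migrated.decode("utf-8") == text)   # the content survives
```

                 After migration the original bytes are gone from the working copy, yet a
                 reader of the UTF-8 file sees exactly the same characters.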
                     Transforming inevitably results in a loss of information and functionality,
                 and converting some resources is difficult. Simple objects consisting of
                 individual images, audio, and video files are often straightforward from an
                 ingestion and processing point of view. However, even textual documents
                 can contain embedded images, spreadsheets that contain formulas which
                 reference other documents, and other objects that are difficult to store and
                 preserve. Other types of objects—especially those associated with specific
                 platforms—often present complex challenges.



