Page 54 - Building Digital Libraries

Acquiring, Processing, Classifying, and Describing Digital Content


                     As metadata becomes more detailed and complete, the possibilities for
                 searching and organizing a collection increase. However, creating elaborate
                 metadata is expensive and time-consuming, and it is also prone to errors
                 and omissions. Metadata must be applied consistently to be useful, or sys-
                 tems will either ignore it or normalize it to a simpler form. For example,
                 in addition to the cryptic fixed-length fields described above, the MARC
                 standard for bibliographic records defines dozens of variable-length
                 note fields. Although notes in the catalog record are stored as
                 free text, different MARC fields are used for notes depending on whether
                 they concern bibliographical resources, summaries, translations,
                 reproductions, access restrictions, file types, dissertations, or any of a
                 number of other topics. Each MARC field is associated with a
                 separate numeric tag and may contain a myriad of subfields. Guidelines
                 for inputting notes exist, but the notes themselves vary considerably in
                 terms of structure and completeness. This should not be surprising, given
                 that cataloging rules change over time, practices vary from one library to
                 the next, and notes consist of free text. To compensate for the variability
                 in how notes are entered, the vast majority of systems treat almost all note
                 fields identically. In other words, catalogers at many institutions spend
                 countless hours encoding information that will never be used. The notes
                 may be useful, but the specific fields, indicators, and tags are not. The
                 lesson is that repositories should require only metadata that can be
                 entered consistently.
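
The flattening described above can be sketched in a few lines. This is a minimal illustration, not how any particular catalog system is implemented: the record is represented as a hypothetical list of (tag, text) pairs, the sample values are invented, and the only MARC fact relied on is that note fields carry tags in the 5XX range.

```python
# Sketch: treat every MARC 5XX note field identically.
# The (tag, text) record layout and the sample values are assumptions
# made for illustration; real MARC records also carry indicators and subfields.

def collapse_notes(fields):
    """Return the text of all 5XX fields, discarding the specific tag."""
    return [text for tag, text in fields if tag.startswith("5")]

record = [
    ("245", "Example title"),
    ("502", "Thesis (Ph.D.)"),                        # dissertation note
    ("504", "Includes bibliographical references."),  # bibliography note
    ("520", "A short summary of the work."),          # summary note
]

notes = collapse_notes(record)
```

Once the notes are collapsed this way, the effort a cataloger spent choosing between 502, 504, and 520 no longer affects what a user can search.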
                     Consistent metadata structures are essential, but it is also important to
                 ensure that the contents of various metadata fields are normalized as much
                 as possible. This means that when subjects, names, organizations, places,
                 or other entities are associated with a resource, those who assign metadata
                 should select from an authorized list of preferred terms rather than type
                 in free-text entries. A way to add or suggest new preferred terms should
                 be available, and new terms should be curated before being added to the
                 authorized list.
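
One way to picture this workflow is a small sketch in which assignment succeeds only for terms already on the authorized list, and anything else is queued for review. The names `AUTHORIZED_SUBJECTS`, `pending_suggestions`, `assign_subject`, and `approve` are hypothetical, and the sample terms are placeholders rather than entries from a real vocabulary.

```python
# Sketch: assign subjects only from an authorized list; queue everything else
# for a curator. All names and sample terms here are illustrative assumptions.

AUTHORIZED_SUBJECTS = {"Digital libraries", "Metadata", "Cataloging"}
pending_suggestions = []

def assign_subject(term):
    """Return the term if it is authorized; otherwise queue it and return None."""
    if term in AUTHORIZED_SUBJECTS:
        return term
    if term not in pending_suggestions:
        pending_suggestions.append(term)
    return None

def approve(term):
    """Curator action: move a reviewed suggestion onto the authorized list."""
    pending_suggestions.remove(term)
    AUTHORIZED_SUBJECTS.add(term)

accepted = assign_subject("Metadata")    # already authorized
rejected = assign_subject("Meta data")   # free-text variant, queued for review
```

The design choice worth noting is that a rejected term is never written to the record directly; it reaches the authorized list only through the curator step.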
                     To help users, metadata must categorize resources, and categorizing
                 requires describing these resources consistently. If the documents created by
                 an author named James Smith appear in the repository under “Jim Smith;”
                 “Smith, James;” “J. T. Smith;” “Smith, James T.;” “Smith, J.;” and a number
                 of other variations, finding documents that he authored and distinguishing
                 him from other authors with similar names will be difficult. Likewise, if
                 subjects are not entered consistently, documents about the same topic will
                 be assigned different subject headings, which makes it significantly harder
                 to find materials about a resource or topic. Authority control can seem slow
                 and expensive, but it is worth the trouble. Without it, a database will fill up
                 with inconsistent and unreliable entries.
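
The James Smith example can be sketched as a lookup against a small authority file. This is a simplified illustration under stated assumptions: the choice of “Smith, James T.” as the preferred heading is arbitrary, the `AUTHORITY` structure and `authorized_name` function are hypothetical, and a real authority file would need human judgment to decide whether a form such as “Smith, J.” actually refers to this author.

```python
# Sketch: map variant name forms to one authorized heading.
# The authority file contents and the preferred heading are assumptions.

AUTHORITY = {
    "Smith, James T.": ["Jim Smith", "Smith, James", "J. T. Smith", "Smith, J."],
}

# Build a case-insensitive index from every known form to its heading.
VARIANT_TO_HEADING = {}
for heading, variants in AUTHORITY.items():
    VARIANT_TO_HEADING[heading.casefold()] = heading
    for variant in variants:
        VARIANT_TO_HEADING[variant.casefold()] = heading

def authorized_name(entered):
    """Return the authorized heading for an entered name, or None if unknown."""
    key = " ".join(entered.split()).casefold()  # collapse stray whitespace
    return VARIANT_TO_HEADING.get(key)

resolved = authorized_name("jim smith")
unknown = authorized_name("Jane Doe")
```

Returning `None` for an unknown name, rather than storing it as typed, is what keeps new variants from slipping into the database unreviewed.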
                     When determining how to use metadata to organize resources, reposi-
                 tory planners should take reasonable steps to ensure that the metadata is
                 compatible with that used in other collections that users will likely want to
                 search. There are so many sources of information that it is unreasonable
