Page 54 - Building Digital Libraries
Acquiring, Processing, Classifying, and Describing Digital Content
As metadata becomes more detailed and complete, the possibilities for
searching and organizing a collection increase. However, creating elaborate
metadata is expensive and time-consuming, and it is also prone to errors
and omissions. Metadata must be applied consistently to be useful, or systems
will either ignore it or normalize it to a simpler form. For example,
in addition to the cryptic fixed-length fields described above, the MARC
standard for bibliographic records defines dozens of variable-length
note fields. Although notes in the catalog record are stored as
free text, different MARC fields are used for notes depending on whether
they concern bibliographical resources, summaries, translations, reproductions,
access restrictions, file types, dissertations, or a number of other
things notes may be written about. Each MARC field is associated with a
separate numeric tag and may contain a myriad of subfields. Guidelines
for inputting notes exist, but the notes themselves vary considerably in
terms of structure and completeness. This should not be surprising, given
that cataloging rules change over time, practices vary from one library to
the next, and notes consist of free text. To compensate for the variability
in how notes are entered, the vast majority of systems treat almost all note
fields identically. In other words, catalogers at many institutions spend
countless hours encoding information that will never be used. The notes
may be useful, but the specific fields, indicators, and tags are not. The lesson
to be learned is that repositories should require only metadata that can be
entered consistently.
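
The way systems flatten note fields can be illustrated with a minimal sketch. It assumes a hypothetical in-memory record made of (tag, text) pairs rather than any real MARC parsing library, and the names `record`, `is_note_tag`, and `collapse_notes` are invented for illustration. The tags shown (500 general note, 504 bibliography note, 520 summary) are real MARC note tags, but the record itself is made up.

```python
# Hypothetical record: a list of (tag, text) pairs, not a real MARC library object.
record = [
    ("245", "An example title"),
    ("500", "Includes index."),
    ("504", "Includes bibliographical references."),
    ("520", "A practical guide to planning repositories."),
]


def is_note_tag(tag: str) -> bool:
    """MARC note fields occupy the 5XX tag range."""
    return len(tag) == 3 and tag.isdigit() and tag.startswith("5")


def collapse_notes(fields):
    """Return all note text in record order, discarding which 5XX tag held it."""
    return [text for tag, text in fields if is_note_tag(tag)]


notes = collapse_notes(record)
```

Once the notes are collapsed this way, the distinction a cataloger drew between a 504 and a 520 is gone, which is the wasted effort the paragraph above describes.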
Consistent metadata structures are essential, but it is also important to
ensure that the contents of various metadata fields are normalized as much
as possible. This means that when subjects, names, organizations, places,
or other entities are associated with a resource, those who assign metadata
should select from an authorized list of preferred terms rather than type
in free-text entries. A way to add or suggest new preferred terms should
be available, and new terms should be curated before being added to the
authorized list.
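
One way to picture an authorized list with a curation step is the sketch below. The class name `ControlledVocabulary` and its methods are hypothetical, and a production repository would also need persistence, user roles, and an audit trail, none of which is shown here.

```python
class ControlledVocabulary:
    """Authorized list of preferred terms plus a queue of suggested terms."""

    def __init__(self, preferred_terms):
        self.preferred = set(preferred_terms)
        self.pending = set()  # suggestions awaiting a curator's decision

    def validate(self, term: str) -> str:
        """Accept a term for a metadata field only if it is on the authorized list."""
        if term not in self.preferred:
            raise ValueError(f"{term!r} is not an authorized term")
        return term

    def suggest(self, term: str) -> None:
        """Record a proposed term without making it usable yet."""
        if term not in self.preferred:
            self.pending.add(term)

    def approve(self, term: str) -> None:
        """Curator action: move a suggested term onto the authorized list."""
        self.pending.remove(term)  # raises KeyError if the term was never suggested
        self.preferred.add(term)


vocab = ControlledVocabulary(["Digital libraries", "Metadata"])
vocab.suggest("Authority control")   # not usable until a curator approves it
vocab.approve("Authority control")
subject = vocab.validate("Authority control")
```

Keeping suggestion and approval as separate steps means free-text entries never reach the records directly, while catalogers still have a route for proposing terms the list lacks.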
To help users, metadata must categorize resources, and categorizing
requires entering these resources consistently. If the documents created by
an author named James Smith appear in the repository under “Jim Smith”;
“Smith, James”; “J. T. Smith”; “Smith, James T.”; “Smith, J.”; and a number
of other variations, finding documents that he authored and distinguishing
him from other authors with similar names will be difficult. Likewise, if
subjects are not entered consistently, documents about the same topic will
be assigned different subject headings, which makes it significantly harder
to find materials about a resource or topic. Authority control can seem slow
and expensive, but it is worth the trouble. Without it, a database will fill up
with inconsistent and unreliable entries.
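
A name authority lookup for the James Smith example might be sketched as follows. The choice of “Smith, James T.” as the authorized heading is an assumption made for illustration, as are the names `VARIANTS`, `AUTHORITY_INDEX`, and `resolve`. The sketch also assumes a person has already decided that every listed variant refers to the same author. In practice that decision requires review, because a form such as “Smith, J.” could belong to several people.

```python
# Assumed authorized heading; the text does not say which form is preferred.
AUTHORIZED = "Smith, James T."

# Variant forms a curator has already confirmed refer to the same person.
VARIANTS = {
    "Jim Smith": AUTHORIZED,
    "Smith, James": AUTHORIZED,
    "J. T. Smith": AUTHORIZED,
    "Smith, James T.": AUTHORIZED,
    "Smith, J.": AUTHORIZED,
}


def _key(name: str) -> str:
    """Normalize spacing and case so trivial differences do not create new variants."""
    return " ".join(name.split()).casefold()


AUTHORITY_INDEX = {_key(variant): heading for variant, heading in VARIANTS.items()}


def resolve(name: str):
    """Return the authorized heading, or None if the name needs a curator's review."""
    return AUTHORITY_INDEX.get(_key(name))
```

Returning `None` for unknown names, rather than guessing the closest match, keeps unverified headings out of the database and routes them to a reviewer instead.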
When determining how to use metadata to organize resources, repository
planners should take reasonable steps to ensure that the metadata is
compatible with that used in other collections that users will likely want to
search. There are so many sources of information that it is unreasonable