Page 52 - Building Digital Libraries
Acquiring, Processing, Classifying, and Describing Digital Content
if it is available at all. Even if resources already contain metadata, one can-
not assume that it is useful.
Even when only textual resources are concerned, searching representations
of objects satisfies a different need than searching the objects themselves
via full-text search. Imagine trying to find a book, article, or other resource
if the title, author, or subjects you wanted to use to find it weren’t separately
recorded and you could only use keywords against a search interface to find
what you needed.
User- or author-provided metadata is often not helpful and sometimes
does more harm than good. To be useful, metadata must be consistent and
provide access points within the context of the collection. Authors and
patrons are usually unaware of how their work fits in with other resources
or how it will be used. Consequently, they typically supply terms that reflect
their own interests and views rather than those of the broader user
community that the repository exists to serve. If similar resources aren’t
described with a similar level of accuracy, specificity, and completeness,
users cannot predict which search strategy will be most successful. Web
search-engine companies such as Google have long known this and ignore
user-supplied metadata for that reason.
Providing full cataloging of electronic resources is often impractical, but it
is important to develop procedures that produce metadata consistent enough
in quality and completeness to identify resources reliably.
The methods used by major search engines often prove ineffective for digital
repositories because the algorithms these search engines depend on work
effectively only with very large numbers of resources. Search engines base
their results on factors such as the number of linked resources, click-through
activity, formatting, and a number of statistical criteria. These methods are
highly effective when used on huge numbers of heavily used documents.
However, these same methods provide much less satisfactory results when
used on small collections containing predominantly low-use resources.
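The consistency requirement discussed above lends itself to a simple mechanical check. The following is a minimal sketch in Python, not a prescribed procedure: it assumes records are plain dictionaries, and the field names in REQUIRED_FIELDS and both function names are hypothetical choices made for illustration. It reports which required fields are missing or blank, so that uneven description across a collection becomes visible before users encounter it.

```python
from collections import Counter

# Hypothetical set of required fields; a real repository would define its own.
REQUIRED_FIELDS = ("title", "creator", "subject", "date")


def missing_fields(record, required=REQUIRED_FIELDS):
    """Return the required fields that are absent or blank in one record."""
    missing = []
    for field in required:
        value = record.get(field)
        # Treat None and whitespace-only strings as missing values.
        if value is None or not str(value).strip():
            missing.append(field)
    return missing


def completeness_report(records, required=REQUIRED_FIELDS):
    """Count, for each required field, how many records lack a usable value."""
    gaps = Counter()
    for record in records:
        gaps.update(missing_fields(record, required))
    return {field: gaps.get(field, 0) for field in required}
```

A report like this measures completeness only. Accuracy and specificity still require human review or comparison against a controlled vocabulary.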
Structuring Content
Metadata can be used to organize works into virtual subcollections, but it’s
sometimes desirable to impose a structure on a resource that was not present
in the original source materials. For example, many organizations issue
newsletters, bulletins, regular reports, and other materials in a way that
follows a serial publication pattern. The organizations’ websites frequently
include a link to the current issue, and links to older issues may or may not
be provided. If these issues are simply treated as electronic documents, the
effect is similar to creating a new record in a traditional catalog for
every issue. Aside from filling the catalog with enormous amounts of duplicate
data, this makes navigating and locating specific issues awkward, since a
frequently produced publication could easily be represented by hundreds of