Page 52 - Building Digital Libraries

Acquiring, Processing, Classifying, and Describing Digital Content


                 if it is available at all. Even if resources already contain metadata, one can-
                 not assume that it is useful.
                     Even when only textual resources are concerned, searching representa-
                 tions of objects satisfies a different need than searching objects themselves
                 via full-text search—imagine trying to find a book, article, or other resource
                 if the title, author, or subjects you wanted to use to find it weren’t separately
                 recorded and you could only use keywords against a search interface to find
                 what you needed.
                     User- or author-provided metadata is often not helpful and sometimes
                 does more harm than good. To be useful, metadata must be consistent and
                 provide access points within the context of the collection. Authors and
                 patrons are usually unaware of how their work fits in with other resources
                 or how it will be used. Consequently, they typically supply terms that reflect
                 their own interests and views rather than those of the broader user
                 community that the repository exists to serve. If similar resources
                 aren’t described with a similar level of accuracy, specificity, and
                 completeness, it won’t be possible to predict which search strategy
                 will be most successful. Web search-engine companies such as Google
                 have known this for a long time and largely ignore user-supplied
                 metadata for that reason.
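                     The consistency requirement described above can be checked
                 mechanically before records enter a collection. The following is a
                 minimal sketch, not a prescribed procedure: the field names
                 (title, creator, subject, date) and the helper functions are
                 assumptions chosen for illustration, and a real repository would
                 substitute the elements of its own schema.

```python
# Hypothetical required fields; replace with the repository's own schema.
REQUIRED_FIELDS = ("title", "creator", "subject", "date")


def has_value(record, field):
    """Return True if the record holds a non-empty value for the field."""
    value = record.get(field)
    if isinstance(value, str):
        return bool(value.strip())  # whitespace-only strings count as empty
    return bool(value)  # None and empty lists count as empty


def field_coverage(records, fields=REQUIRED_FIELDS):
    """Return, per field, the fraction of records that fill it in."""
    total = len(records)
    if total == 0:
        return {field: 0.0 for field in fields}
    return {
        field: sum(has_value(record, field) for record in records) / total
        for field in fields
    }


def incomplete_records(records, fields=REQUIRED_FIELDS):
    """Return (index, missing_fields) for every record lacking a field."""
    report = []
    for index, record in enumerate(records):
        missing = [field for field in fields if not has_value(record, field)]
        if missing:
            report.append((index, missing))
    return report


if __name__ == "__main__":
    # Invented sample records, used only to exercise the functions.
    sample = [
        {"title": "Annual Report 2019", "creator": "Planning Office",
         "subject": ["budget"], "date": "2019"},
        {"title": "newsletter", "creator": "", "subject": [],
         "date": "2020-03"},
    ]
    print(field_coverage(sample))
    print(incomplete_records(sample))
```

                     A check like this measures only completeness. Whether the
                 supplied terms are accurate or specific enough still requires
                 review by someone who knows the collection.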
                     Providing full cataloging of electronic resources is often impractical,
                 but it is important to develop procedures that produce metadata that is
                 consistent enough in quality and completeness to reliably identify
                 resources. The methods used by major search engines often prove
                 ineffective for digital repositories because the algorithms these
                 search engines depend on work well only with very large numbers of
                 resources. Search engines base their results on factors such as the
                 number of linked resources, click-through activity, formatting, and a
                 number of statistical criteria. These methods are highly effective on
                 huge numbers of heavily used documents, but they provide much less
                 satisfactory results on small collections containing predominantly
                 low-use resources.




                 Structuring Content

                 Metadata can be used to organize works into virtual subcollections, but it’s
                 sometimes desirable to impose a structure on a resource that was not
                 present in the original source materials. For example, many
                 organizations issue newsletters, bulletins, regular reports, and other
                 materials that follow a serial publication pattern. The organizations’
                 websites frequently include a link to the current issue, and links to
                 older issues may or may not be provided. If these issues were simply
                 treated as separate electronic documents, the effect would be similar
                 to creating a new record in a traditional catalog for every issue.
                 Besides filling the catalog with enormous amounts of duplicate data,
                 this approach makes navigating to and locating specific issues awkward,
                 since a frequently produced publication could easily be represented by
                 hundreds of
