Page 222 - Building Digital Libraries

Thinking about Discovery


                           given result set. However, the way each tool provides this
                           deduplication service varies in both technique and
                           effectiveness. Because the data retrieved from target databases
                           varies so widely, deduplication techniques based solely on
                           titles, dates, or authors are spotty at best. Federated search
                           developers continue to research more normalized methods for
                           deduplicating resources and for displaying duplicate results.
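
                           The fragility of matching on titles, dates, and authors can
                           be illustrated with a minimal Python sketch. It is not the
                           method of any particular federated search product. The record
                           fields (title, author, year, source), the helper names, and
                           the sample records are all assumptions made up for this
                           example.

```python
import re
import unicodedata


def normalize(text):
    """Lowercase, strip accents and punctuation, and collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", text)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    cleaned = re.sub(r"[^a-z0-9 ]+", " ", no_accents.lower())
    return " ".join(cleaned.split())


def surname(author):
    """Guess a surname from 'Surname, Given' or 'Given Surname' (a naive heuristic)."""
    if "," in author:
        return normalize(author.split(",")[0])
    tokens = normalize(author).split()
    return tokens[-1] if tokens else ""


def dedup_key(record):
    """Build a match key from the hypothetical title, author, and year fields."""
    title = normalize(record.get("title", ""))
    year = str(record.get("year", ""))[:4]
    return (title, surname(record.get("author", "")), year)


def deduplicate(records):
    """Group records that share a key; groups are returned in first-seen order."""
    groups = {}
    for record in records:
        groups.setdefault(dedup_key(record), []).append(record)
    return list(groups.values())


# Made-up records as three different target databases might return them.
sample_records = [
    {"title": "Digital Library Design", "author": "Doe, Jane", "year": "2008", "source": "A"},
    {"title": "Digital library design.", "author": "Jane Doe", "year": 2008, "source": "B"},
    {"title": "Digital Library Design: A Practical Guide", "author": "Doe, J.",
     "year": "2008", "source": "C"},
]
groups = deduplicate(sample_records)
```

                           In this sketch, records A and B collapse into one group, but
                           record C is treated as a separate item because its title
                           carries a subtitle. A human reader would likely consider all
                           three to be the same work, which is the kind of spottiness
                           described above.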

                        Knowledge-base management
                           Before vendors’ federated search systems became mainstream,
                           a number of larger academic institutions like the University of
                           California system created and managed their own federated
                           search tools. However, many of these tools have since been
                           abandoned because creating and maintaining the knowledge
                           base over the long term is expensive and time-consuming,
                           given that available content and providers change rapidly.
                           Vendor-based solutions provided a way to outsource much of
                           the technical knowledge-base management to a third party.
                           Even with this outsourcing, knowledge-base management still
                           consumes a great deal of time for organizations, making this
                           an area that federated search vendors are constantly looking
                           to improve. Within the open source community, this issue is
                           also getting attention, as researchers looking to develop
                           community-oriented federated search tools are examining
                           methods to create shared knowledge-base systems (note 5) in
                           order to reduce management tasks for all users.

                        Automatic data classification
                           As hybrid discovery systems become more widely used, the
                           need for automated classification of resources will continue
                           to grow. Given the varied nature of access points, vocabulary,
                           and classification, the normalization and automatic
                           classification of items based on concepts is a growing field
                           of research for federated search developers. What’s more,
                           large aggregators like VIAF provide a tremendous opportunity
                           to mine local data and enrich that content by building linked
                           data relationships. Tools like VIAF could possibly enable
                           discovery system developers to create new methods for
                           building facets and classifying content, while at the same
                           time developing infrastructure that lets local systems close
                           the data loop and provide reciprocal relationship information
                           as open data.
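
                           As a rough illustration of how authority data might feed
                           facet building, the following Python sketch collapses variant
                           name forms into a single author facet. It makes no call to
                           VIAF or any real service. The lookup table, the cluster
                           identifiers, the record fields, and the function name are all
                           hypothetical stand-ins for data a local system might harvest
                           from an aggregator.

```python
from collections import Counter

# Hypothetical authority table mapping variant name forms to one cluster.
# The identifiers are invented for illustration; they are not real VIAF IDs.
AUTHORITY = {
    "doe, jane": "cluster:0001",
    "jane doe": "cluster:0001",
    "doe, j.": "cluster:0001",
    "roe, richard": "cluster:0002",
}

# Hypothetical preferred display label for each cluster.
PREFERRED_LABEL = {
    "cluster:0001": "Doe, Jane",
    "cluster:0002": "Roe, Richard",
}


def author_facet(records):
    """Count records per author, merging variants that share a cluster."""
    counts = Counter()
    for record in records:
        name = record.get("author", "").strip()
        if not name:
            continue  # skip records with no author value
        cluster = AUTHORITY.get(name.lower())
        # Names missing from the table fall back to their raw form.
        label = PREFERRED_LABEL[cluster] if cluster else name
        counts[label] += 1
    return counts.most_common()


# Made-up result set with inconsistent name forms.
results = [
    {"author": "Doe, Jane"},
    {"author": "Jane Doe"},
    {"author": "DOE, J."},
    {"author": "Roe, Richard"},
    {"author": "Unknown, Ann"},
]
facet = author_facet(results)
```

                           Here the three variant forms of one name are counted under a
                           single facet value, while a name absent from the table remains
                           its own entry. A production system would need a far more
                           careful matching step than a lowercase lookup.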

                        Ranking systems
                           All discovery systems provide some method for ranking items
                           by relevance within a result set. However,
