Page 222 - Building Digital Libraries
Thinking about Discovery
given results set. However, how each tool provides this dedupli-
cation service varies in both technique and effectiveness. Given
the varied nature of the data retrieved from target databases,
deduplication techniques based solely on titles, dates, or authors are
spotty at best. Federated search developers continue
to research more normalized methods for deduplicating
resources and better ways of displaying duplicate
results.
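To make the limitation concrete, the following is a minimal sketch of key-based deduplication, assuming each target returns records with `title`, `author`, and `year` fields. The field names, the sample records, and the helper functions are all hypothetical and are not taken from any particular federated search product.

```python
import re
import unicodedata
from collections import OrderedDict


def normalize(text):
    """Lowercase, strip accents and punctuation, and collapse whitespace."""
    text = unicodedata.normalize("NFKD", text)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = re.sub(r"[^a-z0-9 ]+", " ", text.lower())
    return " ".join(text.split())


def dedup_key(record):
    """Build a match key from title, year, and author surname.

    The field names are assumptions made for this sketch.
    """
    surname = record.get("author", "").split(",")[0]
    return (
        normalize(record.get("title", "")),
        str(record.get("year", "")),
        normalize(surname),
    )


def deduplicate(records):
    """Group records that share a key, preserving first-seen order."""
    groups = OrderedDict()
    for record in records:
        groups.setdefault(dedup_key(record), []).append(record)
    return list(groups.values())


# Invented sample records standing in for results from three targets.
results = [
    {"title": "Digital Archives: An Overview", "author": "Doe, Jane",
     "year": 2015, "source": "A"},
    {"title": "Digital archives : an overview.", "author": "Doe, J.",
     "year": 2015, "source": "B"},
    {"title": "Digital Archives", "author": "Doe, Jane",
     "year": 2015, "source": "C"},
]

groups = deduplicate(results)
for group in groups:
    print([record["source"] for record in group])
```

In this sketch the records from targets A and B collapse into one group, because normalization removes the differences in case and punctuation. The record from target C drops the subtitle, so it lands in a group of its own even though it plausibly describes the same work. That kind of miss is why matching on titles, dates, or authors alone tends to be unreliable.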
Knowledge-base management
Before vendors’ federated search systems became mainstream,
a number of larger academic institutions like the University of
California system created and managed their own federated
search tools. However, many of these tools have since been
abandoned because creating and maintaining the knowledge base
over the long term is expensive and time-consuming, given that
available content and providers change constantly.
Vendor-based solutions provided a method
to outsource much of the technical knowledge-base management
to a third party. Even with this outsourcing, knowledge-base
management still consumes a great deal of organizations' time,
making this an area that federated search vendors are
constantly looking to improve. Within the open source community,
this issue is also getting attention, as researchers looking
to develop community-oriented federated search tools are
examining methods to create shared knowledge-base systems 5
in order to reduce management tasks for all users.
Automatic data classification
As hybrid discovery systems become more widely utilized, a
growing need for the automated classification of resources will
continue to develop. Given the varied nature of access points,
vocabulary, and classification, the normalization and automatic
classification of items based on concepts
is a growing field of research for federated search developers.
What’s more, large aggregators like VIAF provide a tremendous
opportunity to mine local data and enrich that content through
the building of linked data relationships. Tools like VIAF could
possibly enable discovery system developers to create new
methods for building facets and classifying content, while at
the same time developing infrastructure to enable local systems
to close the data loop and provide reciprocal relationship infor-
mation as open data.
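As a rough illustration of how authority data might feed facet building, the sketch below maps variant author headings to a shared identifier and counts records under one preferred label. It assumes a local lookup table that stands in for data mined from an aggregator such as VIAF. The identifiers are placeholders rather than real VIAF records, no VIAF service is called, and the names `AUTHORITY_CACHE`, `PREFERRED_LABEL`, `facet_value`, and `build_author_facet` are hypothetical.

```python
from collections import Counter

# Hypothetical local cache mapping variant headings to a shared authority
# identifier. The identifiers are placeholders, not real VIAF records.
AUTHORITY_CACHE = {
    "doe, jane": "authority:0001",
    "doe, j.": "authority:0001",
    "jane doe": "authority:0001",
    "roe, richard": "authority:0002",
}

# Preferred display label for each placeholder identifier.
PREFERRED_LABEL = {
    "authority:0001": "Doe, Jane",
    "authority:0002": "Roe, Richard",
}


def facet_value(heading):
    """Return the preferred label for a heading, or the heading if unmatched."""
    cleaned = heading.strip()
    authority_id = AUTHORITY_CACHE.get(cleaned.lower())
    return PREFERRED_LABEL.get(authority_id, cleaned)


def build_author_facet(records):
    """Count records per normalized author value, most frequent first."""
    counts = Counter(facet_value(record["author"]) for record in records)
    return counts.most_common()


# Invented local records with inconsistent author headings.
records = [
    {"id": 1, "author": "Doe, Jane"},
    {"id": 2, "author": "Doe, J."},
    {"id": 3, "author": "Jane Doe"},
    {"id": 4, "author": "Roe, Richard"},
    {"id": 5, "author": "Unmatched, Author"},
]

facet = build_author_facet(records)
for label, count in facet:
    print(f"{label} ({count})")
```

Here the three variant headings for the same person collapse into a single facet value, while a heading with no authority match falls back to its local form. A production system would also have to handle ambiguous matches and keep the lookup data current, both of which this sketch ignores.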
Ranking systems
All discovery systems provide some method for ranking the
items within a result set by relevance. However,