Page 185 - Building Digital Libraries
CHAPTER 7
can prove to be a challenge, since much of the context and granularity is
lost through the process.
OAI-PMH
Once an item has made it into a digital repository, how is it to be shared?
Contributors likely want their work to reach the broadest audience, while
digital repository administrators want to expose data in a way that will
maximize its reach at a relatively low cost. Can the repository be crawled
by search engines, and can the metadata be accessed by remote systems?
Within our shared information climate, digital repository software must be
able to provide a straightforward method for sharing metadata about the
items that it houses.
Fortunately, such a method exists in all major digital repository services.
OAI-PMH (Open Archives Initiative Protocol for Metadata Harvesting) is a
simple HTTP-based protocol that can be used to make a digital repository’s
metadata available for harvest. The protocol works over a normal HTTP GET
request, allowing metadata to be harvested by the construction of a simple
URL. For example, the following URL,
http://kb.osu.edu/oai/request?verb=ListRecords&set=hdl_1811_29375&metadataPrefix=oai_dc,
will harvest all metadata items from OSUL’s 2006–07 Mershon Center Research
Projects (Use of Force and Diplomacy) collection in the libraries’ institutional
repository. The protocol defines a small set of verbs, restricting its functionality
primarily to metadata harvesting and the querying of information about a
specific collection or collections on the server. To simplify the OAI-PMH
harvesting process, the protocol requires the support of Unqualified Dublin
Core. This is what is known as the compatibility schema, so no matter what
OAI-PMH repository one harvests from, one can be guaranteed that the
metadata will be available in Dublin Core. However, this doesn’t prevent
an OAI-PMH repository from supporting other metadata formats. In fact,
quite the contrary. OAI-PMH implementers are encouraged to support
multiple metadata formats, so that a repository’s metadata can be provided
at various levels of granularity. In the OSUL institutional repository,
for example, two metadata formats are supported for harvest: Unqualified
Dublin Core and RDF.
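
Because a harvest request is nothing more than a base URL plus a verb and a few arguments, the URL shown above can be assembled programmatically. The following is a minimal sketch in Python, not part of any repository software; the function name build_harvest_url is hypothetical, and the sketch only constructs the URL without contacting the server.

```python
from urllib.parse import urlencode


def build_harvest_url(base_url, verb, **arguments):
    """Return an OAI-PMH request URL (hypothetical helper for illustration)."""
    params = {"verb": verb}
    # Skip arguments passed as None so optional ones can simply be omitted.
    params.update({k: v for k, v in arguments.items() if v is not None})
    return base_url + "?" + urlencode(params)


# Rebuild the example request from the text: all records in one set,
# returned as Unqualified Dublin Core (metadataPrefix=oai_dc).
url = build_harvest_url(
    "http://kb.osu.edu/oai/request",
    "ListRecords",
    set="hdl_1811_29375",
    metadataPrefix="oai_dc",
)
print(url)
```

Requesting a different metadata format would, under the same assumptions, only require changing the metadataPrefix value to one that the repository advertises.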
OAI-PMH recognizes six actions, or requests, that can
be made to an OAI-PMH server. Attached to these actions is a small set
of arguments that can be set to limit the range of data to be harvested by
date or set, as well as request the harvested metadata in a specific schema.
Harvesting limits are set primarily by identifying a range of dates using the
“from” and “until” OAI-PMH arguments. Within the OAI-PMH server, date
ranges limit the OAI-PMH response to items whose metadata datestamp
falls within the specified date range. The “from” and “until”
arguments can be used as a pair or separately to selectively harvest metadata
from an OAI-PMH repository. Additional arguments that can be found in