Page 194 - Building Digital Libraries

Sharing Data—Harvesting, Linking, and Distribution


                 Facilitating Third-Party Indexing

                 There was a time when supporting protocols like OAI-PMH would result
                 in a higher likelihood of a digital repository being indexed by the major
                 commercial search providers. And this still may have some truth, since OAI-
                 PMH provides a structural entry point into a repository, and a documented
                 method to traverse all available content. If it does, though, it is because
                 an indexer’s crawler can traverse the OAI-PMH structure, not because the
                 indexer has any built-in support for the protocol. Today, most OAI-PMH
                 harvesting is done by aggregators within the library or cultural heritage
                 domains to build large indexes of aggregated content, with the two largest
                 being the Digital Public Library of America (DPLA) and OCLC.
                     The DPLA utilizes OAI-PMH as the primary communication standard
                 between content providers and the aggregation of discovery and index
                 metadata related to an item. Because OAI-PMH supports incremental
                 harvesting, selecting records by datestamp and returning large result
                 sets in batches, it provides the minimal functionality the DPLA needs to
                 keep the metadata for a specific collection current. OCLC, on the other
                 hand, utilizes OAI-PMH as a method for automatically generating MARC
                 data for items in a digital collection. Using this service, OCLC enables
                 users to map metadata elements harvested through the OAI-PMH interface
                 to their MARC record equivalents. The process is messy and often produces
                 very minimal records, but it does enable organizations to quickly create
                 metadata records for inclusion in OCLC’s WorldCat database, which is then
                 made available through search engines and a wide range of OCLC discovery
                 products.
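
The incremental harvesting that aggregators rely on can be sketched in a few lines of Python. This is a minimal illustration, not the DPLA's or OCLC's actual harvester: the base URL, the function names, and the sample response are hypothetical, while the `ListRecords` verb and the `metadataPrefix`, `from`, `set`, and `resumptionToken` arguments come from the OAI-PMH 2.0 specification. No network request is made; the sketch only builds the request URL and reads one page of a response.

```python
from typing import List, Optional, Tuple
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def build_list_records_url(base_url: str,
                           from_date: Optional[str] = None,
                           set_spec: Optional[str] = None,
                           resumption_token: Optional[str] = None) -> str:
    """Build a ListRecords request URL for a (hypothetical) repository."""
    if resumption_token:
        # A resumptionToken is an exclusive argument: it is sent with the
        # verb only and continues a previously started list request.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": "oai_dc"}
        if from_date:
            params["from"] = from_date  # only records changed since this date
        if set_spec:
            params["set"] = set_spec    # limit the harvest to one set
    return f"{base_url}?{urlencode(params)}"


def parse_page(xml_text: str) -> Tuple[List[str], Optional[str]]:
    """Return the record identifiers on one page and the next token, if any."""
    root = ET.fromstring(xml_text)
    ids = [e.text for e in root.findall(".//oai:header/oai:identifier", OAI_NS)]
    token_el = root.find(".//oai:resumptionToken", OAI_NS)
    # An empty or missing resumptionToken means the list is complete.
    token = token_el.text if token_el is not None and token_el.text else None
    return ids, token


# Made-up response fragment, trimmed to the elements the sketch reads.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header>
        <identifier>oai:example.org:1</identifier>
        <datestamp>2024-01-02</datestamp>
      </header>
    </record>
    <resumptionToken>page2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

if __name__ == "__main__":
    print(build_list_records_url("https://repository.example.org/oai",
                                 from_date="2024-01-01"))
    print(parse_page(SAMPLE))
```

A harvester would typically record the date of its last successful run, pass it as `from_date` on the next run, and keep requesting pages until no resumption token is returned.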
                     For search services outside of the library domain, indexing has moved
                 away from OAI-PMH to other technologies like site maps or embedded
                 linked data using formats like Schema.org. Site maps are essentially
                 structured files, usually XML, that list a durable URL for each item
                 that can be crawled and indexed, along with minimal metadata such as
                 a last-modified date. This simplifies the indexing process for
                 search providers, particularly when working with resources that generate a
                 lot of dynamic content or are primarily database-driven. Today, most mod-
                 ern digital library software supports this level of functionality.
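
As a rough sketch of what such a site map contains, the following Python builds one from a list of items. The item URLs and the `build_sitemap` function are hypothetical; the `urlset`, `url`, `loc`, and `lastmod` elements and the namespace are those defined by the Sitemaps protocol. Digital library platforms normally generate this file themselves, so this is only meant to show the shape of the output.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(items):
    """Return sitemap XML for items given as dicts with 'url' and optional 'lastmod'."""
    urlset = ET.Element("urlset", {"xmlns": SITEMAP_NS})
    for item in items:
        url = ET.SubElement(urlset, "url")
        # <loc> carries the durable URL the crawler should fetch.
        ET.SubElement(url, "loc").text = item["url"]
        if item.get("lastmod"):
            # <lastmod> is optional minimal metadata about the item.
            ET.SubElement(url, "lastmod").text = item["lastmod"]
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


if __name__ == "__main__":
    # Hypothetical repository items.
    print(build_sitemap([
        {"url": "https://repository.example.org/items/42", "lastmod": "2024-03-01"},
        {"url": "https://repository.example.org/items/43"},
    ]))
```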
                     The use of structured data vocabularies, like Schema.org, enables
                 organizations to embed linked data directly in a page’s markup. This
                 information is read only by the indexer, which uses it to enrich its
                 knowledge graph and to promote relationships between content. While the
                 use of these formats doesn’t necessarily lead to better indexing, it does
                 enable greater findability, since the embedded structured data enables
                 providers to better classify content and build relationships to items that
                 might not otherwise have been obvious. For example, tagging an image of
                 fishing on the Ohio River with a geographical tag would enable the search
                 provider to know that this image may be relevant to a user in Ohio,
                 regardless of whether that information shows up anywhere within the
                 visible metadata. This kind of linking is often
                 done outside of the library community, particularly in the business com-
                 munity, to easily surface and categorize information related to locations,
                 websites, hours of operation, and types of services provided.
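
The geographical tagging in the Ohio River example could look something like the following. This is one possible encoding, assuming the Schema.org data is embedded as JSON-LD in a script block (one of several syntaxes Schema.org supports). The item title, URL, and the `photograph_jsonld` function are hypothetical; `Photograph`, `contentLocation`, `Place`, `PostalAddress`, and `addressRegion` are existing Schema.org terms.

```python
import json


def photograph_jsonld(name, url, place_name, region):
    """Return a JSON-LD script block describing a photograph and where it was taken."""
    data = {
        "@context": "https://schema.org",
        "@type": "Photograph",
        "name": name,
        "url": url,
        # contentLocation is the place depicted or described by the item.
        "contentLocation": {
            "@type": "Place",
            "name": place_name,
            "address": {
                "@type": "PostalAddress",
                "addressRegion": region,
                "addressCountry": "US",
            },
        },
    }
    return ('<script type="application/ld+json">\n'
            + json.dumps(data, indent=2)
            + "\n</script>")


if __name__ == "__main__":
    # Hypothetical item record.
    print(photograph_jsonld(
        "Fishing on the Ohio River",
        "https://repository.example.org/items/42",
        "Ohio River",
        "OH",
    ))
```

The block is placed in the item page's HTML. It is not displayed to visitors, but it gives an indexer a machine-readable statement that the image is associated with a place in Ohio.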