uncommon for a library to maintain a document repository like DSpace or BePress and a separate repository for other digital content, like images and videos, since the metadata decisions made to support one type of content often didn’t translate well to the others. But this is changing, and changing quickly. Tools like Fedora and communities like Samvera are shifting the bibliographic data model from one in which users must select a single metadata framework to one in which we can apply semantic web principles and draw on multiple metadata namespaces to provide the best support for our digital objects. This flexibility is allowing libraries to think more holistically about the types of metadata frameworks they utilize and to choose elements from a wider range of communities that best support the data model for their content. In addition, libraries may find that digital repositories that support semantic principles offer easier paths to discovery, data interoperability, and migration.
But can we see this today?
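Before answering, it helps to see what mixing namespaces looks like in practice. The short sketch below, written with the rdflib Python library, describes a single digital object using properties drawn from both Dublin Core and schema.org; the repository URI and the descriptive values are invented for illustration.

from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import DCTERMS, RDF

# Hypothetical repository object and the two vocabularies it draws from.
SCHEMA = Namespace("https://schema.org/")
obj = URIRef("https://repository.example.edu/objects/1234")

g = Graph()
g.bind("dcterms", DCTERMS)
g.bind("schema", SCHEMA)

# Descriptive metadata expressed with Dublin Core Terms . . .
g.add((obj, RDF.type, SCHEMA.VideoObject))
g.add((obj, DCTERMS.title, Literal("Campus Oral History Interview, 1968")))
g.add((obj, DCTERMS.creator, Literal("University Archives")))

# . . . mixed with schema.org properties describing the same resource.
g.add((obj, SCHEMA.duration, Literal("PT42M")))
g.add((obj, SCHEMA.encodingFormat, Literal("video/mp4")))

print(g.serialize(format="turtle"))

Because every property is identified by a full URI, a consuming system does not have to guess which community’s definition of a given element applies; the namespace carries that context along with the data.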
The answer, at least as it relates to data interoperability, is that we largely
can’t. Data interoperability between formats and communities continues
to be governed primarily through the use of data crosswalks to normalize
the metadata from one community into a format that can be understood
by another. With that said, the use of semantic principles or of formats like schema.org is moving quickly to provide a set of “common language” elements that allow communities to cross those barriers. Will these common languages be as robust as older data crosswalks? Likely not. Most data crosswalks provide one-to-one translations between systems, but in many cases data interoperability doesn’t require strict mapping; it requires mapping that is good enough to provide the context needed to support search and discovery, creating a framework that allows machines to understand the relationships between interconnected data.
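Reduced to its essentials, a crosswalk of this kind is a mapping table plus the logic that applies it. The sketch below uses invented field values and a deliberately lossy mapping from unqualified Dublin Core elements to rough schema.org equivalents; it is meant only to illustrate the sort of “good enough” normalization described above, in which anything without a mapping is simply dropped.

# A simplified, illustrative crosswalk from unqualified Dublin Core
# elements to rough schema.org equivalents.
DC_TO_SCHEMA = {
    "title": "name",
    "creator": "creator",
    "date": "datePublished",
    "format": "encodingFormat",
    "identifier": "identifier",
}

def crosswalk(dc_record):
    """Normalize a Dublin Core record into schema.org-style keys.

    Elements with no mapping are discarded; the goal is metadata that is
    good enough to support search and discovery, not a lossless
    one-to-one translation.
    """
    return {
        DC_TO_SCHEMA[element]: value
        for element, value in dc_record.items()
        if element in DC_TO_SCHEMA
    }

record = {
    "title": "Campus Oral History Interview, 1968",
    "creator": "University Archives",
    "date": "1968-05-14",
    "rights": "In copyright",  # no mapping defined, so it is dropped
}
print(crosswalk(record))

A production crosswalk carries far more nuance than this, but the shape of the problem, and of the compromise, is the same.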
Browsing the Web has become second nature for most individuals, and even new users with very little experience working on the Web are able to quickly view and make decisions about the content they find there. When browsing web content, human beings can easily tell the difference between advertisements and content, which gives people the ability to unconsciously filter the advertisements out of their mind’s eye. Likewise,
when one considers library metadata, a cataloger with any experience can quickly determine the primary control number found within a MARC record, allowing the cataloger to interpret not only the metadata record but also the rules necessary to place that metadata into alternative formats. Machines simply do not have this ability at this point in time. Automated processes require explicit rules and schemas that identify for the software application the relationships that exist between data.
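As a small illustration of what such explicit rules look like, the sketch below pulls the record control number out of a MARCXML record by encoding the convention a cataloger applies by eye: the value lives in the 001 control field. The sample record and identifier are invented, and the code relies only on Python’s standard library.

import xml.etree.ElementTree as ET

# A trimmed, invented MARCXML record used only for illustration.
MARCXML = """<record xmlns="http://www.loc.gov/MARC21/slim">
  <controlfield tag="001">ocm12345678</controlfield>
  <datafield tag="245" ind1="1" ind2="0">
    <subfield code="a">Building digital libraries :</subfield>
    <subfield code="b">a how-to-do-it manual</subfield>
  </datafield>
</record>"""

NS = {"marc": "http://www.loc.gov/MARC21/slim"}

def control_number(marcxml):
    """Return the 001 control field, the rule a cataloger applies by eye."""
    root = ET.fromstring(marcxml)
    field = root.find("marc:controlfield[@tag='001']", NS)
    return field.text if field is not None else None

print(control_number(MARCXML))  # prints: ocm12345678

Remove that single rule and the program has no way to recover the number; a human reader, by contrast, would still spot it almost immediately.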
Considering these two examples, a machine would have a very difficult time distinguishing an advertisement from content simply by examining that content. In part, this is why the pop-up blockers and advertisement scrubbers found in web browsers today work primarily from blacklists and known advertising content providers to determine how the elements of a document