Page 99 - Building Digital Libraries
CHAPTER 5
2. With all binary data formats, data corruption is always a
big concern. Even within a digital repository, one must
have systems in place to protect against the corruption
of any binary data loaded into the system. Why is this
an issue? Within a binary document, each byte carries a
specific meaning. The loss or modification of even one of these
bytes can invalidate the entire binary document, making
it unreadable. While XML documents are also susceptible to
data corruption, the ability to correct an XML document
when corruption does occur should give organizations
much more confidence about storing their metadata in an
open format. Consider how this relates to the above MARC
sample. As described later in chapter 6, the MARC format
uses fixed start positions and lengths to read field data
within the record. This information is stored in the
directory, that is, the block of numeric characters that
follows the record's leader. As a result, modifying any of these bytes
within the record's directory, or adding or removing
bytes within the record data itself (without recalculating
the record's directory), will result in an invalid
or unreadable record. So, for example, by adding a single
period to the following record (highlighted), I’ve made this
MARC record unreadable, potentially losing the data stored
in the record.
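
To make the failure concrete, the following is a minimal sketch in Python of how a directory-driven reader breaks when one byte is added. It assumes the standard MARC layout (a 24-character leader, 12-character directory entries, and field and record terminator bytes). The helpers build_record and read_fields are hypothetical names written for this illustration, not part of any MARC library, and the sample field values are made up.

```python
FIELD_END = b"\x1e"   # field terminator
RECORD_END = b"\x1d"  # record terminator
LEADER_LEN = 24
ENTRY_LEN = 12        # tag (3) + field length (4) + start position (5)


def build_record(fields):
    """Hypothetical helper: pack (tag, data) pairs into a MARC-style record."""
    directory, body = b"", b""
    for tag, data in fields:
        field = data + FIELD_END
        # Each directory entry records how long the field is and where it starts.
        directory += f"{tag}{len(field):04d}{len(body):05d}".encode("ascii")
        body += field
    directory += FIELD_END
    base = LEADER_LEN + len(directory)          # base address of the data
    total = base + len(body) + len(RECORD_END)  # full record length
    leader = f"{total:05d}nam a22{base:05d} a 4500".encode("ascii")
    return leader + directory + body + RECORD_END


def read_fields(record):
    """Hypothetical helper: read fields strictly by directory offsets."""
    base = int(record[12:17].decode("ascii"))
    directory = record[LEADER_LEN:base - 1]
    fields = {}
    for i in range(0, len(directory), ENTRY_LEN):
        entry = directory[i:i + ENTRY_LEN].decode("ascii")
        tag, length, start = entry[:3], int(entry[3:7]), int(entry[7:12])
        raw = record[base + start:base + start + length]
        if not raw.endswith(FIELD_END):
            raise ValueError(f"field {tag}: directory no longer matches the data")
        fields[tag] = raw[:-1]
    return fields


# Made-up sample data: indicators, subfield delimiter (0x1f), code, value.
record = build_record([
    ("245", b"10\x1faBuilding digital libraries"),
    ("260", b"  \x1fbExample Press"),
])

# Add a single period to the title without recalculating the directory.
cut = record.index(b"libraries") + len(b"libraries")
damaged = record[:cut] + b"." + record[cut:]
```

Calling read_fields(record) returns both fields, while read_fields(damaged) raises a ValueError, because the length stored for field 245 no longer reaches that field's terminator. A more forgiving reader could split on the terminator bytes instead, but that is a recovery strategy rather than what the directory describes.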
Within an XML-encoded record, this type of data corruption
is far less of an issue. Field boundaries are marked by tags
rather than by byte offsets, so as long as the data continues
to follow the strict XML encoding rules, the record will
remain readable.
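
The same edit can be sketched against an XML record using Python's standard xml.etree.ElementTree module. The element and attribute names below are loosely modeled on MARCXML, with the namespace omitted for brevity, and the title_of helper is a hypothetical name used only for this illustration.

```python
import xml.etree.ElementTree as ET

# Simplified, MARCXML-like record (namespace omitted; an assumption for brevity).
RECORD = (
    '<record>'
    '<datafield tag="245" ind1="1" ind2="0">'
    '<subfield code="a">Building digital libraries</subfield>'
    '</datafield>'
    '</record>'
)


def title_of(xml_text):
    """Hypothetical helper: return the text of field 245, subfield a."""
    root = ET.fromstring(xml_text)
    return root.find("./datafield[@tag='245']/subfield[@code='a']").text


# Add a single period to the title. No offsets or lengths need recalculating.
edited = RECORD.replace("libraries<", "libraries.<")

# Break the markup itself by dropping the ">" from a closing tag.
broken = RECORD.replace("</subfield>", "</subfield", 1)
```

Here title_of(edited) still returns the title, now ending in a period, because the parser locates the field by its tags. Only damage to the markup itself, as in broken, causes ET.fromstring to raise ET.ParseError, and such damage is visible in a text editor and can usually be repaired by hand.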