Page 99 - Building Digital Libraries

CHAPTER 5


                                                       2.  With all binary data formats, data corruption is always a
                                                          big concern. Even within a digital repository, one must
                                                          have systems in place to protect against the corruption
                                                          of any binary data loaded into the system. Why is this
                                                          an issue? Within a binary document, each byte carries a
                                                          specific meaning. The loss or modification of even one of
                                                          these bytes can invalidate the entire binary document,
                                                          making it unreadable. XML documents are also susceptible
                                                          to data corruption, but a damaged XML document can usually
                                                          be inspected and corrected, which should give organizations
                                                          much more confidence about storing their metadata in an
                                                          open format. Consider how this relates to the above MARC
                                                          sample. As described later in chapter 6, the MARC format
                                                          uses fixed start positions and lengths to read field data
                                                          within the record. This information is stored within the
                                                          directory, that is, the block of numeric bytes that follows
                                                          the leader at the start of the record. As a result, the
                                                          modification of any of these bytes within the record's
                                                          directory, or the subtraction or addition of bytes within
                                                          the record data itself (without the recalculation of the
                                                          record's directory), will result in an invalid or unreadable
                                                          record. So, for example, by adding a single
                                                          period to the following record (highlighted), I’ve made this
                                                          MARC record unreadable, potentially losing the data stored
                                                          in the record.
                                                              Within an XML-encoded record, this type of data
                                                          corruption isn't an issue. So long as the data continues
                                                          to follow the strict XML encoding rules (that is, the
                                                          document remains well formed), the record will remain
                                                          readable.
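
The contrast described above can be sketched in a few lines of Python. This is a minimal illustration, not a full MARC implementation: it assumes a simplified record layout (a 24-byte leader, 12-byte directory entries holding tag, field length, and start position, and the usual field and record terminators), and the names build_record, parse_record, and is_readable are hypothetical helpers invented for this example. The sample field values are likewise made up.

```python
import xml.etree.ElementTree as ET

FIELD_END = b"\x1e"   # field terminator
RECORD_END = b"\x1d"  # record terminator
LEADER_LEN = 24
ENTRY_LEN = 12        # tag (3) + field length (4) + start position (5)


def build_record(fields):
    """Serialize (tag, text) pairs into a simplified MARC-style record."""
    directory = b""
    data = b""
    for tag, text in fields:
        body = text.encode("ascii") + FIELD_END
        directory += f"{tag}{len(body):04d}{len(data):05d}".encode("ascii")
        data += body
    base = LEADER_LEN + len(directory) + 1   # where the field data begins
    total = base + len(data) + 1             # whole record, with terminator
    leader = f"{total:05d}nam  22{base:05d}   4500".encode("ascii")
    return leader + directory + FIELD_END + data + RECORD_END


def parse_record(raw):
    """Read fields strictly by the offsets stored in the directory."""
    if int(raw[0:5]) != len(raw):
        raise ValueError("record length does not match the leader")
    base = int(raw[12:17])
    directory = raw[LEADER_LEN:base - 1]
    fields = {}
    for i in range(0, len(directory), ENTRY_LEN):
        entry = directory[i:i + ENTRY_LEN]
        tag = entry[0:3].decode("ascii")
        length = int(entry[3:7])
        start = int(entry[7:12])
        chunk = raw[base + start:base + start + length]
        if not chunk.endswith(FIELD_END):
            raise ValueError(f"field {tag} does not end where the directory says")
        fields[tag] = chunk[:-1].decode("ascii")
    return fields


def is_readable(raw):
    """Return True if the record parses, False if its offsets are broken."""
    try:
        parse_record(raw)
        return True
    except ValueError:
        return False


# Made-up sample data for illustration only.
sample = [("245", "Building digital libraries"), ("260", "Chicago")]
record = build_record(sample)

# Add a single period to the field data without recalculating the directory.
damaged = record.replace(b"libraries", b"libraries.", 1)

print(is_readable(record))   # True
print(is_readable(damaged))  # False

# The same one-character change in an XML encoding of the same data.
xml_record = ("<record><title>Building digital libraries</title>"
              "<place>Chicago</place></record>")
damaged_xml = xml_record.replace("libraries", "libraries.", 1)
print(ET.fromstring(damaged_xml).findtext("title"))  # Building digital libraries.
```

In this sketch the extra period shifts every byte that follows it, so the lengths and start positions recorded in the leader and directory no longer match the data and the strict parser rejects the record. The XML parser locates each element by its tags rather than by byte position, so the same change simply becomes part of the title. A change that broke the markup itself, such as deleting a closing tag, would still make the XML unparseable.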
































