Page 4 - Big Data book

WHAT IS SEMI-STRUCTURED DATA

Semi-structured data is information that does not reside in a relational
database but that has some organizational properties that make it easier to
analyze. With some processing, it can be stored in a relational database
(although this can be very hard for some kinds of semi-structured data), but
the semi-structured form exists to save space.
Semi-structured data maintains internal tags and markings that identify
separate data elements, which enables information grouping and hierarchies.
Both documents and databases can be semi-structured. This type of data
represents only about 5-10% of the structured/semi-structured/unstructured
data pie, but it has critical business use cases.
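The idea of internal tags and hierarchies can be illustrated with a small sketch. The records below are invented for illustration (the names, fields, and values are assumptions, not data from the text), and JSON is used only as one convenient semi-structured format. The point is that each record labels its own fields and can nest other records, so two entries of the same kind need not share a fixed set of columns.

```python
import json

# Hypothetical records: both describe customers, but they do not share
# a fixed schema (one has "email", the other has "phone").
raw = """
[
  {"name": "Ada", "email": "ada@example.com",
   "orders": [{"id": 1, "total": 19.5}, {"id": 2, "total": 5.0}]},
  {"name": "Grace", "phone": "555-0100", "orders": []}
]
"""

customers = json.loads(raw)


def order_total(customer):
    """Sum the 'total' field of every nested order for one customer."""
    return sum(order["total"] for order in customer.get("orders", []))


# The tags ("name", "orders", "total") let us walk the hierarchy
# without knowing the full layout of every record in advance.
totals = {c["name"]: order_total(c) for c in customers}

# Collect every field name that appears in at least one record.
fields = sorted({key for c in customers for key in c})

print(totals)
print(fields)
```

Loading the same records into a relational table would require deciding up front what to do with the missing "email" and "phone" values and how to split the nested orders into a second table.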
Email is a very common example of a semi-structured data type. More advanced
analysis tools are necessary for thread tracking, near-dedupe, and concept
searching, but email's native metadata enables classification and keyword
searching without any additional tools.
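As a rough sketch of how email metadata supports simple classification, the example below parses a made-up message with Python's standard `email` package. The message text, the addresses, and the `classify` rule are all hypothetical and exist only to show the idea: the headers are structured name/value pairs, while the body remains free text.

```python
from email import message_from_string
from email.utils import parseaddr

# Hypothetical message: structured headers, a blank line, then a free-text body.
raw_message = """\
From: Alice Example <alice@example.com>
To: bob@example.com
Subject: Q3 invoice
Date: Mon, 01 Jan 2024 09:30:00 +0000

Hi Bob, the invoice for Q3 is attached.
"""

msg = message_from_string(raw_message)

# Header fields are addressable by name, like keys in a record.
sender_name, sender_addr = parseaddr(msg["From"])
subject = msg["Subject"]

# The body carries no such labels; it is plain unstructured text.
body = msg.get_payload()


def classify(subject_line):
    """Toy keyword rule (an assumption for this sketch, not a real classifier)."""
    return "finance" if "invoice" in subject_line.lower() else "general"


label = classify(subject)
print(sender_addr, label)
```

Thread tracking or concept searching would need to look inside the body and across messages, which is where the more advanced tools mentioned above come in.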

Email is a huge use case, but most semi-structured development centres on
easing data transport issues. Sharing sensor data is a growing use case, as
are Web-based data sharing and transport: electronic data interchange (EDI),
many social media platforms, document markup languages, and NoSQL databases.
Examples of Semi-Structured Data:

• Markup language: XML. XML is a semi-structured document language: a set
  of document encoding rules that defines a human- and machine-readable
  format. (Although saying that XML is human-readable doesn’t pack a big
  punch: anyone trying to read an XML document has better things to do with
  their time.) Its value is that its tag-driven structure is highly
  flexible, and coders can adapt it to universalize data structure, storage,
  and transport on the Web.
• Open standard: JSON (JavaScript Object Notation). JSON is another
  semi-structured data interchange format. JavaScript is in the name, but
  the format is language-independent, and most programming languages can
  parse and generate it. Its structure consists of collections of name/value
  pairs (an object, also known as a record, dictionary, or hash table) and
  ordered lists of values (an array, also known as a list or sequence).
  Since the structure is interchangeable among languages, JSON excels at
  transmitting data between web applications and servers.
• NoSQL: Semi-structured data is also an important element of many NoSQL
  (“not only SQL”) databases. NoSQL databases differ from relational
  databases because they do not separate the organization (schema) from the
  data. This makes NoSQL a better choice to store