
             Appendix B

             Data Defined

             There are many approaches to defining data. Here we use the one that best corresponds to the
             way data is created and used in organizations and, in our view, best supports data science
             (Redman 2008). Thus, data consists of a data model and a data value.
               Data models are abstractions of the real world that contextually define what the data are
             all about, including specifications of things of interest (called “entities”), important properties
             of those things (fields or attributes), and relationships between them. Thus you, the reader,
             are an entity, and your employer is interested in you as an EMPLOYEE, with attributes such
             as DEPARTMENT, SALARY, and MANAGER. REPORTS TO is an example of a relationship.
             You are not just an EMPLOYEE, but also a TAXPAYER, a PATIENT, and a USER, entities
             created by the tax agency, your medical provider, and tech companies, respectively,
             each with its own interests in mind.
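               The vocabulary above (entity, attribute, relationship) can be illustrated with a minimal
             Python sketch. It is only one possible rendering, not a prescribed implementation: the
             field names name and taxable_income and all sample values are hypothetical and were
             chosen purely for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Employee:
    """EMPLOYEE entity: the employer's view of a person."""
    name: str                              # hypothetical identifying attribute
    department: str                        # DEPARTMENT attribute
    salary: float                          # SALARY attribute
    manager: Optional["Employee"] = None   # REPORTS TO relationship


@dataclass
class Taxpayer:
    """TAXPAYER entity: the tax agency's view of the same person."""
    name: str                              # hypothetical identifying attribute
    taxable_income: float                  # hypothetical attribute


# Data values only acquire meaning once they are placed in a data model.
boss = Employee(name="Pat", department="Analytics", salary=120000.0)
reader = Employee(name="Sam", department="Analytics", salary=90000.0, manager=boss)

# The same person, modeled by a different organization for a different purpose.
reader_as_taxpayer = Taxpayer(name="Sam", taxable_income=90000.0)

print(reader.manager.name)
```

               The two classes describe the same person yet share almost no attributes, which reflects
             the point that each organization builds its model around its own interests.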
               As mentioned in Chapter 1, data are not just numbers – data exists in the context of a data
             model and the purposes of the organization that defined that model. The model exists in
             other contexts as well, including who created it and for what reason. And putting data in its
             proper context with respect to any given analysis is critical for data scientists.
               Today, there is much interest in unstructured data, which we prefer to think of as data that
             has not yet been structured.
               We use the term metadata to refer to data that makes other data easier to use. Data models,
             data definitions, and business rules (that constrain data values) all qualify. It bears mention
             that data that has been computerized is often more useful, though the definition carries no such
             requirement. We also use the term soft data to refer to sights, smells, sounds, impressions,
             feelings, conversations, unstructured data, and the like that are not necessarily hard data but
             are relevant to the analysis or decision at hand. When smells are measured with electronic
             noses, they become hard data. Likewise, when social media content such as tweets is subjected
             to sentiment analysis using text analytics, it moves from soft data to hard data.
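               To make the notion of metadata more concrete, the following sketch stores a data
             definition and two business rules for a SALARY field and checks data values against
             them. The definition text, the rules themselves, and the helper name violations are
             assumptions made for illustration; they do not come from the book.

```python
# Metadata for a hypothetical SALARY field: a data definition plus business rules.
salary_metadata = {
    "field": "SALARY",
    "definition": "Annual base pay in US dollars",  # assumed definition
    "rules": [
        # Each rule pairs a description with a check that constrains data values.
        ("must be a number",
         lambda v: isinstance(v, (int, float)) and not isinstance(v, bool)),
        ("must be non-negative",
         lambda v: isinstance(v, (int, float)) and v >= 0),
    ],
}


def violations(value, metadata):
    """Return the descriptions of the business rules that the value breaks."""
    return [desc for desc, check in metadata["rules"] if not check(value)]


print(violations(90000, salary_metadata))
print(violations(-5, salary_metadata))
```

               Keeping the rules next to the definition, rather than scattered through analysis code,
             is one way to let the metadata do its job of making the underlying data easier to use.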
               There are also many approaches to defining information. We find it most powerful to define
             information not in terms of what it is but in terms of what it does. To illustrate, suppose you
             are playing a game of chance with one die. You bet a dollar and select a number, 1–6.
             A “dealer” then rolls a die, and you either lose your bet or are paid six. You don’t get to see



             The Real Work of Data Science: Turning Data into Information, Better Decisions, and Stronger Organizations,
             First Edition. Ron S. Kenett and Thomas C. Redman.
             © 2019 Ron S. Kenett and Thomas C. Redman. Published 2019 by John Wiley & Sons Ltd.
             Companion website: www.wiley.com/go/kenett-redman/datascience