Appendix B
Data Defined
There are many approaches to defining data. Here we use the one that best corresponds to the
way data is created and used in organizations and, in our view, best supports data science
(Redman 2008). Under this definition, data consists of two parts: a data model and data values.
Data models are abstractions of the real world that define, in context, what the data is all
about. They specify the things of interest (called “entities”), the important properties of
those things (fields or attributes), and the relationships between them. Thus you, the reader,
are an entity, and your employer is interested in you as an EMPLOYEE, with attributes such
as DEPARTMENT, SALARY, and MANAGER. REPORTS TO is an example of a relationship.
You are not just an EMPLOYEE but also a TAXPAYER, a PATIENT, and a USER: entities
created by the tax agency, your medical provider, and tech companies, each with its own
interests in mind.
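As a rough illustration of the vocabulary above, the sketch below expresses the EMPLOYEE
entity, its attributes, and the REPORTS TO relationship as a Python dataclass. The class name,
field types, helper function, and sample values are illustrative assumptions, not anything
prescribed by the text. The point is only that the data model (the class definition) is
distinct from the data values (the instances).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Employee:
    """Data model: the EMPLOYEE entity and its attributes (types are assumed)."""
    name: str
    department: str                        # DEPARTMENT attribute
    salary: float                          # SALARY attribute
    manager: Optional["Employee"] = None   # MANAGER attribute; encodes REPORTS TO


def reports_to(employee: Employee) -> Optional[str]:
    """Return the name of the person this employee reports to, if any."""
    return employee.manager.name if employee.manager else None


# Data values: hypothetical instances that populate the model.
boss = Employee(name="Pat", department="Analytics", salary=150000.0)
reader = Employee(name="You", department="Analytics", salary=90000.0, manager=boss)

print(reports_to(reader))  # Pat
print(reports_to(boss))    # None
```

A tax agency or medical provider would define a different model (TAXPAYER, PATIENT) for the
same person, with different attributes reflecting its own interests.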
As mentioned in Chapter 1, data is not just numbers: it exists in the context of a data
model and the purposes of the organization that defined that model. The model, in turn, exists
in other contexts as well, including who created it and for what reason. Putting data in its
proper context with respect to any given analysis is critical for data scientists.
Today, there is much interest in unstructured data, which we prefer to think of as data that
has not yet been structured.
We use the term metadata to refer to data that makes other data easier to use. Data models,
data definitions, and business rules (that constrain data values) all qualify. Computerized data
is often more useful, though the definition carries no such requirement. We also use the term
soft data to refer to sights, smells, sounds, impressions, feelings, conversations, unstructured
data, and the like that are not necessarily hard data but are relevant to the analysis or
decision at hand. When smells are measured with electronic noses, they become hard data.
Likewise, when social media content such as tweets is subjected to sentiment analysis using
text analytics, it moves from soft data to hard data.
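To make the idea of business rules as metadata more concrete, here is a minimal sketch in
which rules that constrain data values are themselves stored as data and applied to a record.
The field names, the set of allowed departments, and the rules are hypothetical, chosen only
to echo the EMPLOYEE example; real business rules would come from the organization that
owns the data model.

```python
from typing import Any, Callable, Dict, List, Tuple

# Metadata: hypothetical business rules, keyed by attribute name.
# Each rule pairs a human-readable description with a predicate on the data value.
BUSINESS_RULES: Dict[str, Tuple[str, Callable[[Any], bool]]] = {
    "salary": (
        "SALARY must be a non-negative number",
        lambda v: isinstance(v, (int, float)) and v >= 0,
    ),
    "department": (
        "DEPARTMENT must be a known department",
        lambda v: v in {"Analytics", "Finance", "Operations"},
    ),
}


def violations(record: Dict[str, Any]) -> List[str]:
    """Return the description of every business rule the record breaks."""
    broken = []
    for field, (description, is_valid) in BUSINESS_RULES.items():
        # A missing attribute counts as a violation of its rule.
        if field not in record or not is_valid(record[field]):
            broken.append(description)
    return broken


good = {"department": "Analytics", "salary": 90000.0}
bad = {"department": "Marketing", "salary": -5}

print(violations(good))  # []
print(violations(bad))
```

Keeping the rules in a table rather than scattering them through the code is one way of
treating them as metadata: they can be read, reviewed, and changed alongside the data model.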
There are also many approaches to defining information. We find it most powerful to define
information not in terms of what it is but in terms of what it does. To illustrate, suppose you
are playing a game of chance with one die. You bet a dollar and select a number, 1–6.
A “dealer” then rolls a die, and you either lose your bet or are paid six. You don’t get to see
The Real Work of Data Science: Turning Data into Information, Better Decisions, and Stronger Organizations,
First Edition. Ron S. Kenett and Thomas C. Redman.
© 2019 Ron S. Kenett and Thomas C. Redman. Published 2019 by John Wiley & Sons Ltd.
Companion website: www.wiley.com/go/kenett-redman/datascience

