The Real Work of Data Science: Turning Data Into Information, Better Decisions, and Stronger Organizations, by Ron S. Kenett and Thomas C. Redman
Appendix D
New regulations, including the EU’s GDPR and the United States’ updated Common Rule
(called the Final Rule), impact the use of data about people. Both regulations relate to protecting
individuals’ rights to determine how their private information is used, which in turn affects
data collection, access, and data movement within the same country/region and between
industries. GDPR and the Final Rule try to modernize the definition of “private data” and
data subjects’ rights and to balance them against the “free flow of information between countries.”
These regulations, in areas such as health‐care management systems and social media, already
have significant impact on the work of data scientists (Shmueli 2018), and these impacts will
only grow. Finding out, after the fact, how to comply with such regulations is obviously not a
good idea.
Political campaigns are another domain where data science can have severe repercussions.
Election surveys not only provide information; they can also influence voter choices and whether
voters go to the polls. In this and other contexts, the ability to target fake news is a reality data
scientists need to face. For more on election surveys, see Kenett et al. (2018).
Further, as we described in Chapter 6, data scientists must always worry whether the data
they analyze can be trusted. These concerns are exacerbated by the access control, data
anonymization, and privacy‐preserving sharing arrangements needed to keep data private (see
Srivastiva et al. 2019). Thus, for data deemed private, data scientists may face an added layer
of complexity when it comes to trust and quality.
So what should data scientists, CAOs, and the organizations that employ them do? At the very
least, they must know and follow all relevant law. Further, we think they should do more. Nearly
a generation ago, some futurist (name unknown) opined that “privacy will be to the Information
Age what product safety was to the Industrial Age.” And in that realm, most societies opted for
greater consumer protections. Thus, we believe data scientists should strive to conduct their
work in an ethical manner, even if what that means is not always clearly spelled out.
We recommend three sources. First, Singer (2018) describes courses given in leading
universities to educate students in ethical considerations. Cornell University, for example,
introduced a data science course where students learn to deal with ethical challenges such as
biased data sets that include too few lower‐income households to be representative of the
general population. Students are also challenged to debate the use of algorithms to help automate
life‐changing decisions like hiring or college admissions.
Second, a set of comprehensible ethical guidelines for statistical work was prepared by the
Committee on Professional Ethics of the American Statistical Association (ASA 2016). The
guidelines identify six groups of stakeholders and list responsibilities of ethical statisticians.
These guidelines are very broad and also apply to data science.
Finally, O’Keefe and O’Brien (2018) provide an even more comprehensive perspective that
is applicable not just to data scientists but to all data professionals, everyone who touches
data, and senior management. It is a great place to start.