Addressing Biases in Multicultural & Inclusive Identity Data
BENCHMARK STUDY 2: TRUTHSET, 2020
Truthset is a relatively new company dedicated to measuring the
accuracy of digital record-level marketing/media data. Referenced
in the ANA’s Data Sources for Media report (2020), Truthset has
partnered with several independent providers of highly accurate and
privacy-compliant data sources. This collective data asset consists of
a mix of behavioral panels, survey data, etc. and serves as both a training set and a test set in the Truthscore™
algorithm. Truthset refers to this combined asset as the Validation Set. Truthset also has relationships with
a growing number of major third-party data providers and leverages their data, along with its own Validation
Set, to produce a standardized assessment of data accuracy. Identity assignments are compared across
the Validation Set and third-party data sets using hashed emails. Emails have historically provided strong
targeting opportunities because consumers provide an email address whenever they log into online accounts or subscriptions.
But email addresses are personally identifiable information and must be anonymized, or “hashed,” to ensure consumer
privacy. A hashed email is an anonymized identifier that represents a consumer and becomes a way of linking
that consumer’s digital behaviors.
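To make the idea of a hashed email concrete, the sketch below shows one common way such an identifier can be produced. It is an illustration only: the choice of SHA-256, the lowercase-and-trim normalization step, and the function name `hash_email` are assumptions, not details provided by Truthset or the study partners.

```python
import hashlib


def hash_email(email: str) -> str:
    """Return a hex digest for an email address (hypothetical helper).

    Assumption: the address is trimmed and lowercased before hashing so
    that the same consumer yields the same identifier across data sets.
    """
    normalized = email.strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


# Two spellings of the same address map to one identifier.
a = hash_email("Jane.Doe@Example.com ")
b = hash_email("jane.doe@example.com")
print(a == b)
```

Because every party applies the same normalization and hash function, records can be matched on the resulting identifier without exchanging the raw email address.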
Truthset provides an estimated probability that an “assertion” about a hashed email made by a data provider
is accurate. An assertion is a profiling characteristic, such as age, income, gender, or ethnicity, or any of an array of other
targeting variables, that is assigned to a digital data record. Truthset cross-references all the assertions made by
many different data providers to create the Truthscore™, the estimated probability that a particular assertion
is accurate. It helps answer the question: how confident can you be that this record is actually associated
with the email address of a Hispanic consumer, a male, or an 18-to-34-year-old?
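The cross-referencing idea can be illustrated with a much simpler calculation than the Truthscore™ itself, whose algorithm is not described here. The sketch below computes only the share of a provider's assertions that agree with a validation set on records the two have in common, keyed by hashed email. The function name, the data layout, and the sample values are all hypothetical.

```python
def agreement_rate(provider, validation, attribute):
    """Share of overlapping records where the provider's asserted value
    matches the validation value for one attribute.

    Assumption: both inputs map a hashed email to a dict of attributes.
    This is a plain match rate, not the Truthscore algorithm.
    """
    matches = 0
    compared = 0
    for hashed_email, asserted in provider.items():
        truth = validation.get(hashed_email)
        # Skip records that cannot be compared on this attribute.
        if truth is None or attribute not in truth or attribute not in asserted:
            continue
        compared += 1
        if asserted[attribute] == truth[attribute]:
            matches += 1
    return matches / compared if compared else None


# Hypothetical sample data, invented for illustration.
validation = {"h1": {"gender": "F"}, "h2": {"gender": "M"}, "h3": {"gender": "F"}}
provider = {
    "h1": {"gender": "F"},
    "h2": {"gender": "F"},  # disagrees with the validation set
    "h3": {"gender": "F"},
    "h4": {"gender": "M"},  # not in the validation set, so not compared
}
print(agreement_rate(provider, validation, "gender"))
```

In this toy case, three records overlap and two agree, so the rate is two thirds. A record-level probability such as the one the text describes would need a model on top of comparisons like this one.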
Approach
Six providers agreed to participate in this study with AIMM. We are grateful for their commitment to
transparency and advancing the industry.
AIMM-Truthset Study Data Partners
Missing Ethnicity Data
The absolute number of hashed emails (non-PII identified records based on email addresses) available for
targeting is impressive. These six providers alone shared more than 750 million data records with Truthset
for AIMM’s study. The number of records in which race/ethnicity have been identified is also very large: just