Page 11 - Data Transparency White Paper_FINAL_Neat

Addressing Biases in Multicultural & Inclusive Identity Data








              BENCHMARK STUDY 2: TRUTHSET, 2020


              Truthset is a relatively new company dedicated to measuring the accuracy of record-level digital
              marketing and media data. Referenced in the ANA’s Data Sources for Media report (2020), Truthset has
              partnered with several independent providers of highly accurate, privacy-compliant data sources. This
              collective data asset, which Truthset calls the Validation Set, consists of a mix of behavioral panels,
              survey data and similar sources, and serves as both a training set and a test set for the Truthscore™
              algorithm. Truthset also has relationships with a growing number of major third-party data providers and
              uses their data, together with its own Validation Set, to produce a standardized assessment of data
              accuracy. Identity assignments are compared across the Validation Set and the third-party data sets using
              hashed emails. Emails have historically provided strong targeting opportunities: most people log into
              online accounts or subscriptions and provide an email address. But email addresses are personally
              identifiable information and must be anonymized, or “hashed,” to protect consumer privacy. A hashed email
              is an anonymized identifier that represents a consumer and provides a way of linking that consumer’s
              digital behaviors.
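
To make the idea of a hashed email concrete, here is a minimal sketch in Python. The study does not say which hash function or normalization rules Truthset or its partners use. SHA-256 applied to a trimmed, lowercased address is assumed here only because it is a common industry convention, and the function name hash_email is hypothetical.

```python
import hashlib


def hash_email(email: str) -> str:
    """Return a SHA-256 hex digest of a normalized email address.

    Assumption: normalization is "strip whitespace, then lowercase".
    The source does not specify the rules actually used in the study.
    """
    normalized = email.strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


# Two spellings of the same (made-up) address produce the same identifier,
# which is what allows records from different data sets to be linked.
a = hash_email("  Jane.Doe@Example.com ")
b = hash_email("jane.doe@example.com")
print(a == b)
```

Because the same normalized address always produces the same digest, two data sets can be matched on the hashed value without exchanging the raw email address.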

              Truthset provides an estimated probability that an “assertion” about a hashed email made by a data provider
              is accurate. An assertion is a profiling characteristic, such as age, income, gender or ethnicity, or any
              of an array of other targeting variables, that is assigned to a digital data record. Truthset
              cross-references the assertions made by many different data providers to create the Truthscore™, the
              estimated probability that a particular assertion is accurate. It helps answer the question: how confident
              can you be that this record is actually associated with the email address of a Hispanic consumer, a male
              or an 18-34-year-old?
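
The cross-referencing step can be illustrated with a much simpler stand-in. The sketch below is not the Truthscore™ algorithm, which the source describes only as an estimated probability. It computes a plain match rate: the share of a provider's assertions that agree with a validation set, among the hashed emails that appear in both. The function name, data layout and toy records are all hypothetical.

```python
from collections import defaultdict


def agreement_rates(validation, assertions):
    """Share of each provider's checkable assertions that match the validation set.

    validation: dict mapping hashed_email -> {attribute: value}
    assertions: iterable of (provider, hashed_email, attribute, value) tuples
    Returns a dict mapping (provider, attribute) -> match rate in [0, 1].
    """
    matched = defaultdict(int)
    checked = defaultdict(int)
    for provider, hashed_email, attribute, value in assertions:
        truth = validation.get(hashed_email, {}).get(attribute)
        if truth is None:
            continue  # no validation value for this record, so it cannot be checked
        key = (provider, attribute)
        checked[key] += 1
        if truth == value:
            matched[key] += 1
    return {key: matched[key] / checked[key] for key in checked}


# Toy, made-up records ("h1", "h2", "h3" stand in for hashed emails).
validation = {"h1": {"gender": "male"}, "h2": {"gender": "female"}}
assertions = [
    ("A", "h1", "gender", "male"),    # matches the validation set
    ("A", "h2", "gender", "male"),    # disagrees with the validation set
    ("B", "h1", "gender", "male"),    # matches the validation set
    ("B", "h3", "gender", "female"),  # h3 is not in the validation set: skipped
]
print(agreement_rates(validation, assertions))
# {('A', 'gender'): 0.5, ('B', 'gender'): 1.0}
```

A match rate like this is an aggregate per provider and attribute. A score for an individual record, as the text describes, would require a model on top of such comparisons.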


              Approach
              Six providers agreed to participate in this study with AIMM. We are grateful for their commitment to
              transparency and advancing the industry.


                                          AIMM-Truthset Study Data Partners
















              Missing Ethnicity Data
              The absolute number of hashed emails (non-PII identified records based on email addresses) available for
              targeting is impressive. These six providers alone shared more than 750 million data records with Truthset
              for AIMM’s study. The number of records in which race/ethnicity has been identified is also very large: just