Page 19 - Data Transparency White Paper_FINAL_Neat
Addressing Biases in Multicultural & Inclusive Identity Data
QUESTION ISSUE BEST PRACTICES
QUESTION: How Should Multicultural Data be Validated?

ISSUE: There are different ways providers define race/ethnicity, such as first name, surname, country of origin, English proficiency, U.S. Census definitions, neighborhood, as well as expert AI systems and algorithms. Benchmarking has shown substantial differences in data coverage and accuracy generated by the different methods. All methods should be validated routinely.

BEST PRACTICES: Four approaches are considered best practices for validating multicultural identity assignments:
1. Cross syndicated source verification (e.g., MRI-Simmons with self-identified individuals)
2. “Truth” dataset comparison (e.g., client first-party data with known, self-identified individuals and attributes from a representative source)
3. For modeled segments, comparison to holdout samples of self-identified individuals
4. Audit from independent third-party sources (e.g., Neutronian, Truthset, or providers that can validate with self-report intercept studies, such as Jolt or Lucid)
In all cases, the standard of accuracy is self-report.

QUESTION: How Accurate Should Multicultural Data Be?

ISSUE: The validation study will reveal how good the data is, but how good is good enough? The need for accuracy and coverage varies with the use case. Benchmarking has shown that it is reasonable, for broadly defined cultural identities, to expect accuracy of at least 67 percent. With this in mind, AIMM recommends a minimum accuracy rate of 67 percent. Higher is better. We expect this low bar to be raised over time as industry practices improve.

BEST PRACTICES: To be considered a Hispanic, African-American, Asian-American, or other multicultural segment, at least 67 percent of records in the segment must be accurate and verified as that target. 67 percent is the minimum concentration of multicultural consumers/records within a segment necessary to be called that particular segment.
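
The 67 percent minimum can be checked mechanically once the records in a segment have been matched to self-reported identities. The following is a minimal sketch, assuming a hypothetical record layout in which each record pairs the provider's assigned segment with the individual's self-reported identity. The function names and the layout are illustrative assumptions, not part of the AIMM guidance.

```python
from typing import Iterable, Tuple

MIN_ACCURACY = 0.67  # minimum accuracy rate recommended in the text


def segment_accuracy(records: Iterable[Tuple[str, str]], segment: str) -> float:
    """Share of records assigned to `segment` whose self-report matches it.

    Each record is (assigned_segment, self_reported_identity); this layout
    is an assumption made for illustration.
    """
    # Keep the self-reported identity of every record placed in the segment.
    assigned = [self_report for label, self_report in records if label == segment]
    if not assigned:
        raise ValueError(f"no records assigned to segment {segment!r}")
    correct = sum(1 for self_report in assigned if self_report == segment)
    return correct / len(assigned)


def meets_minimum(records, segment: str, minimum: float = MIN_ACCURACY) -> bool:
    """True when the segment's verified accuracy reaches the minimum."""
    return segment_accuracy(records, segment) >= minimum


# Hypothetical validation sample: 7 of 10 assigned records are confirmed.
sample = [("Hispanic", "Hispanic")] * 7 + [("Hispanic", "Other")] * 3
print(segment_accuracy(sample, "Hispanic"), meets_minimum(sample, "Hispanic"))
```

Under these assumptions, a segment with 7 confirmed records out of 10 clears the minimum, while one with 6 out of 10 does not.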

QUESTION: Can the Accuracy and Coverage of Modeled Audiences Be Validated?

ISSUE: A marketer’s need for reach often requires that a data-based target audience segment be extended through modeling. Validation studies will reveal accuracy and coverage trade-offs between probabilistic and deterministic approaches.

BEST PRACTICES:
• Providers should disclose details about the underlying base data, how the match process works, and match rates.
• Validation studies should reveal the coverage and accuracy of modeled segments.
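
The accuracy and coverage trade-off for a modeled segment can be illustrated against a holdout sample of self-identified individuals. This is a sketch only, under two assumptions that the source does not state: "accuracy" is read as the share of segment members who self-identify as the target, and "coverage" as the share of self-identified target individuals in the holdout that the segment captures. The model scores, the cutoff, and the function name are hypothetical.

```python
from typing import List, Tuple


def accuracy_and_coverage(
    holdout: List[Tuple[float, bool]], threshold: float
) -> Tuple[float, float]:
    """Return (accuracy, coverage) of a modeled segment at a score cutoff.

    Each holdout entry is (model_score, self_identifies_as_target).
    A person is placed in the segment when model_score >= threshold.
    """
    in_segment = [is_target for score, is_target in holdout if score >= threshold]
    total_targets = sum(1 for _, is_target in holdout if is_target)
    hits = sum(in_segment)  # segment members confirmed by self-report
    accuracy = hits / len(in_segment) if in_segment else 0.0
    coverage = hits / total_targets if total_targets else 0.0
    return accuracy, coverage


# Hypothetical holdout sample of eight self-identified individuals.
holdout = [
    (0.95, True), (0.90, True), (0.80, True), (0.70, False),
    (0.60, True), (0.40, False), (0.30, True), (0.10, False),
]

# Loosening the cutoff extends reach but lowers accuracy.
for cutoff in (0.75, 0.50, 0.20, 0.00):
    acc, cov = accuracy_and_coverage(holdout, cutoff)
    print(f"cutoff={cutoff:.2f} accuracy={acc:.2f} coverage={cov:.2f}")
```

In this made-up sample, the loosest cutoff reaches every self-identified target individual but drops accuracy below the 67 percent minimum, which is the kind of trade-off a validation study is meant to surface.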