Page 23 - Understanding the Forensic Technology Landscape
Text analytics and technology-assisted review (TAR) tools
Text analytics and technology-assisted review (TAR) in the forensic context

Text analytics, also referred to as text mining or text data mining, is the process of deriving high-quality information from text. Simple text analytics include searching for and then extracting keywords and other strings. More complex text analytics include approaches that focus on analysing patterns and trends through means such as statistical pattern learning. The term also describes a set of linguistic, statistical, and machine learning techniques that model and structure the information content of textual sources for legal document analysis, business intelligence, exploratory data analysis, research, or investigation.

Text analytics is a central component of technology-assisted review (TAR), a process in which computer software electronically classifies documents, such as email and other communications, based on input from expert reviewers, to expedite the organization and prioritization of the document collection by
• elimination of not-relevant documents,
• prioritization of the most substantive documents and
• quality control of the human reviewers.

The computer classification may include broad topics pertaining to discovery responsiveness, privilege, and other designated issues. TAR may dramatically reduce the time and cost of reviewing ESI by reducing the amount of human review needed on documents initially classified as potentially non-relevant.

There are two general approaches to TAR. The original version is known as predictive coding, a type of AI[19] that uses machine learning to predict which documents are more likely to contain responsive content. In predictive coding, a group of human reviewers initially code or tag a group of documents. The reviewers then load that “seed set” of tagged content onto a computer running TAR software. The computer analyses the seed set and learns from it which documents should be labeled with which codes. In predictive coding, the quality of the results depends heavily on the quality of the original seed set. If that seed set is sloppily coded or incomplete, the computer’s results are similarly flawed.

The next generation of TAR uses what’s known as continuous active learning. With this advanced TAR, there is no seed set. Rather, human reviewers simply begin coding documents while the computer observes in the background, learning from their entries. The computer analyses those tags and feeds the review team the documents it believes are the most important. As the team codes those documents, the computer integrates that information, improving its understanding of the data set. Continuous active learning TAR is still dependent on the quality of the human coding, but it keeps improving as the review proceeds. When the review team reaches a point where few or none of the results are relevant, the process is complete.
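
As an illustration only, the continuous active learning loop described in this section can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not how any particular TAR product works: the `RelevanceModel` class, the `continuous_active_learning` function, the naive Bayes style scoring, the batch size, the stopping rule (stop after a batch with no relevant documents, once at least one relevant document has been found) and the sample documents are all hypothetical. The `human_code` callback stands in for a human reviewer.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a document and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


class RelevanceModel:
    """Hypothetical naive Bayes style scorer, updated after each coded document."""

    def __init__(self):
        self.term_counts = {True: Counter(), False: Counter()}
        self.doc_counts = {True: 0, False: 0}

    def update(self, text, relevant):
        """Learn from one human coding decision."""
        self.doc_counts[relevant] += 1
        self.term_counts[relevant].update(tokenize(text))

    def score(self, text):
        """Smoothed log-odds that a document is relevant (higher = more likely)."""
        vocab = len(set(self.term_counts[True]) | set(self.term_counts[False])) + 1
        total = {label: sum(c.values()) for label, c in self.term_counts.items()}
        log_odds = math.log((self.doc_counts[True] + 1) / (self.doc_counts[False] + 1))
        for term in tokenize(text):
            p_rel = (self.term_counts[True][term] + 1) / (total[True] + vocab)
            p_not = (self.term_counts[False][term] + 1) / (total[False] + vocab)
            log_odds += math.log(p_rel / p_not)
        return log_odds


def continuous_active_learning(documents, human_code, batch_size=2):
    """Repeatedly serve the highest-scoring unreviewed documents to the reviewer.

    There is no seed set: the model starts empty and learns from every tag.
    Assumed stopping rule: end once a batch contains no relevant documents,
    provided at least one relevant document has already been found.
    """
    model = RelevanceModel()
    unreviewed = list(range(len(documents)))
    coded = {}
    while unreviewed:
        # Re-rank what is left using everything learned so far.
        unreviewed.sort(key=lambda i: model.score(documents[i]), reverse=True)
        batch, unreviewed = unreviewed[:batch_size], unreviewed[batch_size:]
        hits = 0
        for i in batch:
            label = bool(human_code(documents[i]))  # the human reviewer's tag
            coded[i] = label
            model.update(documents[i], label)
            hits += int(label)
        if hits == 0 and any(coded.values()):
            break
    return coded, unreviewed


# Hypothetical sample collection; the lambda simulates a reviewer who tags
# any document mentioning "offshore" as relevant.
documents = [
    "lunch menu for friday",
    "invoice fraud payment offshore",
    "team lunch on friday",
    "offshore payment approved invoice",
    "holiday party schedule",
    "fraud concerns about offshore account",
    "parking lot closed friday",
    "weekly lunch schedule",
]
coded, remaining = continuous_active_learning(documents, lambda text: "offshore" in text)
found = sorted(i for i, relevant in coded.items() if relevant)
print("relevant documents found:", found)
print("documents never reviewed:", remaining)
```

In this toy run the reviewer codes six of the eight documents, and the loop stops with two not-relevant documents left unreviewed. That is the saving TAR aims for, here on a deliberately tiny scale. A real review would also sample the unreviewed remainder to check that few relevant documents were missed.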
19 See section on Robotic Process Automation (RPA), Artificial Intelligence (AI) and Emerging Technologies for more information.