Page 11 - SPi Global Whitepaper_Transform, Content to Actionable Data and Unlock New Business Opportunities Through AI
Extracting Information from PDF Content

One of the most labor-intensive processes content businesses face today is translating PDFs into searchable, categorized information. To accomplish this previously, business professionals had to type out the text information within a PDF and then tag and categorize it. Now AI technology can scan searchable and non-searchable PDFs and extract the most important information from them.

SPi Global’s proprietary extraction tool, SPiZone™, is trained to recognize certain areas of a document and automatically extract their meaning. The technology identifies “zones” of a document, whether they are text, a table, or an image, and then pulls out the important information from those zones. It then normalizes the PDF information so that it can be ingested into the customer’s database.

A less advanced alternative is an Optical Character Recognition (OCR) engine. These engines have significant limitations, says Jishnu. OCR engines do not preserve relevancy or style; they provide raw text extraction. An OCR-based solution still needs a great deal of human intervention to correctly identify areas of a document, whereas SPiZone™ is fully automated.

What makes SPiZone™ particularly powerful is that once it recognizes a certain content zone, for example an address on an invoice, it can identify that type of content regardless of how it is presented. The address could appear in a different position or be aligned vertically instead of horizontally, and SPiZone™ will still recognize the address and pull the appropriate data.

Developed in 2011, the tool has evolved to address a wide variety of use cases. For example, one SPi Global customer analyzes public filings that businesses submit reporting their assets. This information is then compiled and sold as research to financial professionals. The client uses SPiZone™ to scan the PDF filings, extract the most important information, and import it into their database. That allows the company to provide highly accurate and timely research to its customers.

A risk assessment company also uses this tool to provide critical information to car insurance companies. The customer uses SPiZone™ to extract information from police reports on car accidents from multiple states in the U.S. That information is pulled directly into the company’s database so that it can precisely calculate the risk for car accidents in different regions of the country.

“Content extraction is a need for any industry,” says Venky. “Whether it’s pulling important information from invoices or legal documents, content extraction can be a significant bottleneck for businesses and SPiZone™ is removing that.”

With over 95% accuracy in the content it pulls from PDFs, SPiZone™ is essentially eliminating the need for human intervention. The technology has transformed a once highly manual process into immediate and precise content analysis.
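
To make the idea of zone recognition and normalization concrete, here is a minimal Python sketch. It is not SPiZone™'s implementation, which is proprietary and not described here. The `Zone` class, the `label_zone` and `normalize` functions, and the simple state-and-ZIP pattern used to spot an address are all hypothetical, chosen only to illustrate labeling a zone by its content rather than its position.

```python
from __future__ import annotations

import re
from dataclasses import dataclass


# Hypothetical data model for illustration only; not SPiZone's actual design.
@dataclass
class Zone:
    kind: str      # "text", "table", or "image"
    x: float       # page position, deliberately ignored when labeling
    y: float
    content: str


# Assumption: a US-style address contains a two-letter state code and a ZIP code.
ADDRESS_RE = re.compile(r"\b[A-Z]{2}\s+\d{5}(?:-\d{4})?\b")


def label_zone(zone: Zone) -> str:
    """Label a zone from its content alone, so position and layout do not matter."""
    if zone.kind != "text":
        return zone.kind
    # Collapse line breaks so a vertically stacked address matches a one-line one.
    flat = " ".join(zone.content.split())
    return "address" if ADDRESS_RE.search(flat) else "other"


def normalize(zones: list[Zone]) -> dict[str, str]:
    """Turn labeled zones into a flat record ready for database ingestion."""
    record: dict[str, str] = {}
    for zone in zones:
        if label_zone(zone) == "address":
            record["address"] = " ".join(zone.content.split())
    return record


# Two invoice layouts: address at the top on one line, or stacked in a corner.
top = [Zone("text", 10, 10, "12 Main St, Springfield, IL 62704"),
       Zone("text", 10, 50, "Total due: 100")]
side = [Zone("text", 400, 300, "12 Main St,\nSpringfield,\nIL 62704")]
print(normalize(top) == normalize(side))
```

Both layouts yield the same normalized record because the label depends only on what the zone contains. A production system would presumably rely on a trained model rather than a single regular expression.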
© SPi Global Content Solutions