Page 11 - SPi Global Whitepaper_Transform, Content to Actionable Data and Unlock New Business Opportunities Through AI






Extracting Information from PDF Content

A risk assessment company also uses this tool to provide critical information to car insurance
companies. The customer uses SPiZone™ to extract information from police reports on car accidents
from multiple states in the U.S. That information is pulled directly into the company’s database
so that it can precisely calculate the risk for car accidents in different regions of the country.



One of the most labor-intensive processes content businesses face today is translating PDFs into
searchable, categorized information. To accomplish this previously, business professionals had to
type out the text information within a PDF and then tag and categorize it. Now AI technology can
scan searchable and non-searchable PDFs and extract the most important information from them.

“Content extraction is a need for any industry,” says Venky. “Whether it’s pulling important
information from invoices or legal documents, content extraction can be a significant bottleneck
for businesses, and SPiZone™ is removing that.”

SPi Global’s proprietary extraction tool, SPiZone™, is trained to recognize certain areas of a
document and automatically extract their meaning. The technology identifies “zones” of a document,
whether they are text, a table, or an image, and then pulls out the important information from
those zones. It then normalizes the PDF information so that it can be ingested into the customer’s
database.
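
As a rough illustration of the zone-then-normalize idea described above, here is a minimal Python sketch. It is not SPiZone™ code or its API, which the source does not describe. The `Zone` class, the `normalize` function, the field labels, and the sample invoice values are all hypothetical, and the zones are assumed to have been identified already.

```python
from dataclasses import dataclass
import json
import re


@dataclass
class Zone:
    """A hypothetical region of a page that has already been identified."""
    kind: str   # "text", "table", or "image"
    label: str  # what the zone holds, e.g. "invoice_number"
    raw: str    # content exactly as it was extracted from the page


def normalize(zones):
    """Collapse stray whitespace and key each zone's content by its label."""
    record = {}
    for zone in zones:
        if zone.kind == "image":
            continue  # this sketch only normalizes textual content
        record[zone.label] = re.sub(r"\s+", " ", zone.raw).strip()
    return record


# Made-up sample zones standing in for the output of a zone detector.
zones = [
    Zone("text", "vendor", "  Acme   Supplies\nLtd. "),
    Zone("text", "invoice_number", "INV-0042"),
    Zone("image", "logo", ""),
]

print(json.dumps(normalize(zones), sort_keys=True))
# → {"invoice_number": "INV-0042", "vendor": "Acme Supplies Ltd."}
```

The resulting dictionary is the kind of flat, consistently keyed record that a database import step could accept. A real pipeline would also need rules for tables and images.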


A less advanced alternative is an Optical Character Recognition (OCR) engine. These engines have
significant limitations, says Jishnu. OCR engines do not preserve relevancy or style; they provide
only raw text extraction. An OCR-based solution still needs a great deal of human intervention to
correctly identify areas of a document, whereas SPiZone™ is fully automated.

What makes SPiZone™ particularly powerful is that once it recognizes a certain content zone, for
example an address on an invoice, it can identify that type of content regardless of how it is
presented. The address could appear in a different position or be aligned vertically instead of
horizontally, and SPiZone™ will still recognize the address and pull the appropriate data.
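
The point about recognizing content regardless of position can be sketched in a few lines of Python. This is only an illustration under stated assumptions. SPiZone™ is described as a trained tool, whereas this sketch substitutes a simple regular expression for a U.S.-style address. The `ADDRESS` pattern, the `find_address` function, the `(x, y, text)` block format, and the sample blocks are all hypothetical.

```python
import re

# Hypothetical stand-in for a trained recognizer: a block counts as an address
# if it starts a street line with a number and ends with "ST 12345".
ADDRESS = re.compile(r"\d+\s+.+,\s*[A-Z]{2}\s+\d{5}$")


def find_address(blocks):
    """Return the first block whose text looks like an address, else None.

    Each block is (x, y, text). The coordinates are deliberately ignored,
    so the result does not depend on where the block sits on the page.
    """
    for _x, _y, text in blocks:
        candidate = " ".join(text.split())  # join stacked lines into one
        if ADDRESS.search(candidate):
            return candidate
    return None


# The same made-up address placed in two different page positions.
top_left = [
    (40, 60, "12 Main St\nSpringfield, IL 62701"),
    (40, 300, "Total: $120.00"),
]
bottom_right = [
    (40, 60, "Total: $120.00"),
    (420, 700, "12 Main St\nSpringfield, IL 62701"),
]

print(find_address(top_left))
print(find_address(bottom_right))
```

Both calls return the same address string because the match depends on what the block contains, not on its coordinates or line breaks.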

Developed in 2011, the tool has evolved to solve for a wide variety of use cases. For example, one
SPi Global customer analyzes public filings that businesses submit reporting their assets. This
information is then compiled and sold as research to financial professionals. The client uses
SPiZone™ to scan the PDF filings, extract the most important information, and import it into their
database. That allows the company to provide highly accurate and timely research to its customers.

With over 95% accuracy in the content it pulls from PDFs, SPiZone™ is essentially eliminating the
need for human intervention. The technology has transformed a once highly manual process into
immediate and precise content analysis.









© SPi Global Content Solutions