SPi Global Whitepaper: Transform Content to Actionable Data and Unlock New Business Opportunities Through AI






Extracting Information from PDF Content

One of the most labor-intensive processes content businesses face today is translating PDFs into searchable, categorized information. Previously, business professionals had to type out the text within a PDF and then tag and categorize it by hand. Now AI technology can scan both searchable and non-searchable PDFs and extract the most important information from them.

SPi Global’s proprietary extraction tool, SPiZone™, is trained to recognize specific areas of a document and automatically extract their meaning. The technology identifies “zones” of a document, whether they contain text, a table, or an image, and then pulls the important information out of those zones. It then normalizes the extracted information so that it can be ingested into the customer’s database.

A less advanced alternative is an Optical Character Recognition (OCR) engine. According to Jishnu, these engines have significant limitations: they provide only raw text extraction and do not preserve relevance or style. An OCR-based solution still needs a great deal of human intervention to correctly identify the areas of a document, whereas SPiZone™ is fully automated.

What makes SPiZone™ particularly powerful is that once it recognizes a certain content zone, for example an address on an invoice, it can identify that type of content regardless of how it is presented. The address could appear in a different position or be aligned vertically instead of horizontally, and SPiZone™ will still recognize it and pull the appropriate data.

Since its development in 2011, the tool has evolved to address a wide variety of use cases. For example, one SPi Global customer analyzes public filings in which businesses report their assets. This information is then compiled and sold as research to financial professionals. The client uses SPiZone™ to scan the PDF filings, extract the most important information, and import it into its database. This allows the company to provide highly accurate and timely research to its customers.

A risk assessment company also uses this tool to provide critical information to car insurance companies. The customer uses SPiZone™ to extract information from police reports on car accidents from multiple U.S. states. That information is pulled directly into the company’s database so that it can precisely calculate the risk of car accidents in different regions of the country.

“Content extraction is a need for any industry,” says Venky. “Whether it’s pulling important information from invoices or legal documents, content extraction can be a significant bottleneck for businesses, and SPiZone™ is removing that.”

With over 95% accuracy in the content it pulls from PDFs, SPiZone™ is essentially eliminating the need for human intervention. The technology has transformed a once highly manual process into immediate and precise content analysis.
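SPiZone™ is proprietary, and this whitepaper does not describe its internals. As a hypothetical illustration of the zone-then-normalize workflow described above, the Python sketch below assumes a detector has already classified each region of a PDF page into a `Zone` with a kind (text, table, or image) and a label such as `address`. The `Zone`, `normalize`, and `ingest` names, the record layout, and the SQLite table are all assumptions made for illustration, not SPi Global’s implementation. Because fields are keyed by label rather than by page position, the same field is captured wherever it appears on the page.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class Zone:
    """One classified region of a PDF page (hypothetical structure)."""
    kind: str                                # "text", "table", or "image"
    label: str                               # e.g. "address", "line_items"
    page: int
    bbox: tuple[float, float, float, float]  # (x0, y0, x1, y1); layout only, not used as a key
    content: object                          # str for text zones, list of rows for table zones


def normalize(zones: list[Zone]) -> dict[str, str]:
    """Flatten classified zones into one record keyed by zone label.

    Lookup is by label, not by bbox, so a field is captured wherever it sits on the page.
    """
    record: dict[str, str] = {}
    for zone in zones:
        if zone.kind == "text":
            # Collapse line breaks and repeated spaces left over from the PDF layout.
            record[zone.label] = " ".join(str(zone.content).split())
        elif zone.kind == "table":
            # Serialize each table row as a semicolon-separated line.
            record[zone.label] = "\n".join(
                ";".join(str(cell) for cell in row) for row in zone.content
            )
        # Image zones are skipped in this sketch.
    return record


def ingest(conn: sqlite3.Connection, doc_id: str, record: dict[str, str]) -> None:
    """Store each normalized field as a (doc_id, field, value) row."""
    conn.execute("CREATE TABLE IF NOT EXISTS fields (doc_id TEXT, field TEXT, value TEXT)")
    conn.executemany(
        "INSERT INTO fields VALUES (?, ?, ?)",
        [(doc_id, field, value) for field, value in record.items()],
    )
    conn.commit()


if __name__ == "__main__":
    # Hypothetical invoice: the address zone sits in the top-right corner of page 1.
    zones = [
        Zone("text", "address", 1, (400, 50, 560, 120), "12 Main St\n  Springfield"),
        Zone("table", "line_items", 1, (40, 300, 560, 500), [["Widget", 2, 9.5]]),
        Zone("image", "logo", 1, (40, 40, 120, 90), None),
    ]
    conn = sqlite3.connect(":memory:")
    ingest(conn, "invoice-001", normalize(zones))
    for row in conn.execute("SELECT field, value FROM fields ORDER BY field"):
        print(row)
```

Keying the record by label is only one simple design choice; a fuller pipeline would likely add field validation and some handling for image zones.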









© SPi Global Content Solutions