
RECLAIM YOUR DIGITAL GOLD



          The practice of delegating tasks to human workers in
          order to collect the pieces of data that, when combined,
          form a dataset is referred to as “crowdsourcing.”
          Crowdsourcing can be used to complete a wide range of
          tasks, from simple activities like image labelling to more
          involved endeavors like collaborative writing, which can
          involve several stages. Amazon Mechanical Turk is one of
          the most widely used crowdsourcing platforms. Tasks are
          delegated to human workers on this platform, who are
          then compensated for successfully completing them.
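
          To make the idea of "combining" crowdsourced pieces
          concrete, here is a minimal sketch in Python. It assumes
          each image was labelled by several workers and keeps the
          most common answer (a simple majority vote). The function
          name, the image ids, and the sample responses are all
          hypothetical, and real platforms may combine answers in
          other ways.

```python
from collections import Counter


def aggregate_labels(worker_labels):
    """Combine several workers' labels per item by majority vote.

    worker_labels maps an item id to the list of labels that
    different workers gave it. Returns a dict mapping each item id
    to (winning label, share of workers who chose that label).
    """
    dataset = {}
    for item_id, labels in worker_labels.items():
        if not labels:
            continue  # skip items that nobody labelled
        # most_common(1) returns the label with the highest count;
        # on a tie, the label that was seen first wins.
        label, votes = Counter(labels).most_common(1)[0]
        dataset[item_id] = (label, votes / len(labels))
    return dataset


# Hypothetical responses: three workers labelled each image.
responses = {
    "img_001": ["cat", "cat", "dog"],
    "img_002": ["dog", "dog", "dog"],
}
combined = aggregate_labels(responses)
```

          The agreement share stored next to each label gives a
          rough signal of which items the workers disagreed on and
          may be worth a second look.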

          As you may have guessed, this manual approach to data
          generation has numerous disadvantages. Extracting and
          formatting data is a very complex process that requires
          a substantial investment of time and money, as well as
          extensive technical expertise. In addition, using data
          collected internally raises a number of privacy concerns
          when it includes personally identifiable information
          about customers, especially for businesses.

          I hope you now have a better understanding of the
          various methods for collecting data for machine learning
          models.


          UNDERSTANDING THE DATA HARVESTING
          PROCESS


          When it comes to collecting and harvesting data for AI,
          there is one fundamental concept that must be understood:
          the insights gathered and the analysis performed are only
          as accurate as the data provided. In the field of data
          mining and collection, the acronym GIGO is frequently
          used. This is a reference to the phrase “Garbage In,

