Page 53 - Reclaim YOUR DIGITAL GOLD (without audio)

Data Collection Harvesting



            a number of dataset features is another significant
            advantage of using synthetic data. These characteristics
            include the scope, format, and amount of noise
            (corruption or distortion) in the dataset.
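
            As a rough illustration of how these characteristics can be
            controlled, the sketch below generates a small synthetic
            dataset in Python. The function name, the linear rule
            y = 2x + 1, and the parameter names are assumptions chosen
            for this example only; they do not come from any particular
            tool.

```python
import random


def make_synthetic_dataset(n_rows, noise_std, seed=0):
    """Hypothetical helper: build rows that follow y = 2x + 1 plus noise."""
    rng = random.Random(seed)  # fixed seed makes the dataset reproducible
    rows = []
    for _ in range(n_rows):  # scope: how many rows are generated
        x = rng.uniform(0.0, 10.0)  # scope: range of input values
        noise = rng.gauss(0.0, noise_std)  # noise: amount of distortion
        rows.append({"x": x, "y": 2.0 * x + 1.0 + noise})  # format: dicts
    return rows


clean = make_synthetic_dataset(n_rows=5, noise_std=0.0)
noisy = make_synthetic_dataset(n_rows=5, noise_std=1.5)
```

            Here `clean` follows the rule exactly, while `noisy` has the
            same shape but distorted values. Changing `n_rows` or
            `noise_std` changes the dataset without any new collection
            effort.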

            A further advantage of synthetic data is that it largely
            avoids the risk of copyright infringement and privacy
            issues. This matters most when the dataset under
            consideration would otherwise require information that
            can be used to identify individuals. However, the use of
            synthetic datasets also has a number of significant
            drawbacks.

            First, creating synthetic data places a significant burden
            on the engineering side, especially if done by an individual
            or a small team. Second, there is a chance that bias will
            be introduced into the data. Third, at this time, synthetic
            data alone is insufficient to train advanced machine
            learning algorithms.


            5.  Manual Data Collection
            The final option for data collection is probably the most
            familiar: manual data collection.

            This method is very similar to the generation of synthetic
            datasets; the main differences are that real data is used
            rather than simulated data, and the data is generated
            manually rather than automatically.

            You’re probably wondering why anyone would bother
            generating their own data when there are so many
            free datasets and web scraping tools available on the
            Internet.

            The answer is straightforward. The majority of the time,
            manual data generation is done through crowdsourcing.


