Page 53 - Reclaim YOUR DIGITAL GOLD (with DesignLayout Dec3) (Clickable) (Dexxi-FLIP-Audio)_Neat

DATA COLLECTION HARVESTING


            a number of dataset features is another significant
            advantage of using synthetic data. These characteristics
            include the scope, format, and amount of noise
            (corruption or distortion) in the dataset.
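
            As a rough illustration of controlling these features, the
            sketch below generates a small synthetic dataset whose size
            and noise level are set by parameters. The function name
            make_synthetic_dataset, the linear rule y = 2x + 1, and the
            field names are assumptions made for this example only.

```python
import random


def make_synthetic_dataset(n_rows, noise_level, seed=0):
    """Return n_rows records following y = 2x + 1 plus Gaussian noise.

    n_rows controls the scope (size) of the dataset and noise_level
    controls how much distortion is added. Both the rule and the
    record format are hypothetical choices for illustration.
    """
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    rows = []
    for _ in range(n_rows):
        x = rng.uniform(0.0, 10.0)
        noise = rng.gauss(0.0, noise_level)  # standard deviation = noise_level
        rows.append({"x": x, "y": 2.0 * x + 1.0 + noise})
    return rows


# Same generator, two different noise settings.
clean = make_synthetic_dataset(n_rows=5, noise_level=0.0)
noisy = make_synthetic_dataset(n_rows=5, noise_level=1.5)
```

            Changing n_rows or noise_level yields a different dataset
            without collecting any new real-world data.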

            One significant advantage of using synthetic data is that
            it largely avoids the risk of copyright infringement and
            privacy issues. This matters most when the dataset under
            consideration would otherwise require information that
            can be used to identify individuals. However, the use of
            synthetic datasets has a number of significant drawbacks.

            First, creating synthetic data is a significant burden on
            the engineering side, especially if done by an individual
            or a small team. Second, there is a chance that bias will
            be introduced into the data. At this time, artificial data
            alone is insufficient to train advanced machine learning
            algorithms.


            5. Manual Data Collection

            The final option for data collection is probably the most
            familiar: manual data collection.

            This method is very similar to the generation of synthetic
            datasets; the main differences are that real data is used
            rather than simulated data, and the data is generated
            manually rather than automatically.

            You’re probably wondering why anyone would bother
            generating their own data when there are so many
            free datasets and web scraping tools available on the
            Internet.

            The answer is straightforward. The majority of the time,
            manual data generation is done through crowdsourcing.

