DATA COLLECTION HARVESTING
a number of dataset features is another significant
advantage of using synthetic data. These characteristics
include the scope, format, and amount of noise
(corruption or distortion) in the dataset.
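
As a rough illustration of controlling these characteristics, the sketch
below generates a small synthetic dataset in Python. The function name
make_synthetic_dataset, its parameters, and the underlying rule y = 2x + 1
are assumptions made up for this example; they do not come from any
particular library or real dataset.

```python
import random


def make_synthetic_dataset(n_samples, noise_std, seed=0):
    """Return a list of (x, y) pairs following y = 2x + 1 plus Gaussian noise.

    Hypothetical helper for illustration only:
    n_samples controls the scope (size) of the dataset,
    noise_std controls how much each y value is distorted,
    and the output format is fixed as a list of (x, y) tuples.
    """
    rng = random.Random(seed)  # seeded so the same call gives the same data
    rows = []
    for _ in range(n_samples):
        x = rng.uniform(0.0, 10.0)
        y = 2.0 * x + 1.0 + rng.gauss(0.0, noise_std)
        rows.append((x, y))
    return rows


# A tiny, noise-free dataset and a larger, noisier one from the same generator.
clean = make_synthetic_dataset(n_samples=5, noise_std=0.0)
noisy = make_synthetic_dataset(n_samples=1000, noise_std=0.5)
```

Changing only the two arguments yields a dataset of a different size and
noise level, which is much harder to arrange with data collected from the
real world.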
A further advantage of using synthetic data is that
it eliminates the risk of copyright infringement or privacy
issues. This is especially valuable if the dataset
under consideration requires information that can be
used to identify individuals. However, the use of synthetic
data sets has a number of significant drawbacks.
First, creating synthetic data is a significant burden on
the engineering side, especially if done by an individual
or a small team. Second, there is a chance that bias will
be introduced into the data. At this time, artificial data
alone is insufficient to train advanced machine learning
algorithms.
5. Manual Data Collection
The final option for data collection is probably the most
familiar: manual data collection.
This method is very similar to the generation of synthetic
datasets; the main differences are that real data is used
rather than simulated data, and the data is generated
manually rather than automatically.
You’re probably wondering why anyone would bother
generating their own data when there are so many
free datasets and web scraping tools available on the
Internet.
The answer is straightforward. The majority of the time,
manual data generation is done through crowdsourcing.