DATA COLLECTION HARVESTING
2. Free and Open Dataset Access
Open-source datasets are among the most efficient and
straightforward ways to collect data for your machine
learning model. Thousands of open-source datasets,
much like code snippets, are available online. They are
free, easy to find, and time-saving. Even though
public datasets can seem to offer a nearly endless supply of
rich, detailed data, they may still require cleaning to meet
specific requirements.
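As a rough illustration of what "cleaning" can mean, here is a minimal Python sketch using only the standard library. The sample rows, column names, and the clean_rows helper are hypothetical and stand in for whatever a downloaded public dataset actually contains; the rules shown (trim whitespace, drop rows with missing or non-numeric prices, normalize category case, remove exact duplicates) are just one plausible set of requirements.

```python
import csv
import io

# Hypothetical raw rows, standing in for a downloaded public dataset.
RAW_CSV = """name,price,category
Widget A, 19.99 ,tools
Widget B,,tools
Widget A, 19.99 ,tools
Widget C,n/a,toys
Widget D,5.50,Toys
"""


def clean_rows(raw_text):
    """Trim whitespace, drop rows whose price is missing or not a number,
    lowercase the category, and skip exact duplicates."""
    reader = csv.DictReader(io.StringIO(raw_text))
    seen = set()
    cleaned = []
    for row in reader:
        name = row["name"].strip()
        category = row["category"].strip().lower()
        try:
            price = float(row["price"].strip())
        except ValueError:
            continue  # missing or unparseable price: skip the row
        key = (name, price, category)
        if key in seen:
            continue  # exact duplicate of an earlier row
        seen.add(key)
        cleaned.append({"name": name, "price": price, "category": category})
    return cleaned


rows = clean_rows(RAW_CSV)
for row in rows:
    print(row)
```

Of the five sample rows, only two survive: one copy of "Widget A" and "Widget D". Which rows to drop and which to repair depends on your model's needs, so treat these rules as a starting point.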
The following are some of the best places to look for
free public datasets:
● Amazon
● Kaggle
● Microsoft
● Government Datasets (e.g., statistics data)
● Lionbridge AI
● Google Dataset Search
● UCI Machine Learning Repository
3. Scanning for Data on the Internet
Assume we want to get product information from
Amazon, such as descriptions and prices. This could be
accomplished through repetitive typing or copy-pasting.
However, Amazon has far too many items and their
prices fluctuate far too frequently for this to be feasible.
This is what web scraping tools are for. They sift through
a variety of Internet data. Furthermore, these tools can
search for new or updated data, either automatically or
when you run them manually, and store it for your convenience.
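To make the idea concrete, here is a minimal Python sketch of the extraction step of a scraper, using only the standard library's html.parser module. The HTML snippet, its class names ("product", "title", "price"), and the ProductParser class are all hypothetical: they do not reflect Amazon's real page structure, and no network request is made. A real tool would first download the page and would need to follow the site's terms of use.

```python
from html.parser import HTMLParser

# Hypothetical markup standing in for a downloaded product listing page.
SAMPLE_HTML = """
<ul>
  <li class="product"><span class="title">Desk Lamp</span><span class="price">$24.99</span></li>
  <li class="product"><span class="title">USB Cable</span><span class="price">$7.50</span></li>
</ul>
"""


class ProductParser(HTMLParser):
    """Collects the title and price of every <li class="product"> element."""

    def __init__(self):
        super().__init__()
        self.products = []
        self._field = None  # name of the field currently being read, if any

    def handle_starttag(self, tag, attrs):
        css_class = dict(attrs).get("class")
        if tag == "li" and css_class == "product":
            self.products.append({})  # start a new product record
        elif tag == "span" and css_class in ("title", "price") and self.products:
            self._field = css_class

    def handle_data(self, data):
        text = data.strip()
        if not self._field or not text:
            return
        if self._field == "price":
            # Strip the currency symbol and store the price as a number.
            self.products[-1]["price"] = float(text.lstrip("$"))
        else:
            self.products[-1]["title"] = text

    def handle_endtag(self, tag):
        if tag == "span":
            self._field = None  # stop reading once the span closes


parser = ProductParser()
parser.feed(SAMPLE_HTML)
products = parser.products
for product in products:
    print(product)
```

Running the same parser on a schedule and saving each result is what lets a scraping tool keep up with prices that change too often to track by hand.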