Page 51 - Reclaim YOUR DIGITAL GOLD (with DesignLayout Dec3) (Clickable) (Dexxi-FLIP-Audio)_Neat

DATA COLLECTION HARVESTING


            2. Free and Open Dataset Access

            Open-source datasets are often the most efficient and
            straightforward way to collect data for your machine
            learning model. Thousands of open-source datasets,
            much like code snippets, are available online. They are
            typically free, easy to find, and time-saving. Even when
            a public dataset appears rich and detailed, it may
            still require cleaning to meet your specific
            requirements.

            The following are some of the best places to look for
            free public datasets:

               ● Amazon
               ● Kaggle
               ● Microsoft
               ● Government Datasets (e.g. statistics data)
               ● Lionbridge AI
               ● Google’s Dataset Search
               ● UCI Machine Learning Repository
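
            As noted above, a public dataset may still need cleaning
            before it suits your model. The following is a minimal
            sketch of that step in Python, using only the standard
            library. The sample data, the column names (name, price),
            and the function clean_rows are assumptions made for
            illustration; they do not come from any particular
            dataset.

```python
import csv
import io

# Hypothetical raw data standing in for a downloaded public dataset.
RAW_CSV = """name,price
 Widget ,19.99
Gadget,
Widget,19.99
Gizmo,abc
Doohickey,5.00
"""


def clean_rows(csv_text):
    """Return a list of (name, price) tuples with unusable rows removed."""
    cleaned = []
    seen = set()
    for row in csv.DictReader(io.StringIO(csv_text)):
        name = (row.get("name") or "").strip()
        price_text = (row.get("price") or "").strip()
        if not name or not price_text:
            continue  # skip rows with missing values
        try:
            price = float(price_text)
        except ValueError:
            continue  # skip rows whose price is not a number
        record = (name, price)
        if record in seen:
            continue  # skip exact duplicates
        seen.add(record)
        cleaned.append(record)
    return cleaned


if __name__ == "__main__":
    for name, price in clean_rows(RAW_CSV):
        print(name, price)
```

            Here the row with a missing price, the row with a
            non-numeric price, and the repeated row are dropped, so
            two records remain. Real datasets usually need rules
            tailored to their own columns.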


            3. Scanning for Data on the Internet
            Assume we want to get product information from
            Amazon, such as descriptions and prices. This could be
            accomplished through repetitive typing or copy-pasting.
            However, Amazon has far too many items and their
            prices fluctuate far too frequently for this to be feasible.
            This is what web scraping tools are for. They sift through
            web pages and extract the data you need. These tools can
            also check for new or updated data, either automatically
            or on demand, and store it for your convenience.
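
            To give a rough idea of what a scraping tool does
            internally, here is a small Python sketch that pulls
            product titles and prices out of a page. It parses a
            hard-coded HTML fragment so that it runs offline. The
            markup, the class names (product, title, price), and the
            names ProductParser and scrape_products are all
            hypothetical; a real site such as Amazon uses different
            markup and has its own terms for automated access.

```python
from html.parser import HTMLParser

# Hypothetical page fragment; real sites use different markup.
SAMPLE_HTML = """
<div class="product"><span class="title">Desk Lamp</span><span class="price">$24.50</span></div>
<div class="product"><span class="title">USB Cable</span><span class="price">$7.99</span></div>
"""


class ProductParser(HTMLParser):
    """Collect {'title', 'price'} records from the assumed markup."""

    def __init__(self):
        super().__init__()
        self.products = []
        self._field = None   # which field the next text chunk belongs to
        self._current = {}   # record being assembled

    def handle_starttag(self, tag, attrs):
        css_class = dict(attrs).get("class")
        if tag == "span" and css_class in ("title", "price"):
            self._field = css_class

    def handle_data(self, data):
        if self._field is None:
            return  # text outside the spans we care about
        text = data.strip()
        if self._field == "price":
            self._current["price"] = float(text.lstrip("$"))
        else:
            self._current["title"] = text
        self._field = None
        # Once both fields are present, the record is complete.
        if "title" in self._current and "price" in self._current:
            self.products.append(self._current)
            self._current = {}


def scrape_products(html):
    """Parse an HTML string and return the list of product records."""
    parser = ProductParser()
    parser.feed(html)
    return parser.products


if __name__ == "__main__":
    for product in scrape_products(SAMPLE_HTML):
        print(product["title"], product["price"])
```

            A full tool would also download the pages, repeat the
            process on a schedule to catch price changes, and save
            the results to a file or database.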




