Page 312 - Data Science Algorithms in a Week
Artificial Intelligence for the Modeling and Prediction ... 293
(Najjar et al., 1997). By reducing the inputs to the most relevant compounds (for
example, retaining only those with reported activity), the researcher can reduce the
number of input neurons and, in turn, the number of hidden neurons, thereby minimizing
problems associated with topological complexity. Even so, the number of inputs used in
our work remains far higher than in any previous attempt reported in the literature
(Bucinski, Zielinski & Kozlowska, 2004; Torrecilla, Mena, Yáñez-Sedeño, & García,
2007). However, deliberately selecting only active compounds may introduce bias and
reduce the accuracy of the ANNs when synergies with non-active components play a
significant role. For example, in our work on the antioxidant activity of essential
oils, only 30 of the roughly 80 compounds present, those with relevant antioxidant
capacity, were selected to avoid excessive complexity of the neural network and to
minimize the associated structural problems. Similarly, in our work on the
antimicrobial activity of essential oils, only 22 of the roughly 180 compounds present
were selected. In this latter case two strategies were considered: retaining only the
compounds with known antimicrobial properties, or eliminating the compounds without
known antimicrobial activity and/or present at very low percentages (≤5%). The first
strategy gave better results (Cortes-Cabrera & Prieto, 2010; Daynac, Cortes-Cabrera &
Prieto, 2016).
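The two selection strategies compared above can be expressed as simple filters over a list of compounds. The sketch below is illustrative only and is not the authors' code: the `Compound` record, its fields, and the compound names and percentages are hypothetical. Because the text says "and/or", the second strategy takes a `require_both` flag (also an assumption) so that both readings can be tried.

```python
from dataclasses import dataclass


@dataclass
class Compound:
    """Hypothetical record for one constituent of an essential oil."""
    name: str
    percentage: float   # relative abundance in the oil, in %
    known_active: bool  # True if antimicrobial activity has been reported


def retain_known_actives(compounds):
    """Strategy 1: keep only compounds with known antimicrobial activity."""
    return [c for c in compounds if c.known_active]


def drop_inactive_or_minor(compounds, threshold=5.0, require_both=False):
    """Strategy 2: drop compounds without known activity and/or at <= threshold %.

    With require_both=False a compound is dropped if EITHER condition holds;
    with require_both=True it is dropped only if BOTH conditions hold.
    """
    kept = []
    for c in compounds:
        no_known_activity = not c.known_active
        minor = c.percentage <= threshold
        if require_both:
            drop = no_known_activity and minor
        else:
            drop = no_known_activity or minor
        if not drop:
            kept.append(c)
    return kept


if __name__ == "__main__":
    # Made-up composition, for illustration only.
    oil = [
        Compound("compound_A", 40.0, True),
        Compound("compound_B", 25.0, False),
        Compound("compound_C", 3.0, True),
        Compound("compound_D", 2.0, False),
    ]
    print([c.name for c in retain_known_actives(oil)])
    print([c.name for c in drop_inactive_or_minor(oil)])
    print([c.name for c in drop_inactive_or_minor(oil, require_both=True)])
```

With `require_both=True`, only compounds that both lack known activity and are minor are removed, so abundant compounds of unknown activity stay in the input layer; the "or" reading instead also discards minor compounds that are known to be active.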
In many cases the output values need to be normalized, usually to the range 0 to 1.
The appropriate strategy depends on how many orders of magnitude the original data
span. A common approach is to apply logarithms to the original values (log x or
log 1/x) (Cortes-Cabrera & Prieto, 2010; Daynac, Cortes-Cabrera & Prieto, 2016;
Buciński and 2009).
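As a minimal sketch of such a normalization, the function below assumes base-10 logarithms followed by min-max scaling; the scaling step is an assumption, since the text only states that outputs are brought to roughly 0 to 1. The `inverse` flag applies log(1/x) instead of log x, which is convenient when smaller raw values mean stronger activity (as with inhibitory concentrations). The function name and the example values are hypothetical.

```python
import math


def log_minmax_normalize(values, inverse=False):
    """Map strictly positive outputs spanning several orders of magnitude onto [0, 1].

    Step 1: apply log10(x), or log10(1/x) = -log10(x) when inverse=True.
    Step 2: min-max scale the logged values to the range [0, 1].
    """
    if any(v <= 0 for v in values):
        raise ValueError("logarithms require strictly positive values")
    logged = [-math.log10(v) if inverse else math.log10(v) for v in values]
    lo, hi = min(logged), max(logged)
    if hi == lo:
        # All values identical: there is no spread to scale, so map everything to 0.
        return [0.0 for _ in logged]
    return [(y - lo) / (hi - lo) for y in logged]


if __name__ == "__main__":
    # Hypothetical outputs spanning four orders of magnitude.
    raw = [0.1, 1.0, 10.0, 1000.0]
    print(log_minmax_normalize(raw))
    print(log_minmax_normalize(raw, inverse=True))
```

The logarithm matters here: scaling the same hypothetical raw values linearly would give roughly 0, 0.0009, 0.0099 and 1, crowding three of the four targets near zero, whereas the log-scaled targets are 0, 0.25, 0.5 and 1.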
Finally, the overall performance of ANNs depends on the complexity of the biological
phenomenon being modeled. In our hands, ANNs predicted the results of antimicrobial
assays less accurately than those of purely biochemical assays. This agrees with the
work on antiviral activities discussed above: the responses of whole living organisms
are more variable than biochemical reactions, which are more reproducible.
CONCLUSION AND FUTURE TRENDS
Back in 1991, Zupan and Gasteiger questioned the future of ANN applications: at the
time, only a few applications had been reported despite a healthy output of research
on ANNs (Zupan & Gasteiger, 1991). Since then, affordable computational power and ANN
software with friendlier interfaces have made this tool more accessible and appealing
to the average researcher in fields far from computing, facilitating its application
across many scientific fields. Today ANNs are available as an add-on to all the main
statistical software packages or for free as standalone programs.