Page 312 - Data Science Algorithms in a Week

Artificial Intelligence for the Modeling and Prediction ...   293

                       (Najjar et al., 1997). By reducing the inputs to the most relevant compounds, for
                       example by retaining only those with reported activity, the researcher can reduce the
                       number of input neurons and, in turn, the number of hidden neurons, thereby minimizing
                       problems associated with topological complexity. Even so, the number of inputs used in
                       our work remains far higher than in any previous attempt reported in the literature
                       (Bucinski, Zielinski & Kozlowska, 2004; Torrecilla, Mena, Yáñez-Sedeño, & García,
                       2007). However, the deliberate choice of active compounds may introduce bias and
                       hamper the accuracy of the ANNs when synergies with non-active components are
                       significantly involved. For example, in our work on the antioxidant activities of essential
                       oils, from the initial set of around 80 compounds present in the oils, only 30 compounds
                       with relevant antioxidant capacity were selected to avoid excessive complexity of the
                       neural network and minimize the associated structural problems. Similarly, in our work
                       on the antimicrobial activities of essential oils, from the initial set of
                       around 180 compounds present in the oils, only 22 compounds were selected. In this latter
                       case two strategies were considered: either retaining only the compounds with known
                       antimicrobial properties, or eliminating the compounds without known antimicrobial
                       activity and/or present at very low percentages (≤5%). The first strategy gave
                       better results (Cortes-Cabrera & Prieto, 2010; Daynac, Cortes-Cabrera & Prieto, 2016).
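
The two input-selection strategies described above can be sketched in a few lines of Python. This is a minimal illustration, not the procedure used in the cited studies: the data layout, the function names and the compound names are hypothetical, and the numbers are made up. The second strategy is ambiguous in the text ("and/or"), so the sketch assumes one reading: a compound is dropped only when it has no known activity and never exceeds the percentage threshold in any oil.

```python
from typing import Dict, List, Set

# Hypothetical layout: oil name -> {compound name: percentage in the oil}.
Composition = Dict[str, Dict[str, float]]


def all_compounds(oils: Composition) -> List[str]:
    """Return every compound seen in any oil, sorted for a stable column order."""
    return sorted({name for profile in oils.values() for name in profile})


def keep_known_active(oils: Composition, known_active: Set[str]) -> List[str]:
    """Strategy 1: retain only the compounds with known activity."""
    return [c for c in all_compounds(oils) if c in known_active]


def drop_minor_inactive(
    oils: Composition, known_active: Set[str], min_percent: float = 5.0
) -> List[str]:
    """Strategy 2 (assumed reading): drop a compound only when it has no known
    activity and never exceeds min_percent in any oil."""
    kept = []
    for c in all_compounds(oils):
        peak = max(profile.get(c, 0.0) for profile in oils.values())
        if c in known_active or peak > min_percent:
            kept.append(c)
    return kept


def input_matrix(oils: Composition, compounds: List[str]) -> List[List[float]]:
    """One row per oil (sorted by name), one column per selected compound."""
    return [[oils[oil].get(c, 0.0) for c in compounds] for oil in sorted(oils)]


# Made-up data for illustration only; not taken from the cited studies.
oils = {
    "oil_1": {"cmpd_a": 40.0, "cmpd_b": 20.0, "cmpd_c": 2.0, "cmpd_d": 1.0},
    "oil_2": {"cmpd_e": 55.0, "cmpd_b": 10.0, "cmpd_d": 3.0},
}
known_active = {"cmpd_a", "cmpd_c", "cmpd_e"}

strategy_1 = keep_known_active(oils, known_active)
strategy_2 = drop_minor_inactive(oils, known_active)
print(len(strategy_1), "inputs with strategy 1:", strategy_1)
print(len(strategy_2), "inputs with strategy 2:", strategy_2)
```

The length of each selected list is the number of input neurons the network would need, which is the quantity the selection is meant to keep small.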
                          In many cases the output values need to be normalized to a range, usually between 0
                       and 1. The strategy depends on how many orders of magnitude the original data
                       span. A common approach is to apply logarithms to the original
                       values (log x or log 1/x) (Cortes-Cabrera & Prieto, 2010; Daynac, Cortes-Cabrera &
                       Prieto, 2016; Buciński and 2009).
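
As a rough sketch of this normalization, the following Python snippet applies a base-10 logarithm (log x, or log 1/x when `inverse=True`) and then rescales the result linearly to the 0 to 1 range. The choice of base 10, the min-max rescaling step and the function names are assumptions made for illustration; the text does not specify how the logarithmic values were mapped to the target range.

```python
import math
from typing import List, Sequence


def log_transform(values: Sequence[float], inverse: bool = False) -> List[float]:
    """Return log10(x) for each value, or log10(1/x) when inverse is True."""
    if any(v <= 0 for v in values):
        raise ValueError("logarithms need strictly positive values")
    sign = -1.0 if inverse else 1.0  # log10(1/x) == -log10(x)
    return [sign * math.log10(v) for v in values]


def min_max_scale(values: Sequence[float]) -> List[float]:
    """Linearly rescale values so the smallest maps to 0 and the largest to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumption: a constant series carries no information, so map it to 0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def normalize_outputs(values: Sequence[float], inverse: bool = False) -> List[float]:
    """Log-transform, then rescale to the 0 to 1 range."""
    return min_max_scale(log_transform(values, inverse=inverse))


# Made-up values spanning three orders of magnitude, for illustration only.
raw = [1.0, 10.0, 100.0, 1000.0]
scaled = normalize_outputs(raw)
scaled_inverse = normalize_outputs(raw, inverse=True)
print(scaled)
print(scaled_inverse)
```

Without the logarithm, a plain linear rescaling of the same values would squeeze the three smallest into the bottom tenth of the range, which is why the number of orders of magnitude matters when choosing the strategy.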
                          Finally, the overall performance of the ANNs depends on the complexity of the
                       biological phenomenon to be modeled. In our hands, performance in predicting the
                       results of antimicrobial assays was lower than in predicting purely biochemical assays. A
                       higher degree of variability in the response of whole living organisms, compared with the
                       greater reproducibility of biochemical reactions, is in agreement with the work discussed
                       above on antiviral activities.


                                          CONCLUSION AND FUTURE TRENDS

                          Back in 1991, Zupan and Gasteiger questioned the future of the application of ANNs.
                       At the time only a few applications had been reported, despite a healthy output of research on
                       ANNs (Zupan & Gasteiger, 1991). The affordability of computational power and the
                       availability of ANN software with friendlier interfaces have made this tool more
                       accessible and appealing to the average researcher in fields far from computing,
                       facilitating its application to many different scientific fields. ANN functionality is
                       nowadays an add-on to the main statistical software packages or available for free as
                       standalone software.