Page 256 - Data Science Algorithms in a Week

238          Khaled Alshareef, Ahmad Rahal and Mohammed Basingab

                          In this analysis, the K nearest neighbor algorithm was used for retrieval, with the
                       Euclidean distance serving as the similarity function for the numerical attributes. The
                       Euclidean distance is calculated using the following equation:

                          $$D_i = \sqrt{\sum_{x=1}^{m} \left(a_{nx} - a_{ix}\right)^2}$$

                          where
                          $D_i$ is the Euclidean distance between stored case $i$ and the new case,
                          $a_{nx}$ are the attributes of the new case,
                          $a_{ix}$ are the attributes of stored case $i$, and
                          $m$ is the number of numerical attributes.
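
                          The distance and the K nearest neighbor retrieval described above can be sketched
                       in a few lines of Python. This is a minimal illustration rather than the authors'
                       implementation: the function names, the case identifiers, and the staffing numbers are
                       all hypothetical, and each case is assumed to be a tuple of the m numerical attributes.

```python
import math


def euclidean_distance(new_case, stored_case):
    """D_i: square root of the summed squared attribute differences."""
    return math.sqrt(sum((a_n - a_i) ** 2 for a_n, a_i in zip(new_case, stored_case)))


def k_nearest(new_case, stored_cases, k):
    """Return the k (case_id, distance) pairs with the smallest distance."""
    distances = [
        (case_id, euclidean_distance(new_case, attributes))
        for case_id, attributes in stored_cases.items()
    ]
    return sorted(distances, key=lambda pair: pair[1])[:k]


# Hypothetical cases: (doctors, nurses, lab technicians, staff)
stored_cases = {
    "case_1": (2, 4, 1, 3),
    "case_2": (3, 6, 2, 4),
    "case_3": (5, 9, 3, 6),
}
new_case = (3, 5, 2, 4)

nearest = k_nearest(new_case, stored_cases, k=2)
print(nearest)
```

                          Because all attributes enter the sum unscaled, they are weighted equally, so an
                       attribute with a larger numerical range has more influence on the distance.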

                          The numerical attributes in the developed ED cases were attributes 3, 4, 5, and 6,
                       corresponding to the numbers of doctors, nurses, lab technicians, and staff, and were
                       weighted equally in the similarity function. The non-numerical attributes, such as the
                       category of the problem and the path taken by the patients in the ED, do not have a
                       dedicated similarity function, because the retrieval engine only retrieves cases from
                       within the same category as the new case. The most commonly used paths in the EDs were
                       numbered sequentially (Path 1 to 4) according to their likely usage, and a similarity
                       matrix was then developed. Changes in the similarity matrix added 10 units to the
                       similarity function, which was then used to recalculate the Euclidean distance, as
                       shown in Table 3 below.
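
                          One possible reading of this retrieval step is sketched below. It is an illustration
                       under stated assumptions, not the authors' implementation: the text does not specify how
                       the 10 units enter the distance, so the sketch assumes that any difference in path
                       contributes a single 10-unit term inside the square root. The field names and the
                       example cases are hypothetical.

```python
import math

PATH_PENALTY = 10  # the 10 units mentioned in the text; its exact use is an assumption


def path_term(new_path, stored_path):
    """Assumed rule: identical paths add nothing, differing paths add 10 units."""
    return 0 if new_path == stored_path else PATH_PENALTY


def distance(new_case, stored_case):
    """Euclidean distance over the numerical attributes plus the assumed path term."""
    squared = sum((a - b) ** 2 for a, b in zip(new_case["numeric"], stored_case["numeric"]))
    squared += path_term(new_case["path"], stored_case["path"]) ** 2
    return math.sqrt(squared)


def retrieve(new_case, case_base, k):
    """Keep only cases in the new case's category, then return the k closest."""
    same_category = [c for c in case_base if c["category"] == new_case["category"]]
    return sorted(same_category, key=lambda c: distance(new_case, c))[:k]


# Hypothetical case-base; "numeric" holds (doctors, nurses, lab technicians, staff)
case_base = [
    {"id": "A", "category": 1, "path": 1, "numeric": (3, 5, 2, 4)},
    {"id": "B", "category": 1, "path": 2, "numeric": (3, 5, 2, 4)},
    {"id": "C", "category": 2, "path": 1, "numeric": (3, 5, 2, 4)},
    {"id": "D", "category": 1, "path": 1, "numeric": (6, 9, 2, 4)},
]
new_case = {"category": 1, "path": 1, "numeric": (3, 5, 2, 4)}

retrieved_ids = [c["id"] for c in retrieve(new_case, case_base, k=2)]
print(retrieved_ids)
```

                          Filtering by category before ranking mirrors the statement that the retrieval engine
                       only searches within the new case's category, so case C is never compared even though
                       its numerical attributes match exactly.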
                          With this approach, similarity percentages do not need to be determined, because
                       no weights were associated with the attributes. The similarity (distance) between the
                       new case and each of the stored cases, which is used to retrieve the K nearest stored
                       cases, was then determined using the equations below.







                       The Induction Tree Approach

                          The Induction Tree approach uses the already defined indexing system to develop the
                       decision tree representing the case-base. This results in faster retrieval, and in
                       results that differ from those of the K nearest neighbor approach. This tree represents
                       the hierarchical
                       structure of the simulation cases stored in the case-base. The assignments of attributes