Page 256 - Data Science Algorithms in a Week
238 Khaled Alshareef, Ahmad Rahal and Mohammed Basingab
In this analysis, the K nearest neighbor algorithm and the Euclidean distance were
used to determine the similarity function for the numerical attributes. The Euclidean
distance is calculated using the following equation:
$$D_i = \sqrt{\sum_{x=1}^{m} (a_{nx} - a_{ix})^2}$$
where,
$D_i$ is the Euclidean distance between stored case $i$ and the new case.
$a_{nx}$ are the attributes of the new case.
$a_{ix}$ are the attributes of the case $i$.
$m$ is the number of numerical attributes.
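A minimal sketch of the Euclidean distance described above, assuming the usual unweighted form (the square root of the summed squared attribute differences). The function name and the staffing values are hypothetical and are not taken from the case-base.

```python
import math

def euclidean_distance(new_case, stored_case):
    """Unweighted Euclidean distance over the m numerical attributes."""
    if len(new_case) != len(stored_case):
        raise ValueError("cases must have the same number of attributes")
    # Sum the squared differences attribute by attribute, then take the root.
    return math.sqrt(sum((a_n - a_i) ** 2 for a_n, a_i in zip(new_case, stored_case)))

# Hypothetical staffing levels: (doctors, nurses, lab technicians, staff)
new_case = (3, 5, 2, 4)
stored_case = (2, 5, 4, 2)
distance = euclidean_distance(new_case, stored_case)
print(distance)
```

Because all four attributes are weighted equally, no scaling factors appear inside the sum.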
The numerical attributes in the developed ED cases were attributes 3, 4, 5, and 6,
corresponding to the numbers of doctors, nurses, lab technicians, and staff; they were
weighted equally in the similarity function. The non-numerical attributes, such as the
category of the problem and the path taken by the patients in the ED, do not have a specific
similarity function, as the retrieval engine only retrieves from within the same category as
the new case. The most commonly used paths in the EDs were sequentially
numbered (Path 1 to 4) according to their likely usage, and a similarity matrix was then
developed. Changes in the similarity matrix added 10 units
to the similarity function, which was then used to recalculate the Euclidean distance, as
shown in Table 3 below.
Using this approach, determining the similarity percentages is not required, as no
weights were associated with the attributes. The similarity (distance) between the new
case and each of the stored cases, which is used to retrieve the K nearest stored cases,
was then determined using the equations below.
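The retrieval step can be sketched as follows, under stated assumptions: stored cases are filtered to the category of the new case, ranked by unweighted Euclidean distance over the four staffing attributes, and the K closest are returned. The dictionary layout, the case identifiers, and the staffing values are hypothetical. The path-based term from the similarity matrix is left out, since its values are not given here.

```python
import heapq
import math

def euclidean_distance(a, b):
    """Unweighted Euclidean distance between two attribute tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def retrieve_k_nearest(new_case, case_base, k):
    """Return (distance, case id) pairs for the k closest stored cases
    that share the new case's category."""
    candidates = [c for c in case_base if c["category"] == new_case["category"]]
    scored = [(euclidean_distance(new_case["staff"], c["staff"]), c["id"])
              for c in candidates]
    return heapq.nsmallest(k, scored)

# Hypothetical case-base: staff = (doctors, nurses, lab technicians, staff)
case_base = [
    {"id": "A", "category": 1, "staff": (2, 5, 4, 2)},
    {"id": "B", "category": 1, "staff": (3, 5, 2, 5)},
    {"id": "C", "category": 2, "staff": (3, 5, 2, 4)},  # other category, never retrieved
    {"id": "D", "category": 1, "staff": (6, 9, 2, 4)},
]
new_case = {"category": 1, "staff": (3, 5, 2, 4)}

nearest = retrieve_k_nearest(new_case, case_base, k=2)
print(nearest)
```

Case C matches the new case exactly on staffing but belongs to a different category, so it is excluded before any distance is computed.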
The Induction Tree Approach
The Induction Tree approach uses the already defined indexing system to develop the
decision tree representing the case-base, which results in faster retrieval and in results
that differ from those of the K nearest neighbor approach. This tree represents the hierarchical
structure of the simulation cases stored in the case-base. The assignments of attributes