Modern Geomatics Technologies and Applications
6. Discussion
The study set out to tune the classifiers so that they were optimal for the prediction process. As shown in Fig. 3(a), the
overall accuracy of CART decreases when the complexity parameter (CP) exceeds 0.26. Fig. 3(b) gives the best values of the C5.0
parameters for generating the model: accuracy is highest with a 20-trial decision tree structure and no winnowing. This was a
desirable result because it places both trees under similar conditions. CART achieved the lower overall accuracy and kappa
values, 60% and 47%, compared with 70% and 60%, respectively, for C5.0. These two metrics are not always reliable, and model
performance should not be judged on them alone. Other metrics, such as precision, recall and F1-measure, must therefore be taken
into consideration. These three metrics were obtained on the test set, calculated from Table 4 and presented in Table 5. The
F1-score values in Table 5 indicate that the relatively balanced data strongly affected the models' overall accuracy in
predicting the class labels of the test set. A possible explanation for the weaker classification performance of CART is that
its outcomes are restricted by the smaller size of the data. This underlines the importance of selecting a suitable dataset for
prediction models. To improve performance further, the model can be trained with different methods, and the variables could be
extended to include more driver-related and time-related crash factors. Regarding the risk maps, red spots indicating Level 3
crashes were mainly observed in Tennessee, Kentucky and West Virginia (see Fig. 4). It is apparent from Fig. 5 that C5.0
assigned more crashes to Level 2 than CART did. Moreover, fatality severity is most widely dispersed in this risk map.
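The metrics discussed above (overall accuracy, kappa, precision, recall and F1-measure) can all be derived from a confusion matrix such as the one in Table 4. The following is a minimal Python sketch of that calculation, not the code used in the study. The function name `classification_metrics` and the three-class matrix `example_cm` are hypothetical, and its counts are illustrative values, not the figures from Table 4.

```python
from typing import Dict, List


def classification_metrics(cm: List[List[int]]) -> Dict[str, object]:
    """Compute accuracy, Cohen's kappa and per-class precision/recall/F1.

    cm[i][j] is the number of test cases whose true class is i and whose
    predicted class is j.
    """
    k = len(cm)
    total = sum(sum(row) for row in cm)
    row_totals = [sum(row) for row in cm]  # true counts per class
    col_totals = [sum(cm[i][j] for i in range(k)) for j in range(k)]  # predicted counts

    # Overall accuracy: share of cases on the diagonal.
    accuracy = sum(cm[i][i] for i in range(k)) / total

    # Cohen's kappa: accuracy corrected for the agreement expected by chance.
    expected = sum(row_totals[i] * col_totals[i] for i in range(k)) / (total ** 2)
    kappa = (accuracy - expected) / (1 - expected) if expected < 1 else 0.0

    per_class = []
    for i in range(k):
        tp = cm[i][i]
        precision = tp / col_totals[i] if col_totals[i] else 0.0
        recall = tp / row_totals[i] if row_totals[i] else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        per_class.append({"precision": precision, "recall": recall, "f1": f1})

    return {"accuracy": accuracy, "kappa": kappa, "per_class": per_class}


# Hypothetical 3-class confusion matrix (rows: true Level 1-3, columns: predicted).
# These counts are made up for illustration only.
example_cm = [
    [50, 10, 0],
    [10, 40, 10],
    [0, 10, 20],
]

result = classification_metrics(example_cm)
print(f"accuracy={result['accuracy']:.3f} kappa={result['kappa']:.3f}")
for level, m in enumerate(result["per_class"], start=1):
    print(f"Level {level}: precision={m['precision']:.3f} "
          f"recall={m['recall']:.3f} f1={m['f1']:.3f}")
```

Because kappa subtracts the agreement expected by chance, it is lower than overall accuracy whenever chance agreement is above zero, which is consistent with the gap between the two values reported for both trees.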
7. Conclusion
In this paper, a comparative study was conducted to investigate the power of CART and C5.0 trees to predict fatal crashes.
Crash analysis is a crucial task because crashes have severe, irrecoverable effects on society. The classification models were
evaluated on the test data, and the C5.0 tree outperformed CART on several accuracy metrics. This task is an application of
data mining technology to crash data. However, the results are quite general because the data lack a wide variety of variables,
such as driver-related factors. Including more complementary factors would reveal more useful information.