
Modern Geomatics Technologies and Applications

          6.  Discussion
               The study set out to tune the classifiers so that they were as well suited as possible to the prediction task. As shown in
          Fig. 3(a), the overall accuracy of CART decreases when the complexity parameter (CP) exceeds 0.26. Fig. 3(b) shows the
          best parameter values for the C5.0 model: accuracy is highest with a 20-trial decision tree structure when no winnowing
          is applied. This was a desirable result because it places the two trees under similar conditions. CART achieved lower
          overall accuracy and kappa values (60% and 47%) than C5.0 (70% and 60%, respectively). These two metrics are not
          always reliable, so model performance should not be judged on them alone; other metrics such as precision, recall and
          F1-measure must also be taken into consideration. Accordingly, these three metrics were computed on the test set from
          Table 4 and are presented in Table 5. The F1-score values in Table 5 indicate that the relatively balanced data strongly
          affected the overall accuracy of the models in predicting the class labels of the test set. A possible explanation for the
          weaker performance of CART is that its results are limited by the relatively small size of the dataset. This highlights the
          importance of selecting a suitable dataset for prediction models. To improve performance, the models could be trained
          with different methods, and the variables could be extended to include more driver- and time-related crash factors.
          Regarding the risk maps, red spots indicating Level 3 crashes were mainly observed in Tennessee, Kentucky and West
          Virginia (see Fig. 4). Fig. 5 shows that C5.0 assigned more crashes to Level 2 than CART did. Moreover, fatality severity
          is more widely dispersed in this risk map.
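
          The metrics discussed above (overall accuracy, kappa, precision, recall and F1-measure) can all be derived from a
          confusion matrix, which is the role Table 4 appears to play here. The following is a minimal sketch in Python rather
          than the code used in the study. The function names and the 3-class matrix are hypothetical and only illustrate the
          arithmetic; the sketch assumes rows hold the actual classes and columns the predicted classes.

```python
from typing import Dict, List

Matrix = List[List[int]]  # assumption: rows = actual class, columns = predicted class


def overall_accuracy(cm: Matrix) -> float:
    """Share of all samples that lie on the diagonal (correct predictions)."""
    total = sum(sum(row) for row in cm)
    correct = sum(cm[k][k] for k in range(len(cm)))
    return correct / total


def cohen_kappa(cm: Matrix) -> float:
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    row_tot = [sum(cm[k]) for k in range(n)]                        # actual counts
    col_tot = [sum(cm[i][k] for i in range(n)) for k in range(n)]   # predicted counts
    p_o = overall_accuracy(cm)
    p_e = sum(row_tot[k] * col_tot[k] for k in range(n)) / total ** 2
    return (p_o - p_e) / (1 - p_e)


def per_class_scores(cm: Matrix) -> List[Dict[str, float]]:
    """Precision, recall and F1 for each class, one dict per class index."""
    n = len(cm)
    scores = []
    for k in range(n):
        tp = cm[k][k]
        predicted_k = sum(cm[i][k] for i in range(n))  # column total
        actual_k = sum(cm[k])                          # row total
        precision = tp / predicted_k if predicted_k else 0.0
        recall = tp / actual_k if actual_k else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        scores.append({"precision": precision, "recall": recall, "f1": f1})
    return scores


# Hypothetical confusion matrix for three severity levels (not the study's data).
example_cm = [
    [50, 10, 5],
    [8, 40, 12],
    [4, 11, 60],
]

print("accuracy:", overall_accuracy(example_cm))
print("kappa:", round(cohen_kappa(example_cm), 4))
for level, s in enumerate(per_class_scores(example_cm), start=1):
    print("Level", level, {name: round(v, 4) for name, v in s.items()})
```

          Because kappa subtracts the agreement expected by chance, it is lower than overall accuracy for the same matrix, which
          matches the pattern of the accuracy and kappa values reported for both trees.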

          7.  Conclusion
               In this paper, a comparative study was conducted to investigate the power of CART and C5.0 trees to predict fatal
          crashes. Crash analysis is a crucial task because crashes have severe and irreversible effects on society. The classification
          models were evaluated on the test data, and the C5.0 tree outperformed CART on several accuracy metrics. This work is
          an application of data mining to crash data. However, the results are fairly general because the data lacks a wide variety
          of variables, such as driver-related factors. Including more complementary factors would likely reveal more useful
          information.

          8.  References
          [1] World Health Organization. Global status report on road safety 2018: Summary. No. WHO/NMH/NVI/18.20. World
          Health Organization, 2018

          [2] Effati, M., Thill, J. C., and Shabani, S.: 'Geospatial and machine learning techniques for wicked social science problems:
          analysis of crash severity on a regional highway corridor', Journal of Geographical Systems, 2015, 4, 17(2), pp 107-135

          [3] Thakali, L., Kwon, T. J., and Fu, L.: 'Identification of crash hotspots using kernel density estimation and kriging methods: a
          comparison', Journal of Modern Transportation, 2015, 6, 23(2), pp 93-106

          [4] Ghadi, M., Török, A.: 'Comparison different black spot identification methods', Transportation research procedia, 2017, 1,
          (27), pp 1105-1112

          [5] Kaya Keleş, M.: 'An overview: the impact of data mining applications on various sectors', Tehnički glasnik, 2017, 9, 11(3),
          pp 128-132

          [6] Ahmed, A. M., Rizaner, A., and Ulusoy, A. H.: 'A novel decision tree classification based on post-pruning with Bayes
          minimum risk', PLoS ONE, 2017, 4, 13(4), e0194168

          [7] De Oña, J., López, G., and Abellán, J.: 'Extracting Decision Rules from Police Accident Reports through Decision Trees',
          Accident Analysis & Prevention, 2013, 1, (50), pp 1151-1160

          [8] Mansouri, M., Kargar, M. J.: 'Analysis and Monitoring of the Traffic Suburban Road Accidents Using Data Mining
          Techniques; a Case Study of Isfahan Province in Iran', The Open Transportation Journal, 2014, 11, 8(1), pp 39-49

          [9] Delen, D., Tomak, L., Kazim, T. et al.: 'Investigating Injury Severity Risk Factors in Automobile Crashes with Predictive
          Analytics and Sensitivity Analysis Methods', Journal of Transport & Health, 2017, 3, (4), pp 118-131

          [10] Díaz, I., Villar, J. R., and de la Cal, E.: 'Spanish Road Fork Traffic Analysis and Modelling', in International Conference
          on Hybrid Artificial Intelligence Systems, 2017, 6, pp 483-493

