Modern Geomatics Technologies and Applications
Methodology and results
This study uses two datasets: COVID-19 data and Twitter data. The COVID-19 data, including the number of
confirmed cases, total deaths, total recovered cases, and transition speed, were taken from WHO situation
reports up to May 01, 2020 (https://www.who.int/emergencies/diseases/novel-coronavirus-2019/situation-reports).
Twitter data from 2020-02-04 to 2020-05-01 were downloaded from
https://data.humdata.org/dataset/covid-19-twitter-data-geographic-distribution?force_layout=desktop.
Each Twitter data file contains six attributes (tweet_id, created_at, loc, text, user_id, verified). The data
contain only tweets that carry location information. Table 1 lists these attributes with a description of
each [15].
Table 1, Data attributes
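As an illustration of the six attributes listed above, the following minimal sketch shows one way a single tweet record could be represented and checked before analysis. The class name `TweetRecord`, the helper `parse_tweet`, the field types, and the dictionary input format are assumptions made for this example; the text specifies only the attribute names and that every tweet carries location information.

```python
from dataclasses import dataclass

# Attribute names as given in the text; everything else here is hypothetical.
TWEET_FIELDS = ("tweet_id", "created_at", "loc", "text", "user_id", "verified")


@dataclass
class TweetRecord:
    tweet_id: str      # assumed to be stored as a string identifier
    created_at: str    # assumed timestamp string, left unparsed here
    loc: str           # location information attached to the tweet
    text: str
    user_id: str
    verified: bool     # assumed to be a boolean flag


def parse_tweet(raw: dict) -> TweetRecord:
    """Build a TweetRecord from a dict, rejecting incomplete records."""
    missing = [name for name in TWEET_FIELDS if name not in raw]
    if missing:
        raise KeyError(f"missing attributes: {missing}")
    # The dataset is described as containing only tweets with a location.
    if not str(raw["loc"]).strip():
        raise ValueError("tweet has no location information")
    return TweetRecord(**{name: raw[name] for name in TWEET_FIELDS})
```

A record that lacks any of the six attributes, or whose loc value is empty, is rejected rather than silently passed on to the model.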
More than 160 million public tweets (66 million text tweets and 94 million location tweets) were collected
and used as input to the GWR model. The location tweets and text tweets chosen as input are presented in
Figure 1. GWR builds a separate equation for each feature in the dataset, incorporating the dependent
and explanatory variables of the features that fall within the bandwidth of each target feature. The shape
and extent of the bandwidth are determined by user input for the Kernel type, Bandwidth method, Distance,
and Number of neighbors parameters, with one restriction: if the number of neighboring features exceeds
1000, only the nearest 1000 are incorporated into each local equation.
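The local-equation idea described above can be sketched in a few lines of Python. This is a minimal illustration, not the tool used in the study: it assumes a single explanatory variable, a fixed-distance Gaussian kernel, and planar coordinates, and the function name `gwr_single_variable` and its parameters are hypothetical. Only the cap of 1000 neighbors per local equation comes from the text.

```python
import math

MAX_NEIGHBORS = 1000  # cap on neighbors per local equation, as stated in the text


def gwr_single_variable(coords, x, y, bandwidth, max_neighbors=MAX_NEIGHBORS):
    """Fit one weighted least-squares line per feature.

    coords: list of (easting, northing) tuples, one per feature
    x, y: explanatory and dependent values, one per feature
    bandwidth: Gaussian kernel distance in coordinate units (assumed fixed)
    Returns a list of (intercept, slope) pairs, one per feature.
    """
    results = []
    for cx, cy in coords:
        # Distance from the target feature to every feature, itself included.
        dists = [(math.hypot(px - cx, py - cy), j)
                 for j, (px, py) in enumerate(coords)]
        # Keep only the nearest max_neighbors features.
        nearest = sorted(dists)[:max_neighbors]
        # Gaussian kernel: closer features get weights nearer to 1.
        weights = [(math.exp(-0.5 * (d / bandwidth) ** 2), j) for d, j in nearest]
        total = sum(w for w, _ in weights)
        x_bar = sum(w * x[j] for w, j in weights) / total
        y_bar = sum(w * y[j] for w, j in weights) / total
        sxx = sum(w * (x[j] - x_bar) ** 2 for w, j in weights)
        sxy = sum(w * (x[j] - x_bar) * (y[j] - y_bar) for w, j in weights)
        slope = sxy / sxx  # assumes x is not constant among the neighbors
        results.append((y_bar - slope * x_bar, slope))
    return results
```

Because each feature receives its own intercept and slope, the relationship between the explanatory and dependent variables is allowed to vary across space. When the true relationship is the same everywhere, all local equations coincide with the global one.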