Modern Geomatics Technologies and Applications
management, and so on. In this regard, the review of COVID-19 Tweets Data has also been considered in new
research. Kabir and Madria analyzed COVID-19 tweet data to see how themes, subjectivity, and
human emotions changed over time. They also made the CoronaVis Twitter dataset (focused on the United
States) accessible to the research community at https://github.com/mykabir/COVID19 and developed an
interactive web application for monitoring real-time tweets on COVID-19 and dynamically generating insights.
They used sentiment analysis and correlated it with trending topics to determine the cause of a sentiment and
gain a deeper understanding of human emotions [15]. Lamsal introduced the COV19Tweets Dataset
[16], a large-scale Twitter dataset of over 310 million COVID-19-specific English language tweets and
sentiment ratings. The GeoCOV19Tweets Dataset [17] was also presented as a geotagged version of the dataset.
Lamsal described the design of both datasets in detail, as well as the tweets they contain. The datasets were
made available in the hope of improving the understanding of the spatial and temporal aspects of public
discourse around the pandemic [18]. By analyzing publicly available geolocated Twitter data,
Bisanzio et al. were able to predict the spatiotemporal distribution of confirmed COVID-19 cases at the
global level within the first few weeks of the outbreak. Their findings show that geolocated Twitter data
can be used to characterize human mobility and the spread of novel disease agents like SARS-CoV-2.
Furthermore, after an initial introduction has occurred, such a method may be used to predict spread within countries.
Twitter data may be combined with other data capturing human activity (such as flight traffic, cell phone, and
census data) to create a global and local warning system that improves public health response times [19]. The
aim of this study is to investigate the relationship between COVID-19 data (number of deaths, incidents,
recoveries, and tests) and geolocated Twitter data (users' posts, geographical location, and shared profile
photo) using geographically weighted regression (GWR).
Geographically Weighted Regression (GWR)
Linear regression estimates parameters that relate the explanatory variables to the response variable. When
this technique is applied to spatial data, however, the question arises whether these parameters are
stationary over space.
To identify the nature of relationships between variables, linear regression models the dependent variable y as a
linear function of the explanatory variables x1, ..., xp. For n observations, the model is written:
$$ y_i = \beta_0 + \sum_{k=1}^{p} \beta_k x_{ik} + \varepsilon_i \qquad (1) $$
where β0, β1, ..., βp are the parameters and ε1, ε2, ..., εn are the error terms. In this model, the coefficients βk are
assumed to be identical across the study area. However, the hypothesis that the explanatory variables affect
the dependent variable uniformly over space is often unrealistic [20]. If the parameters vary significantly in
space, a global estimator will hide the geographical richness of the phenomenon. GWR is a model with
spatially varying coefficients: the regression coefficients are not constant but vary with the geographical
coordinates of the observations. In other words, the coefficients of the explanatory variables form continuous
surfaces that are evaluated at specific points in space [20, 21].
$$ y_i = \beta_0(u_i, v_i) + \sum_{k=1}^{p} \beta_k(u_i, v_i)\, x_{ik} + \varepsilon_i \qquad (2) $$
where (ui, vi) are the geographical coordinates of observation i.
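To illustrate why a global estimator can hide spatial variation, the following minimal sketch fits the fixed-coefficient model of equation (1) to synthetic data in which the true slope changes from west to east. The data, the function name fit_global_ols, and the use of NumPy are assumptions made for illustration only. They are not taken from the study.

```python
import numpy as np

def fit_global_ols(X, y):
    """Estimate (beta_0, ..., beta_p) of the global model by ordinary least squares."""
    design = np.column_stack([np.ones(len(X)), X])  # prepend the intercept column
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return beta

# Synthetic data (assumption): the true slope grows from 1 in the west to 3 in the east.
rng = np.random.default_rng(0)
n = 200
u = rng.uniform(0.0, 1.0, n)        # hypothetical east-west coordinate
x = rng.normal(size=n)              # a single explanatory variable
true_slope = 1.0 + 2.0 * u          # spatially varying coefficient
y = 0.5 + true_slope * x + rng.normal(scale=0.1, size=n)

beta_global = fit_global_ols(x.reshape(-1, 1), y)
print(beta_global)  # one intercept and one slope for the whole study area
```

The global fit returns a single slope for all locations, so the west-to-east gradient present in the synthetic data cannot be recovered from it.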
The model is estimated under the following hypothesis: the closer two observations are geographically, the more
similar the effect of the explanatory variables on the dependent variable, i.e. the closer their regression
coefficients. As a result, to estimate the model with variable coefficients at point i, the
fixed-coefficient model is used, and only observations close to i are included in the regression. There is a
trade-off, however: the greater the number of points in the local sample, the smaller the variance but the
greater the bias. The solution is to reduce the influence of the most distant observations by assigning each
observation a weight that decreases with its distance from the point of interest. Output fields of this analysis
include StdResid (standardized residual values), LocalR2 (weighted r² between observed and predicted values),
and Predicted (estimated local values) [22].
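The local estimation described above can be sketched as one weighted least-squares fit per observation. This is a minimal illustration, not the implementation used in the study. The Gaussian kernel, the fixed bandwidth, the function name gwr_fit, and the synthetic data are all assumptions, and the sketch returns only local coefficients and predicted values rather than the full set of output fields listed above.

```python
import numpy as np

def gwr_fit(coords, X, y, bandwidth):
    """Fit one distance-weighted least-squares regression per observation location."""
    n = len(y)
    design = np.column_stack([np.ones(n), X])      # intercept plus explanatory variables
    betas = np.empty((n, design.shape[1]))
    for i in range(n):
        d = np.linalg.norm(coords - coords[i], axis=1)   # distances to point i
        w = np.exp(-0.5 * (d / bandwidth) ** 2)          # Gaussian kernel (assumed choice)
        sw = np.sqrt(w)
        # Weighted least squares, solved as OLS on rows rescaled by sqrt(weight).
        beta_i = np.linalg.lstsq(design * sw[:, None], y * sw, rcond=None)[0]
        betas[i] = beta_i
    predicted = np.sum(design * betas, axis=1)     # each point uses its own coefficients
    return betas, predicted

# Synthetic data (assumption): the slope increases with the east-west coordinate u.
rng = np.random.default_rng(0)
n = 300
u = rng.uniform(0.0, 1.0, n)
v = rng.uniform(0.0, 1.0, n)
coords = np.column_stack([u, v])
x = rng.normal(size=n)
y = 0.5 + (1.0 + 2.0 * u) * x + rng.normal(scale=0.1, size=n)

betas, predicted = gwr_fit(coords, x.reshape(-1, 1), y, bandwidth=0.2)
residuals = y - predicted
print(betas[u < 0.3, 1].mean(), betas[u > 0.7, 1].mean())  # local slopes, west vs east
```

The bandwidth governs the bias-variance trade-off noted above: a larger value brings more observations into each local fit, which lowers the variance of the local coefficients but smooths away more of the spatial variation.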