Page 101 - FULL REPORT 30012024


iii. Stroke Prediction Dataset

This section of the data dictionary covers the Stroke Prediction Dataset, which contains patient-level records. Its columns range from basic demographic data such as age and gender to medical and lifestyle attributes such as hypertension, heart disease, marital status, work type, residence type, BMI, smoking status, and whether the patient had a stroke. Each column is clearly defined, with its data type, its permitted values, and its potential use in predicting the likelihood of a stroke. The data dictionary for the stroke prediction dataset is shown in Table 4.6.

Table 4.5 Data Dictionary for the Stroke Prediction Dataset

Column          Description
Id              Unique identifier for each patient
gender          Gender of the patient (categorical: "Male", "Female", or "Other")
age             Age of the patient in years (continuous)
hypertension    Indicates whether the patient has hypertension (0 - No, 1 - Yes)
Heart_disease   Indicates whether the patient has a heart disease (0 - No, 1 - Yes)
Ever_married    Indicates the marital status of the patient (categorical: "No" or "Yes")
Work_type       Type of occupation of the patient (categorical: "Private", "Self-employed", "Govt_job", "children", or "Never_worked")
Residence_type  Residence type of the patient (categorical: "Urban" or "Rural")
bmi             Body mass index (BMI) of the patient (continuous)
Smoking_status  Smoking status of the patient (categorical: "formerly smoked", "never smoked", "smokes", or "Unknown")
stroke          Indicates whether the patient had a stroke or not (0 - No, 1 - Yes)
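
A data dictionary like the one above can also serve as a validation schema when the data is loaded. The following is a minimal sketch of that idea and not the method used in this report. It relies only on the Python standard library. The names `ALLOWED`, `NUMERIC`, and `validate_row` are hypothetical, and so are the two sample rows, which are made up for illustration. Column names are lowercased before matching, on the assumption that the casing shown in the table may differ from the casing in the source file.

```python
import csv
import io

BINARY = {"0", "1"}  # CSV fields are read as strings

# Permitted values per categorical column, taken from the data dictionary.
ALLOWED = {
    "gender": {"Male", "Female", "Other"},
    "hypertension": BINARY,
    "heart_disease": BINARY,
    "ever_married": {"No", "Yes"},
    "work_type": {"Private", "Self-employed", "Govt_job", "children", "Never_worked"},
    "residence_type": {"Urban", "Rural"},
    "smoking_status": {"formerly smoked", "never smoked", "smokes", "Unknown"},
    "stroke": BINARY,
}

# Continuous columns, which must parse as numbers.
NUMERIC = ("age", "bmi")

REQUIRED = ("id",) + tuple(ALLOWED) + NUMERIC


def validate_row(row):
    """Return a list of problems found in one record (empty if none)."""
    # Assumption: match column names case-insensitively.
    row = {k.strip().lower(): (v or "").strip()
           for k, v in row.items() if k is not None}
    problems = []
    for col in REQUIRED:
        if col not in row:
            problems.append(f"{col}: missing column")
    for col, allowed in ALLOWED.items():
        if col in row and row[col] not in allowed:
            problems.append(f"{col}: unexpected value {row[col]!r}")
    for col in NUMERIC:
        if col in row:
            try:
                float(row[col])
            except ValueError:
                problems.append(f"{col}: not numeric ({row[col]!r})")
    return problems


# Two made-up records, for illustration only (not real patient data).
SAMPLE_CSV = """Id,gender,age,hypertension,Heart_disease,Ever_married,Work_type,Residence_type,bmi,Smoking_status,stroke
1,Female,54,0,0,Yes,Private,Urban,27.3,never smoked,0
2,male,61,1,0,Yes,Self-employed,Rural,,smokes,1
"""

results = {}
for record in csv.DictReader(io.StringIO(SAMPLE_CSV)):
    results[record["Id"]] = validate_row(record)

for record_id, problems in results.items():
    print(record_id, problems if problems else "ok")
```

In this sketch the first sample row passes, while the second is flagged twice: once for `gender`, because "male" is not one of the listed categories, and once for `bmi`, because the field is empty. Whether such rows should be corrected, imputed, or dropped is a separate modelling decision.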
