Page 101 - FULL REPORT 30012024
iii. Stroke Prediction Dataset
This section of the data dictionary covers the Stroke Prediction
Dataset, which contains patient-level information. Its columns range
from basic demographic data such as age and gender to medical and
lifestyle attributes such as hypertension, heart disease, marital
status, occupation, residence type, BMI, and smoking status, along with
whether the patient has had a stroke. Each column is described by its
meaning, its data type, and its permitted values, which indicates how it
can be used in forecasting the likelihood of a stroke. The
stroke prediction dataset’s data dictionary is shown in Table 4.5.
Table 4.5 Data Dictionary for the Stroke Prediction Dataset
Column          Description
Id              Unique identifier for each patient
gender          Gender of the patient (categorical: "Male", "Female", or "Other")
age             Age of the patient in years (continuous)
hypertension    Indicates whether the patient has hypertension (0 - No, 1 - Yes)
Heart_disease   Indicates whether the patient has heart disease (0 - No, 1 - Yes)
Ever_married    Indicates the marital status of the patient (categorical: "No" or "Yes")
Work_type       Type of occupation of the patient (categorical: "Private", "Self-employed", "Govt_job", "children", or "Never_worked")
Residence_type  Residence type of the patient (categorical: "Urban" or "Rural")
bmi             Body mass index (BMI) of the patient (continuous)
Smoking_status  Smoking status of the patient (categorical: "formerly smoked", "never smoked", "smokes", or "Unknown")
stroke          Indicates whether the patient had a stroke or not (0 - No, 1 - Yes)
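The data dictionary above can also be written as a machine-checkable schema. The following is a minimal sketch, not the procedure used in this report: it assumes the data has been loaded into a pandas DataFrame whose column names match the table exactly as printed (the casing in the actual file may differ). The function name validate and the two sample rows are hypothetical and exist only to illustrate the check.

```python
import pandas as pd

# Permitted values per categorical column, transcribed from the data dictionary.
CATEGORICAL = {
    "gender": {"Male", "Female", "Other"},
    "Ever_married": {"No", "Yes"},
    "Work_type": {"Private", "Self-employed", "Govt_job", "children", "Never_worked"},
    "Residence_type": {"Urban", "Rural"},
    "Smoking_status": {"formerly smoked", "never smoked", "smokes", "Unknown"},
}
BINARY = ["hypertension", "Heart_disease", "stroke"]  # 0 - No, 1 - Yes
CONTINUOUS = ["age", "bmi"]


def validate(df):
    """Return a list of problems found; an empty list means the frame conforms."""
    problems = []
    for col in ["Id", *CATEGORICAL, *BINARY, *CONTINUOUS]:
        if col not in df.columns:
            problems.append(f"missing column: {col}")
    if "Id" in df.columns and df["Id"].duplicated().any():
        problems.append("Id: duplicate identifiers")
    for col, allowed in CATEGORICAL.items():
        if col in df.columns:
            # Missing values are ignored here; only present values are checked.
            unexpected = set(df[col].dropna().unique()) - allowed
            if unexpected:
                problems.append(f"{col}: unexpected values {sorted(unexpected)}")
    for col in BINARY:
        if col in df.columns and not df[col].dropna().isin([0, 1]).all():
            problems.append(f"{col}: values other than 0/1")
    for col in CONTINUOUS:
        if col in df.columns and not pd.api.types.is_numeric_dtype(df[col]):
            problems.append(f"{col}: not numeric")
    return problems


# Two made-up rows for illustration only; they are not taken from the dataset.
sample = pd.DataFrame({
    "Id": [1, 2],
    "gender": ["Male", "Female"],
    "age": [67.0, 45.0],
    "hypertension": [0, 1],
    "Heart_disease": [1, 0],
    "Ever_married": ["Yes", "No"],
    "Work_type": ["Private", "Self-employed"],
    "Residence_type": ["Urban", "Rural"],
    "bmi": [36.6, float("nan")],
    "Smoking_status": ["formerly smoked", "Unknown"],
    "stroke": [1, 0],
})
print(validate(sample))  # prints []
```

Running such a check before modelling would flag misspelled category labels or stray codes early, instead of letting them become extra categories during encoding.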