Page 162 - Data Science Algorithms in a Week
Regression
Analysis:
1. Every month, we pay for the data already stored in the cloud plus the new
data added to the storage in that month. We will use linear regression to
predict the bill for an arbitrary month, and then sum the predicted bills for
the first 12 months to obtain the cost for the whole year.
Input:
source_code/6/cloud_storage.r
bills = data.frame(
month = c(1,2,3,4,5),
bill = c(120.0,131.2,142.1,152.9,164.3)
)
model = lm(bill ~ month, data = bills)
print(model)
Output:
$ Rscript cloud_storage.r
Call:
lm(formula = bill ~ month, data = bills)
Coefficients:
(Intercept)        month
     109.01        11.03
This means that the base cost is base_cost=109.01 euros, and storing the data
added in one month costs an additional month_data=11.03 euros. Therefore, the
formula for the nth monthly bill is as follows:
bill_amount=month_data*month_number+base_cost=11.03*month_number+109.01 euros
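The formula above can be checked directly against the fitted model. The following is a minimal sketch, not part of the original source file: it reads the two coefficients with coef() and wraps the formula in a helper function. The function bill_amount and the choice of month 6 are illustrative assumptions; base_cost and month_data follow the names used in the text.

```r
# Same data as in cloud_storage.r.
bills = data.frame(
  month = c(1,2,3,4,5),
  bill = c(120.0,131.2,142.1,152.9,164.3)
)
model = lm(bill ~ month, data = bills)

# coef() returns a named vector: "(Intercept)" is the base cost,
# "month" is the cost of the data added in one month.
base_cost = unname(coef(model)["(Intercept)"])
month_data = unname(coef(model)["month"])

# Hypothetical helper (an assumption for illustration): the nth monthly bill.
bill_amount = function(month_number) {
  month_data * month_number + base_cost
}

# Predicted bill for month 6, computed from the formula.
print(bill_amount(6))

# predict() evaluates the same fitted line, so both values should agree.
print(predict(model, newdata = data.frame(month = 6)))
```

With the coefficients above, the bill for month 6 comes out at about 11.03*6+109.01=175.19 euros.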
Remember that the sum of the first n positive integers is (1/2)*n*(n+1). Thus,
the cost for the first n months is as follows:
total_cost(n months)=base_cost*n+month_data*[(1/2)*n*(n+1)]
=n*[base_cost+month_data*(1/2)*(n+1)]
=n*[109.01+11.03*(1/2)*(n+1)]
=n*[114.525+5.515*n]
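As a sanity check, the closed-form total can be compared with a direct sum of the monthly bills. This is a minimal sketch that assumes the rounded coefficients printed by the model (109.01 and 11.03); the function total_cost is a hypothetical helper introduced for illustration and is not part of the original source file.

```r
# Rounded coefficients taken from the model output.
base_cost = 109.01
month_data = 11.03

# Hypothetical helper: closed-form cost of the first n months,
# total_cost(n) = n*[base_cost + month_data*(1/2)*(n+1)].
total_cost = function(n) {
  n * (base_cost + month_data * (n + 1) / 2)
}

# Direct sum of the 12 monthly bills, used as a cross-check.
monthly_bills = month_data * (1:12) + base_cost

print(total_cost(12))
print(sum(monthly_bills))
```

Both computations should give about 2168.46 euros for the first 12 months, under the assumption that the linear trend continues beyond the five observed months.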