Data Science Algorithms in a Week
Regression
The intercept term may come from measurement errors or from other forces that the model does not account for. Because the intercept is small compared with the slope term at distances of this magnitude, the final velocity should still be estimated reasonably well. Substituting a distance of 300 km (d = 300000 m) into the equation, we get:
$$v^2 = 4.206 \cdot 300000 - 317.708 = 1261482.292$$
$$v = \sqrt{1261482.292} \approx 1123.157$$
Therefore, for the projectile to reach a distance of 300 km from the source, we need to fire it at a speed of approximately 1123.157 m/s.
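The arithmetic above can be checked with a short Python sketch. It assumes the fitted relationship v^2 = 4.206*d - 317.708 quoted in the text, with d in meters and v in m/s; the function name predict_velocity is hypothetical and chosen only for illustration.

```python
import math

# Coefficients of the fitted model v^2 = SLOPE * d + INTERCEPT,
# taken from the regression result quoted in the text (d in meters).
SLOPE = 4.206
INTERCEPT = -317.708


def predict_velocity(distance_m: float) -> float:
    """Return the launch speed in m/s that the fitted model predicts for a distance in meters."""
    v_squared = SLOPE * distance_m + INTERCEPT
    # For very short distances the negative intercept dominates and v^2 < 0.
    if v_squared < 0:
        raise ValueError("model predicts a negative v^2 for this distance")
    return math.sqrt(v_squared)


v = predict_velocity(300_000)  # 300 km expressed in meters
print(f"{v:.3f} m/s")  # 1123.157 m/s
```

The guard against a negative v^2 is a design choice of this sketch: the fitted model is only meaningful for distances where 4.206*d exceeds the intercept, that is, above roughly 75.5 m.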
Summary
We can think of variables as depending on each other functionally. For example, the variable y may be a function of x, denoted y=f(x). The function f(x) has constant parameters: if y depends on x linearly, then f(x)=a*x+b, where a and b are the constant parameters of f(x). Regression is a method for estimating these constant parameters so that the estimated f(x) follows y as closely as possible. Closeness is measured formally by the squared error between f(x) and y, summed over the data samples x.
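To make the squared-error criterion concrete, the sketch below fits f(x)=a*x+b by ordinary least squares using its closed-form solution (a is the sum of co-deviations of x and y divided by the sum of squared deviations of x, and b = mean(y) - a*mean(x)), which minimizes the sum of squared errors directly rather than iteratively. The function names and the toy data, points generated from y = 2x + 1, are illustrative assumptions and not data from the book.

```python
from statistics import mean


def fit_line(xs, ys):
    """Return least-squares estimates (a, b) for f(x) = a*x + b."""
    x_bar, y_bar = mean(xs), mean(ys)
    # Slope: sum of co-deviations divided by sum of squared x-deviations.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    a = sxy / sxx
    # The least-squares line always passes through (x_bar, y_bar).
    b = y_bar - a * x_bar
    return a, b


def squared_error(xs, ys, a, b):
    """Sum of squared differences between f(x) = a*x + b and y."""
    return sum((a * x + b - y) ** 2 for x, y in zip(xs, ys))


# Toy data lying exactly on y = 2x + 1 (hypothetical, for illustration only).
xs = [0, 1, 2, 3, 4]
ys = [1, 3, 5, 7, 9]
a, b = fit_line(xs, ys)
print(a, b, squared_error(xs, ys, a, b))  # 2.0 1.0 0.0
```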
The gradient descent method minimizes this error by repeatedly updating the constant parameters in the direction of steepest descent, that is, opposite to the gradient (the vector of partial derivatives of the error with respect to the parameters). With a suitably small step size (learning rate), the parameters converge toward the values that minimize the error.
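A minimal gradient descent sketch for the same linear model follows. It assumes the mean squared error as the objective, a fixed learning rate of 0.05 and a fixed iteration count of 2000; these values, the function name and the toy data (again points on y = 2x + 1) are illustrative choices, not taken from the book. Each step moves (a, b) against the gradient of the error.

```python
def gradient_descent(xs, ys, learning_rate=0.05, iterations=2000):
    """Fit f(x) = a*x + b by minimizing the mean squared error with gradient descent."""
    n = len(xs)
    a, b = 0.0, 0.0  # arbitrary starting point
    for _ in range(iterations):
        # Residuals f(x_i) - y_i under the current parameters.
        residuals = [a * x + b - y for x, y in zip(xs, ys)]
        # Partial derivatives of E = (1/n) * sum(residual^2) with respect to a and b.
        grad_a = 2.0 / n * sum(r * x for r, x in zip(residuals, xs))
        grad_b = 2.0 / n * sum(residuals)
        # Step in the direction of steepest descent (the negative gradient).
        a -= learning_rate * grad_a
        b -= learning_rate * grad_b
    return a, b


# Toy data lying exactly on y = 2x + 1 (hypothetical, for illustration only).
xs = [0, 1, 2, 3, 4]
ys = [1, 3, 5, 7, 9]
a, b = gradient_descent(xs, ys)
print(round(a, 6), round(b, 6))  # 2.0 1.0
```

The learning rate matters: for this toy data, rates above roughly 0.15 make the updates overshoot and diverge, while very small rates converge correctly but need many more iterations.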
The statistical software R supports estimating linear regression models with the function lm.
Problems
1. Cloud storage cost prediction: Our software application generates data every month and stores it in cloud storage together with the data from all previous months. We are given the following bills for the cloud storage, and we would like to estimate the running costs for the first year of using this cloud storage: