Page 149 - Data Science Algorithms in a Week

Regression


            For example, take the first two pairs, (F1,C1)=(5,-15) and (F2,C2)=(14,-10). Then we
            have the following:


                                              a=(-10-(-15))/(14-5)=5/9

                                               b=-15-(5/9)*5=-160/9

            Therefore, the formula to calculate degrees Celsius from degrees Fahrenheit is
            C=(5/9)*F-160/9≈0.5556*F-17.7778.
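The same calculation can be sketched in R. This is only an illustration of the arithmetic above; the variable names f1, c1, f2, c2, a and b are chosen here for the example and are not part of the book's source code.

```r
# Two (Fahrenheit, Celsius) pairs taken from the table.
f1 <- 5;  c1 <- -15
f2 <- 14; c2 <- -10

# Slope and intercept of the line C = a*F + b through both points.
a <- (c2 - c1) / (f2 - f1)   # 5/9, approximately 0.5556
b <- c1 - a * f1             # -160/9, approximately -17.7778

print(a)
print(b)
```

Any other two pairs from the table would give the same a and b, because all six points lie on one line.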

            Let us verify it against the data in the table:

             °F   °C    (5/9)*F-160/9
             5    -15   -15
             14   -10   -10
             23   -5    -5
             32   0     0
             41   5     5
             50   10    10

            Therefore, the formula fits our input data exactly. The data we worked with was perfect. In
            later examples, we will see that the formula we derive cannot fit the data perfectly.
            The aim will then be to derive the formula that fits the data best, so that the error between
            the prediction and the actual data is minimized.
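The check in the table can also be reproduced with a few lines of R. This is a minimal sketch: the helper name to_celsius and the variables predicted and errors are hypothetical names introduced for this illustration. The comparison uses all.equal, which tolerates small floating-point rounding differences.

```r
# Data from the table.
fahrenheit <- c(5, 14, 23, 32, 41, 50)
celsius    <- c(-15, -10, -5, 0, 5, 10)

# The derived formula C = (5/9)*F - 160/9 (hypothetical helper name).
to_celsius <- function(f) (5 / 9) * f - 160 / 9

# Predictions for every row and their deviation from the measured values.
predicted <- to_celsius(fahrenheit)
errors    <- predicted - celsius

# TRUE when the predictions match the data up to rounding error.
print(isTRUE(all.equal(predicted, celsius)))

# Largest absolute error and sum of squared errors; both are
# essentially zero for this perfect data set.
print(max(abs(errors)))
print(sum(errors^2))
```

For data that does not lie exactly on a line, quantities such as the sum of squared errors are what a best-fitting formula would try to keep small.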

            Analysis using R:
            We use the statistical analysis software R to calculate the linear dependence relation
            between the variables degrees Celsius and degrees Fahrenheit.
            R has the function lm (in its built-in stats package), which fits a linear relationship
            between variables. It can be used in the following form: lm(y ~ x, data = dataset_for_x_y),
            where y is the variable dependent on x. In our case, the data frame temperatures should
            contain the vectors with the values for x and y:

            Input:

                # source_code/6/frahrenheit_celsius.r
                temperatures = data.frame(
                    fahrenheit = c(5,14,23,32,41,50), celsius = c(-15,-10,-5,0,5,10)
                )
