Page 26 - Regression Guideline for AMC
Normal distribution of data
•  These high-end properties are called statistical “outliers” because of their unusually high relative values. Some areas may also have low-end outliers, with values that are very low relative to the other properties under consideration. Our experience to date indicates that high-end outliers such as these are the more common kind, especially in many metropolitan areas.
•  Statisticians disagree on whether outliers should be included or excluded before running a regression. Those who favor leaving outliers in argue that they add to the variation in the sales prices being estimated and that, as long as they do not unduly influence a model’s estimates, they do more good than harm. Others argue that outliers that create a non-normal distribution of the dependent variable distort the model regardless and should either be removed or have their values pulled closer to the mean through some kind of transformation.
•  We will take the approach that removing outliers such as these highly valued properties will improve the model estimates. We will remove outliers based on their standard deviation scores, dropping every property whose value lies more than ±2 standard deviations from the mean. This corresponds approximately to removing the most extreme 5% of properties in this MLS database, counting the upper and lower ends together.
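The ±2 standard deviation rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the guideline's own procedure: the function name `remove_outliers`, the sample prices, and the use of the sample (n − 1) standard deviation are all hypothetical choices, since the guideline does not specify them.

```python
from statistics import mean, stdev


def remove_outliers(prices, k=2.0):
    """Keep only values within k standard deviations of the mean.

    Assumption: the sample standard deviation (n - 1 denominator) is used;
    the guideline does not say which variant applies.
    """
    mu = mean(prices)
    sigma = stdev(prices)
    if sigma == 0:
        # All values are identical, so there is nothing to remove.
        return list(prices)
    # A value's standard deviation score is its distance from the mean in SD units.
    return [p for p in prices if abs(p - mu) / sigma <= k]


# Hypothetical sale prices: nine typical homes and one very high-end property.
prices = [210_000, 225_000, 240_000, 198_000, 231_000,
          219_000, 245_000, 207_000, 228_000, 1_950_000]

kept = remove_outliers(prices)
print(len(prices) - len(kept), "outlier(s) removed")
```

Note that the mean and standard deviation are themselves pulled upward by the high-end property, so in a small sample a single extreme value can mask others. Rerunning the filter on the trimmed data, or applying it to a transformed price, are possible variations that the guideline does not prescribe.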

