Many in the profession know that I am a fan of statistics. As a matter of routine, I have been using linear regression analysis in every appraisal for many years, either as primary or secondary support for at least one adjustment. Lately, there has been plenty of talk in Portland, Oregon and elsewhere about using linear regression analysis in real estate appraisal to support adjustments. In the process, it seems that a lot of misinformation about statistics and regression is also being circulated. Consequently, I will attempt to set the record straight by discussing the proper use of R squared in real estate appraisal linear regression.
R squared, also known as the coefficient of determination, is a measure (between zero and one) of how well a regression line fits the data points. An R squared of exactly one means that the line passes through every data point, so the model reproduces every sale price in the sample. Here is a table of data for a very simple regression example with perfect correlation.
| Reported GLA | Reported Sales Price | Notes |
|---|---|---|
| 1,000 | $100,000 | All of these sales have been hand selected for small sites and are similar in all other ways, other than GLA. |
| 1,500 | $137,500 | |
| 2,000 | $175,000 | |
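For readers who want to check the numbers themselves, here is a minimal sketch of an ordinary least-squares fit on the three sales in the table. It uses only plain Python, and the helper names `fit_line` and `r_squared` are my own for illustration, not functions from any appraisal software.

```python
# Sales from the table: (GLA in square feet, sale price in dollars)
sales = [(1000, 100_000), (1500, 137_500), (2000, 175_000)]


def fit_line(points):
    """Ordinary least-squares fit of price = slope * gla + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def r_squared(points, slope, intercept):
    """Share of the price variation that the fitted line explains."""
    mean_y = sum(y for _, y in points) / len(points)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in points)
    ss_tot = sum((y - mean_y) ** 2 for _, y in points)
    return 1 - ss_res / ss_tot


slope, intercept = fit_line(sales)
r2 = r_squared(sales, slope, intercept)
print(f"slope: ${slope:.2f} per sq ft")
print(f"intercept: ${intercept:,.0f}")
print(f"R squared: {r2:.3f}")
```

On this data the fitted slope is 75 and every sale sits exactly on the line, so R squared comes out as one.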
In the regression chart for these sales, it is easy to see that prices are increasing at $75 per square foot, as indicated by the slope (75x in the regression formula). The R squared value of one says that these sales fit the line perfectly. An appraiser can also pair any two of the sales in the above table and the result will also be exactly $75 per square foot. The next table and regression chart do not have perfect correlation because large and small site sizes are mixed in the data. Remember that the large and small site sizes are just an example. In reality, site size differences could represent any type of variation commonly found in sales of homes (e.g. condition, features, etc.).
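| Reported GLA | Reported Sales Price | Notes |
|---|---|---|
| 1,000 | $100,000 | Small Site |
| 1,000 | $125,000 | Large Site |
| 1,500 | $137,500 | Small Site |
| 1,500 | $162,500 | Large Site |
| 2,000 | $175,000 | Small Site |
| 2,000 | $200,000 | Large Site |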
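Paired sales analysis on this mixed data can be sketched in a few lines of Python. This is only an illustration: I am assuming the three large site sales have the same GLA values as the three small site sales, and the helper names `site_pairs` and `gla_pairs` are hypothetical.

```python
# (GLA in sq ft, sale price, site size); the large-site rows are assumed
# to share the same three GLA values as the small-site rows.
sales = [
    (1000, 100_000, "small"), (1000, 125_000, "large"),
    (1500, 137_500, "small"), (1500, 162_500, "large"),
    (2000, 175_000, "small"), (2000, 200_000, "large"),
]


def site_pairs(sales):
    """Price difference between large- and small-site sales of equal GLA."""
    small = {gla: price for gla, price, site in sales if site == "small"}
    large = {gla: price for gla, price, site in sales if site == "large"}
    return {gla: large[gla] - small[gla] for gla in small if gla in large}


def gla_pairs(sales, site):
    """Dollars per square foot for every pair of sales with the same site size."""
    same_site = sorted((gla, price) for gla, price, s in sales if s == site)
    rates = []
    for i, (gla_a, price_a) in enumerate(same_site):
        for gla_b, price_b in same_site[i + 1:]:
            rates.append((price_b - price_a) / (gla_b - gla_a))
    return rates


site_adjustments = site_pairs(sales)
small_rates = gla_pairs(sales, "small")
large_rates = gla_pairs(sales, "large")
print("site adjustment by GLA:", site_adjustments)
print("GLA rate, small sites:", small_rates)
print("GLA rate, large sites:", large_rates)
```

Pairing for GLA only works cleanly when the two sales share the same site size. Pairing a small site sale against a large site sale would blend the site premium into the per-square-foot figure.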
In the above regression chart, I have introduced three new sales with large site sizes. If an appraiser pairs each large site sale with the small site sale of the same GLA, the adjustment is $25,000. If an appraiser pairs any two sales with the same site size for GLA, the answer will also be exactly $75 per square foot. R squared is less than one, but the slope and the adjustment are still $75 per square foot. This is because the data are no longer a perfect fit along the line, but the sales all still increase at $75 per square foot. The next chart has a lower R squared value because the adjustment for site size is $45,000 rather than $25,000, while the adjustment per square foot remains the same.
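| Reported GLA | Reported Sales Price | Notes |
|---|---|---|
| 1,000 | $100,000 | Small Site |
| 1,000 | $145,000 | Large Site |
| 1,500 | $137,500 | Small Site |
| 1,500 | $182,500 | Large Site |
| 2,000 | $175,000 | Small Site |
| 2,000 | $220,000 | Large Site |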
In the above chart, the larger variation in sales price results in a smaller R squared value, but the adjustment for square footage remains the same. R squared is only a measure of how well the data points fit the line. In real estate appraisal, R squared will usually be well below the ideal value of one. An R squared that is low does not mean that the adjustment provided by regression is less accurate or less valid. An R squared that is low does not change the adjustment that the appraiser should apply for the factor being measured. In the above examples, we are only solving for GLA, and it does not matter that there is another source of variation (in this case site size), because that variation is evenly dispersed along the trend line.
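The effect can be reproduced with a short sketch. It rebuilds the two mixed data sets under the assumption that each large site sale is a small site sale plus a flat premium ($25,000 or $45,000), then fits a line to each. The function names are my own and the fit is plain ordinary least squares.

```python
def fit_with_r_squared(points):
    """Least-squares slope, intercept, and R squared for (gla, price) points."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in points)
    ss_tot = sum((y - mean_y) ** 2 for _, y in points)
    return slope, intercept, 1 - ss_res / ss_tot


def build_sales(site_premium):
    """Three small-site sales plus three large-site sales at a flat premium."""
    small = [(1000, 100_000), (1500, 137_500), (2000, 175_000)]
    large = [(gla, price + site_premium) for gla, price in small]
    return small + large


results = {}
for premium in (25_000, 45_000):
    slope, intercept, r2 = fit_with_r_squared(build_sales(premium))
    results[premium] = (slope, r2)
    print(f"site premium ${premium:,}: slope ${slope:.2f}/sq ft, R squared {r2:.3f}")
```

With these inputs the slope stays at $75 per square foot in both cases, while R squared falls from about 0.857 with the $25,000 premium to about 0.649 with the $45,000 premium.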
The appraiser should not rely on R squared as an indicator of reliability in the regression adjustment. The appraiser should examine the raw data or the scatter chart and look for factors that might be skewing the data or pushing the line in one direction or another. A common factor that can skew a GLA regression line is that larger homes sometimes also have larger sites or higher quality. It is essential for the appraiser to carefully control the search parameters of sales data in ways that avoid skewing, and to collect samples large enough that normal variation in other factors can balance out and not skew the results. Appraisers should ask themselves, “If I remove just one data point from this scatter chart, will the trend change dramatically?” If the answer is yes, then the sample behind the regression model may be too small. In that case, a larger sample or a more controlled sample might be necessary. I recommend leaving the R squared off the trend line chart and out of the appraisal report. R squared values will only confuse the reader of the appraisal report and will not strengthen the appraiser’s argument for or against the regression results.
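The "remove just one data point" question can be checked mechanically. The following is a rough sketch of a leave-one-out test, using the six example sales with an assumed $25,000 site premium. The function names are hypothetical, and what counts as a "dramatic" change is left to the appraiser's judgment.

```python
def slope_of(points):
    """Least-squares slope of price against GLA."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    return sxy / sxx


def leave_one_out_slopes(points):
    """Refit the line once per sale, each time with that sale removed."""
    return [slope_of(points[:i] + points[i + 1:]) for i in range(len(points))]


# Six example sales: (GLA in sq ft, sale price); $25,000 site premium assumed
sales = [
    (1000, 100_000), (1000, 125_000),
    (1500, 137_500), (1500, 162_500),
    (2000, 175_000), (2000, 200_000),
]

full_slope = slope_of(sales)
loo_slopes = leave_one_out_slopes(sales)
largest_shift = max(abs(s - full_slope) for s in loo_slopes)

print(f"slope with all sales: ${full_slope:.2f}/sq ft")
for (gla, price), s in zip(sales, loo_slopes):
    print(f"  without {gla:,} sq ft at ${price:,}: ${s:.2f}/sq ft")
print(f"largest shift from one removal: ${largest_shift:.2f}/sq ft")
```

If a single removal moves the slope by an amount that would change the adjustment applied in the report, that is a sign the sample is too small or not controlled tightly enough.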
The following is my most popular video on YouTube that gives an example of how to support a GLA adjustment using simple linear regression.