computer modeling – BigDataMark

In this post, I’ll explain how to convert the R²_1 coefficient of determination equation into SQL for calculating model fit.

The coefficient of determination, also known as the R² (“R square”), is a useful value to calculate when evaluating a regression model because it represents the proportion of the total variation of an observed value explained by the model and it can be represented as a percentage that is easy to explain to a stakeholder.

The formula can be represented as the following, where a dataset has $y_{i}$ observed values, $\bar{y}$ represents the mean of the observed values, and $f_{i}$ represents a modeled value.

$R^{2}=1-\frac{\sum_{i=1}^{n} (y_{i}-f_{i})^{2}}{\sum_{i=1}^{n}(y_{i}-\bar{y})^{2}}$

When converting this formula into an algorithm in SQL, the column named rating represents my observed values, and the column named prediction represents the modeled values.

SELECT AVG(rating) FROM model_applied;

We’ll plug the average into the next query:

SELECT 1-(SUM(POWER(rating - prediction,2)))/(SUM(POWER(rating - [average],2))) FROM model_applied

If your values are INTEGER, you may need to convert them into FLOAT to get a meaningful result, because the R² is a number between 0 and 1, where 0 indicates that the regression does not represent the data, and 1 is a perfect fit.

If my R² is 0.222787592231293, then 22% of the variation can be explained by the regression, and the other 82% of the variation is unexplained. Basically, this says that the model is 22% more accurate than using a random guess (minimum error by using the mean).

Tag: computer modeling

Calculating model fit with R(2)/Coefficient of Determination in SQL

Share this: