Question:
While working with multivariate prediction algorithms, I came across R's scale
function, which scales/standardizes the values of variables.
I have no difficulty using the scale
function; my question is purely conceptual.
Why should I scale my variable values? What's the point? Does it make a difference, for example, to the accuracy of my model's predictions? And how can I reverse the transformation?
Answer:
Should I scale my inputs? The answer is: it depends.
That said, scaling your data will not make things worse, so if in doubt, scale it.
Cases in which to scale
- If the model is based on distances between points, as in clustering (k-means) or dimensionality reduction (PCA) algorithms, then you need to scale/normalize its inputs. See the example:
Starting from the data (Ano = year, Preco = price):
    Ano  Preco
0  2000   2000
1  2010   3000
2  1970   2500
The Euclidean distance matrix is:
[[   0.    1000.05   500.9 ]
 [1000.05     0.     501.6 ]
 [ 500.9    501.6      0.  ]]
Notice that Preco dictates the distances, since its absolute values are much larger than those of Ano. However, when we normalize each column to [0, 1], the result changes drastically:
   Ano_norm  Preco_norm
0      0.75         0.0
1      1.00         1.0
2      0.00         0.5
The new Euclidean distance matrix is:
[[0.   1.03 0.9 ]
 [1.03 0.   1.12]
 [0.9  1.12 0.  ]]
Another example, for PCA, is this one.
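As a minimal sketch of the example above (in Python, assuming pandas and SciPy's cdist for the Euclidean distances), reproducing both matrices:

import pandas as pd
from scipy.spatial.distance import cdist

df = pd.DataFrame({"Ano": [2000, 2010, 1970], "Preco": [2000, 3000, 2500]})

# Distances on the raw values: Preco dominates, as its magnitudes are larger
print(cdist(df.to_numpy(), df.to_numpy()))

# Min-max normalization of each column to [0, 1]
df_norm = (df - df.min()) / (df.max() - df.min())
print(cdist(df_norm.to_numpy(), df_norm.to_numpy()))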
- For algorithms such as neural networks (see this reference), which use gradient descent and activation functions, scaling the inputs helps because:
- It gives strictly positive features both a negative and a positive part, which makes training easier.
- It prevents computations from returning values such as NaN (Not a Number) during training.
- If the inputs are on different scales, the weights connected to them update at different rates (some faster than others), which impairs learning.
Normalizing the outputs is also important, because of the activation function of the last layer. In that case, to get back to the original output scale, just save the values used for the normalization and apply the inverse calculation. For example:
To normalize:
X_norm = (X - X_min)/(X_max - X_min)
To return to the original scale:
X = X_norm * (X_max - X_min) + X_min
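As a quick sanity check of this round trip (a NumPy sketch, using the Preco column from the example above):

import numpy as np

X = np.array([2000.0, 3000.0, 2500.0])   # the Preco column

X_min, X_max = X.min(), X.max()          # save these to undo the transform later
X_norm = (X - X_min) / (X_max - X_min)   # array([0. , 1. , 0.5])

X_back = X_norm * (X_max - X_min) + X_min
print(np.allclose(X, X_back))            # True: the original scale is recovered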
Cases where scaling is not necessary
- Algorithms based on cuts (splits), such as Decision Tree and Random Forest, since a split only compares a feature against a threshold and is therefore unaffected by scaling.
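To sketch why (assuming scikit-learn and made-up data on Ano/Preco-like scales), a decision tree fitted on raw and on min-max scaled inputs gives the same predictions, since scaling is monotonic and does not change which splits are possible:

import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = np.column_stack([rng.uniform(1970, 2010, 100),   # Ano-like scale
                     rng.uniform(2000, 3000, 100)])  # Preco-like scale
y = 2.0 * X[:, 0] + 0.01 * X[:, 1] + rng.normal(size=100)

X_norm = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))

pred_raw = DecisionTreeRegressor(random_state=0).fit(X, y).predict(X)
pred_norm = DecisionTreeRegressor(random_state=0).fit(X_norm, y).predict(X_norm)
print(np.allclose(pred_raw, pred_norm))  # True (up to floating-point tie-breaking)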
Other cases
For some algorithms, such as linear regression, scaling is not mandatory and does not improve accuracy: scaling the inputs or not only changes the coefficients that are found. However, when the inputs have different magnitudes (as with Ano
and Preco
in the example above), the coefficients can only be compared with each other if the inputs are scaled. In other words, if you want interpretability, scale the inputs.
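To sketch that last point (assuming scikit-learn, with made-up data on the Ano/Preco scales), scaling leaves the predictions untouched but makes the coefficients comparable to each other:

import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = np.column_stack([rng.uniform(1970, 2010, 100),   # Ano
                     rng.uniform(2000, 3000, 100)])  # Preco
y = 0.5 * X[:, 0] + 0.002 * X[:, 1] + rng.normal(size=100)

X_norm = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))

raw = LinearRegression().fit(X, y)
norm = LinearRegression().fit(X_norm, y)

print(np.allclose(raw.predict(X), norm.predict(X_norm)))  # True: same fit either way
print(raw.coef_)   # not comparable: each coefficient lives on its feature's scale
print(norm.coef_)  # comparable: both features now live on the same [0, 1] scale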