Why should we scale/standardize variable values, and how do we reverse this transformation?


When working with prediction algorithms that use multiple variables, I came across R's scale function, which scales/standardizes the values of variables.

I have no difficulty using the scale function, but my question is specifically conceptual.

Why should I scale my variable values? What's the point? Does it make a difference, for example, in the prediction accuracy of my model? And how can I reverse the transformation?


Should I scale my inputs? The answer is: it depends.

The truth is that scaling your data won't make it worse, so if in doubt, scale it.

Cases in which to scale

  1. If the model is based on the distance between points, such as clustering (k-means) or dimensionality reduction (PCA) algorithms, then it is necessary to scale/normalize your inputs. See the example:

Starting from the data:

    Ano  Preco
0  2000   2000
1  2010   3000
2  1970   2500

The Euclidean distance matrix is:

       0       1       2   
0 [[   0.   1000.05  500.9 ]
1  [1000.05    0.    501.6 ]
2  [ 500.9   501.6     0.  ]]

We observe that Preco dominates the distance, as its absolute values are much greater than those of Ano. However, when we normalize to [0, 1], the result changes drastically:

   Ano_norm  Preco_norm
0      0.75         0.0
1      1.00         1.0
2      0.00         0.5

The new Euclidean distance matrix is:

      0    1    2 
0 [[0.   1.03 0.9 ]
1  [1.03 0.   1.12]
2  [0.9  1.12 0.  ]]
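The two distance matrices above can be reproduced with a short NumPy sketch (the `distance_matrix` helper below is written here for illustration, it is not a library function):

```python
import numpy as np

# Raw data from the example: Ano (year) and Preco (price)
X = np.array([[2000.0, 2000.0],
              [2010.0, 3000.0],
              [1970.0, 2500.0]])

def distance_matrix(X):
    # Pairwise Euclidean distances via broadcasting
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

print(np.round(distance_matrix(X), 2))   # Preco dominates the distances

# Min-max normalization of each column to [0, 1]
X_norm = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))
print(np.round(distance_matrix(X_norm), 2))  # both columns now contribute
```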

Another example, referring to PCA, is this one.

  2. For algorithms such as Neural Networks (see this reference), which use gradient descent and activation functions, scaling the inputs:
    • Gives strictly positive features both a negative and a positive part, which facilitates training.
    • Prevents computations from returning values such as Not a Number (NaN) during training.
    • Keeps the weights connected to different inputs updating at similar rates. If the inputs are on different scales, some weights update faster than others, which impairs learning.

Normalizing the outputs is also important, because of the activation function of the last layer.

In this case, to return to the original output scale, just save the values used for normalization and apply the inverse calculation. For example:

To normalize:

X_norm = (X - X_min)/(X_max - X_min)

To return to original scale:

X = X_norm * (X_max - X_min) + X_min
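A minimal round-trip sketch of these two formulas, using hypothetical output values:

```python
import numpy as np

# Hypothetical outputs to illustrate the round trip
y = np.array([2000.0, 3000.0, 2500.0])

# Save the statistics used for normalization...
y_min, y_max = y.min(), y.max()
y_norm = (y - y_min) / (y_max - y_min)

# ...then apply the inverse formula to recover the original scale
y_back = y_norm * (y_max - y_min) + y_min

print(y_norm)
print(y_back)
```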

Cases where scaling is not necessary

  1. Tree-based algorithms such as Decision Tree and Random Forest, which split the data on thresholds and are therefore insensitive to monotonic transformations of the inputs.
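To see why, here is a sketch with a hand-rolled decision stump (the `best_stump` helper and the sample data are made up for illustration): min-max scaling changes the threshold value, but not which samples fall on each side of the split, so the tree's predictions are unchanged.

```python
import numpy as np

def best_stump(x, y):
    """Find the threshold on x that minimizes the squared error
    of a one-split (stump) regression tree."""
    order = np.argsort(x)
    xs, ys = x[order], y[order]
    best_err, best_t = np.inf, None
    for i in range(1, len(xs)):
        if xs[i] == xs[i - 1]:
            continue
        t = (xs[i] + xs[i - 1]) / 2  # midpoint between consecutive values
        left, right = ys[:i], ys[i:]
        err = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if err < best_err:
            best_err, best_t = err, t
    return best_t

x = np.array([1970.0, 2000.0, 2010.0, 1990.0])
y = np.array([2500.0, 2000.0, 3000.0, 2200.0])

t_raw = best_stump(x, y)
x_norm = (x - x.min()) / (x.max() - x.min())
t_norm = best_stump(x_norm, y)

# The split lands between the same two samples in both cases:
# the partition of the data is identical, only the threshold value changes.
print(x > t_raw)
print(x_norm > t_norm)
```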

Other cases

For some algorithms, such as linear regression, scaling is not mandatory and does not improve accuracy; scaling the inputs only changes the coefficients that are found. However, when the inputs have different magnitudes (as in the example of ano and preço), the coefficients can only be compared with each other if the inputs are scaled. That is, if you want interpretability, scale the inputs.
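A sketch of this point, reusing the `Ano`/`Preco` data from the example with hypothetical target values: ordinary least squares gives the same predictions with or without scaling, only the coefficients change.

```python
import numpy as np

# Inputs from the example; the targets y are hypothetical
X = np.array([[2000.0, 2000.0],
              [2010.0, 3000.0],
              [1970.0, 2500.0]])
y = np.array([10.0, 20.0, 15.0])

def ols_coefs(X, y):
    # Least-squares fit of y = b0 + b1*x1 + b2*x2
    A = np.column_stack([np.ones(len(X)), X])  # intercept column
    return np.linalg.lstsq(A, y, rcond=None)[0]

coef_raw = ols_coefs(X, y)
X_norm = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))
coef_norm = ols_coefs(X_norm, y)

# Predictions are identical either way...
pred_raw = np.column_stack([np.ones(3), X]) @ coef_raw
pred_norm = np.column_stack([np.ones(3), X_norm]) @ coef_norm
print(np.allclose(pred_raw, pred_norm))
# ...but only the scaled coefficients are comparable in magnitude,
# since raw Ano and Preco live on very different scales.
print(coef_raw[1:])
print(coef_norm[1:])
```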
