
#### Feature Scaling

##### Feature scaling has two main techniques
• Standardization
• Normalization

What are standardization and normalization?
Why are they used?

The terms normalization and standardization are sometimes used interchangeably,
but they usually refer to different things.

Standardization rescales data to have a mean (𝜇) of 0 and a standard deviation (𝜎) of 1 (unit variance).
The formula for standardization is
$X_{changed} = \frac{X - \mu}{\sigma}$
For most applications, standardization is recommended.
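A minimal sketch of the formula above, using scikit-learn's `StandardScaler` (the sample data here is made up for illustration):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Toy feature column (one feature, five samples)
X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])

# fit() learns mu and sigma; transform() applies (X - mu) / sigma
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

print(X_scaled.mean())  # approximately 0
print(X_scaled.std())   # approximately 1
```

After scaling, the column has zero mean and unit variance, matching the formula $X_{changed} = (X - \mu)/\sigma$.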

Normalization usually means scaling a variable to values between 0 and 1,
while standardization transforms data to have a mean of 0 and a standard deviation of 1.

Normalization rescales the values into the range [0, 1].
This can be useful when all features need to be on the same positive scale.
However, it is sensitive to outliers: a single extreme value compresses the remaining data into a narrow band.
The formula for normalization (min-max scaling) is
$X_{changed} = \frac{X - X_{min}}{X_{max}-X_{min}}$
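A minimal sketch of min-max normalization with scikit-learn's `MinMaxScaler` (sample values are made up for illustration):

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Toy feature column
X = np.array([[10.0], [20.0], [30.0], [40.0]])

# Applies (X - X_min) / (X_max - X_min), mapping the column into [0, 1]
X_scaled = MinMaxScaler().fit_transform(X)

print(X_scaled.ravel())  # smallest value maps to 0, largest to 1
```

Note how the minimum value (10) maps to 0 and the maximum (40) maps to 1; everything else lands proportionally in between.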

We must standardize the data only after the train/test split, and the scaler should be fitted only to the x_train set. If we fitted it on the full dataset, it would learn the mean and standard deviation of the values in x_test, which should remain hidden from us (data leakage). So we fit the scaler on x_train only, and then use it to transform both x_train and x_test.