Feature Scaling in Data Science

Suraj
3 min read · Mar 31, 2021
Figure 1: Feature Scaling.

Hi there! In this blog, we will cover one of the most important steps in the lifecycle of a machine learning project, a.k.a. feature scaling. Wait a minute, why exactly do we need feature scaling? Don’t be afraid of the figure above!

Let’s start this topic with an analogy. Suppose you hit the gym on New Year’s Eve with a resolution to get ripped! The instructor hands you a weekly exercise chart: Monday - chest, Wednesday - back and shoulders, Friday - biceps and triceps, and Sunday - abdomen and legs. However, you often skip Wednesday and Sunday. How will that affect your body? Obviously, your chest and arms will show noticeable gains while your back and lower body lag behind, leaving your physique unbalanced. If you draw a parallel to a machine learning model, unscaled features fed to the model can make it biased towards the features with the largest numeric ranges. Hence, it is important that the independent features are brought to the same scale (or a comparable distribution) before being fed as input to the model.

Now we understand why we need feature scaling in data science! The next question is: what are the common feature scaling techniques, and how and when do we use them? Let’s unroll it.

a) Min-max scaling, also known as min-max normalisation: In this type of feature scaling, we rescale each independent feature to lie in the range [0, 1]. This is done by subtracting the minimum value of the feature from each value and dividing by the difference between the maximum and minimum values, i.e. x_scaled = (x - min) / (max - min). Scikit-learn provides MinMaxScaler for this. It usually works well when the distribution is not normal, with the drawback of being highly sensitive to outliers, since a single extreme value stretches the range of the whole column. It is often used in feature engineering for image data, where pixel intensities are rescaled to [0, 1].
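
As a minimal sketch (the numbers below are made-up toy values for two features on very different scales), here is how MinMaxScaler applies the per-column formula:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Toy feature matrix: column 0 is "age"-like, column 1 is "income"-like (illustrative values).
X = np.array([[25.0,  40000.0],
              [32.0, 120000.0],
              [47.0,  65000.0],
              [51.0, 230000.0]])

scaler = MinMaxScaler()             # default feature_range=(0, 1)
X_scaled = scaler.fit_transform(X)  # (x - min) / (max - min), computed per column

print(X_scaled)                             # every value now lies in [0, 1]
print(scaler.data_min_, scaler.data_max_)   # per-column min and max learnt from X
```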

b) StandardScaler: In this type of feature scaling, also called standardisation, we determine the mean and standard deviation of each independent feature in the dataset, subtract the mean from each feature value, and divide by the standard deviation, i.e. x_scaled = (x - mean) / std. The aim is that each feature column is centred at zero mean with unit variance. Note that standardisation does not turn a feature into a normal distribution; it only shifts and rescales it, which is nevertheless enough to help many machine learning models fit better. Scikit-learn provides StandardScaler for performing this transformation on the dataset.
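
A quick sketch on the same kind of toy matrix (values are illustrative): after StandardScaler, each column has roughly zero mean and unit variance.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Same illustrative toy matrix as above.
X = np.array([[25.0,  40000.0],
              [32.0, 120000.0],
              [47.0,  65000.0],
              [51.0, 230000.0]])

scaler = StandardScaler()
X_std = scaler.fit_transform(X)     # (x - mean) / std, computed per column

print(X_std.mean(axis=0))           # approximately [0., 0.]
print(X_std.std(axis=0))            # approximately [1., 1.]
```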

c) Other feature scaling techniques include MaxAbsScaler, RobustScaler, etc. MaxAbsScaler divides each feature by its maximum absolute value, so that the largest absolute value in each column becomes 1.0; it is available in Scikit-learn as MaxAbsScaler. RobustScaler centres each feature on its median and scales by the interquartile range, which makes it far less sensitive to outliers.
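
The rough sketch below (again on made-up values, including one deliberate outlier) contrasts the two: MaxAbsScaler is still pulled around by the outlier, while RobustScaler largely ignores it.

```python
import numpy as np
from sklearn.preprocessing import MaxAbsScaler, RobustScaler

X = np.array([[1.0,  -200.0],
              [2.0,    50.0],
              [4.0,    80.0],
              [3.0,  1000.0]])   # 1000.0 is an outlier in the second column

# MaxAbsScaler: divides each column by its maximum absolute value,
# so the largest-magnitude entry per column maps to +/-1.0.
print(MaxAbsScaler().fit_transform(X))

# RobustScaler: subtracts the per-column median and divides by the
# interquartile range, so the outlier no longer dictates the scale.
print(RobustScaler().fit_transform(X))
```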

Usually, the algorithms that compute distances or rely on gradient magnitudes need feature scaling. These include KNN, K-Means, SVMs, PCA, LDA, gradient-descent-based algorithms like linear and logistic regression, and most deep learning models. However, tree-based algorithms like decision trees, Random Forest, XGBoost, etc. don’t need feature scaling, since their splits depend only on the ordering of values within each feature.
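
As a minimal end-to-end sketch (dataset and model choice are purely illustrative), wrapping the scaler and a distance-based model like KNN in a scikit-learn Pipeline keeps the scaler fitted only on the training part of each cross-validation fold:

```python
from sklearn.datasets import load_wine
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_wine(return_X_y=True)

knn_raw = make_pipeline(KNeighborsClassifier())
knn_scaled = make_pipeline(StandardScaler(), KNeighborsClassifier())

print("without scaling:", cross_val_score(knn_raw, X, y, cv=5).mean())
print("with scaling:   ", cross_val_score(knn_scaled, X, y, cv=5).mean())
# The scaled pipeline typically scores noticeably higher here, because KNN's
# distances would otherwise be dominated by the features with the largest values.
```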

With the hope that I was able to convey the importance of feature scaling, I’d like to wrap up this blog.

Until next time!

Bye!!

Suraj

Seasoned machine learning engineer/data scientist, well-versed in the entire life cycle of a data science project.