Dimensionality Reduction in Data Science

Suraj
2 min read · Mar 31, 2021
Figure 1: Dimensionality Reduction.

Hi, there! Welcome to yet another blog describing an important block of the machine learning project lifecycle. In this blog, I will give you a brief walkthrough of one of the most popular parts of feature engineering, known as dimensionality reduction. The core idea of dimensionality reduction is to transform data from a high-dimensional space to a low-dimensional space without distorting the meaningful properties of the original data.

I hope you were not startled upon seeing the banner of the blog! Looks freaky, doesn't it? Well, it is the output of performing dimensionality reduction on a human faces dataset using PCA, which aids in developing better machine learning-based face recognition systems. Yes, yes, those are naive face recognition systems; the latest ones mostly use Siamese networks with triplet loss!
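If you are curious how an image like the banner could be produced, here is a minimal sketch (not the exact code behind the banner) that fits PCA on the Olivetti faces dataset bundled with scikit-learn and displays the principal components as "eigenfaces"; the dataset and the number of components are assumptions made purely for illustration:

```python
# Hedged sketch: fit PCA on a faces dataset and visualize the principal
# components ("eigenfaces"). Olivetti faces and 16 components are assumed
# here for illustration only.
import matplotlib.pyplot as plt
from sklearn.datasets import fetch_olivetti_faces
from sklearn.decomposition import PCA

faces = fetch_olivetti_faces()            # 400 images, each 64x64 = 4096 pixels
pca = PCA(n_components=16, whiten=True)
pca.fit(faces.data)

fig, axes = plt.subplots(4, 4, figsize=(6, 6))
for ax, component in zip(axes.ravel(), pca.components_):
    ax.imshow(component.reshape(64, 64), cmap="gray")  # each component looks like a ghostly face
    ax.axis("off")
plt.show()
```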

Oh, well! Maybe I dropped an alien keyword on you there. PCA, isn't it? Let's quickly walk through the most popular dimensionality reduction techniques we use: Principal Component Analysis (PCA) and Linear Discriminant Analysis (LDA) are two popular ones, to name a few.

Principal Component Analysis (PCA) is among the most popular dimensionality reduction techniques and is, in fact, the technique used to generate the banner of this blog. It is an unsupervised technique, and it is often the sought-after choice when you want to reduce dimensionality while preserving as much of the variance of the original data as possible. You can get yourself comfortable with the application of PCA in the following notebooks: Link1, Link2. Feel free to refer here for more details.
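As a quick illustration of the variance-preserving idea (separate from the linked notebooks), here is a minimal PCA sketch on scikit-learn's digits dataset; the dataset and the choice of two components are assumptions for demonstration only:

```python
# Hedged sketch: reduce 64-dimensional digit images to 2 dimensions with PCA
# and inspect how much of the original variance the projection retains.
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

X, y = load_digits(return_X_y=True)       # 1797 samples, 64 features each
pca = PCA(n_components=2)                  # keep the 2 directions of largest variance
X_reduced = pca.fit_transform(X)           # project onto the principal components

print(X.shape, "->", X_reduced.shape)      # (1797, 64) -> (1797, 2)
print("explained variance ratio:", pca.explained_variance_ratio_)
```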

Linear Discriminant Analysis, on the other hand, is a supervised method that aims to find the directions in which the variance between samples of the same class is decreased while the variance between different classes is increased. LDA often works better when the dataset contains multiple classes with a high number of samples per class, whereas PCA is often preferred when the number of samples per class is small. To get more implementation details about LDA, feel free to redirect yourself to the official scikit-learn documentation out here
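For comparison, here is a minimal LDA sketch on the same digits data (an assumed example, not taken from the official documentation). Note that the class labels are required because LDA is supervised, and the number of components can be at most n_classes - 1:

```python
# Hedged sketch: supervised dimensionality reduction with LDA on the digits
# dataset; the labels y drive the projection, unlike PCA.
from sklearn.datasets import load_digits
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, y = load_digits(return_X_y=True)
lda = LinearDiscriminantAnalysis(n_components=2)   # 10 classes, so up to 9 components are allowed
X_lda = lda.fit_transform(X, y)                    # fitting uses the class labels

print(X.shape, "->", X_lda.shape)                  # (1797, 64) -> (1797, 2)
```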

I hope the Colab notebooks give you much better implementation details, along with a playground to experiment more with kernels inside these dimensionality reduction techniques. With that, I'd like to sign off on this blog!

Until next time!
