Object Localisation’s Hello World!

Suraj
2 min read · Mar 30, 2021
Figure 1: Vehicle Number plate Localisation.

Hi there! As the title suggests, this blog walks us through the steps to train an object localisation network using deep learning. We’ll use Keras as the deep learning framework to train the hello-world model. For simplicity of development and deployment, we’ll use a custom version of MobileNetV2-SSD, a.k.a. the SSDLite architecture.

Without wasting a second, let’s get started! First, let’s build an intuition for what a deep learning network for the task of object detection would look like. Were you able to picture it?

If not, no worries! I assume that at this point we know the distinguishing features of image classification, object localisation and object detection. If not, feel free to read more about them via the link.

If I were to give an intuition for the basic difference between an image classifier and an object localisation network, it would be as follows.

An image classification network takes an input image and outputs probabilities for n classes that sum to 1, whereas an object localisation network, given an input image, regresses the coordinates of the bounding box or ROI. If the input image is 2D, the bounding box is described by 4 values (for example x_min, y_min, x_max, y_max), whereas for a 3D input the box becomes a cuboid, which has 8 corners.
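To make the contrast concrete, here is a minimal pure-Python sketch of the two kinds of output. The logits, the box values and the helper names `softmax` and `to_pixels` are made up for illustration, and the box is assumed to be normalised to the range 0 to 1 in (x_min, y_min, x_max, y_max) order.

```python
import math


def softmax(logits):
    """Turn raw class scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(score - peak) for score in logits]
    total = sum(exps)
    return [e / total for e in exps]


def to_pixels(box, width, height):
    """Scale a normalised (x_min, y_min, x_max, y_max) box to pixel coordinates."""
    x_min, y_min, x_max, y_max = box
    return (
        round(x_min * width),
        round(y_min * height),
        round(x_max * width),
        round(y_max * height),
    )


# Classifier-style output: one probability per class (illustrative logits).
class_probs = softmax([2.0, 0.5, -1.0])

# Localiser-style output: four regressed numbers describing one box (illustrative values).
predicted_box = (0.25, 0.5, 0.75, 0.625)
pixel_box = to_pixels(predicted_box, width=640, height=480)
```

The classifier answers "what is it?" with a probability vector, while the localiser answers "where is it?" with four numbers that can be scaled back to the image size.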

You can refer to the Google Colab notebook here for a detailed walk-through of training an object localisation network, built with the Keras Sequential API, on the vehicle number plate localisation task. Since we designed the network from scratch and trained it for very few epochs, the accuracy is not great. Feel free to tweak the hyperparameters of the network and train for more epochs to get better accuracy.
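For a rough idea of what such a network can look like, here is a small sketch using the Keras Sequential API. It is not the notebook's architecture: the layer sizes, the 224x224 input, the sigmoid output (which assumes box coordinates normalised to 0 to 1) and the name `build_localiser` are all assumptions made for illustration.

```python
from tensorflow import keras
from tensorflow.keras import layers


def build_localiser(input_shape=(224, 224, 3)):
    """Small CNN that regresses one bounding box per image (illustrative sizes)."""
    model = keras.Sequential([
        keras.Input(shape=input_shape),
        layers.Conv2D(16, 3, activation="relu"),
        layers.MaxPooling2D(),
        layers.Conv2D(32, 3, activation="relu"),
        layers.MaxPooling2D(),
        layers.Conv2D(64, 3, activation="relu"),
        layers.GlobalAveragePooling2D(),
        layers.Dense(64, activation="relu"),
        # Four outputs: x_min, y_min, x_max, y_max, assumed normalised to [0, 1].
        layers.Dense(4, activation="sigmoid"),
    ])
    # Mean squared error is a simple regression loss for the box coordinates.
    model.compile(optimizer="adam", loss="mse")
    return model


model = build_localiser()
```

Training would then be a call such as `model.fit(images, boxes, epochs=...)`, where `images` and `boxes` stand for your own arrays of inputs and normalised box labels.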

You can also try transfer learning: freeze the initial layers of a pretrained model and train a new head that regresses the output coordinates. You can download the model from TensorFlow Hub: here.
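As a sketch of the transfer-learning idea, the snippet below freezes a MobileNetV2 backbone and adds a small regression head. Note that it loads the backbone from `keras.applications` rather than TensorFlow Hub, purely to keep the example short. The function name `build_transfer_localiser`, the 224x224 input size and the assumption that inputs are raw 0 to 255 pixels are illustrative choices, not taken from the notebook.

```python
from tensorflow import keras
from tensorflow.keras import layers


def build_transfer_localiser(input_shape=(224, 224, 3), weights="imagenet"):
    """Frozen MobileNetV2 feature extractor with a trainable 4-value box head."""
    backbone = keras.applications.MobileNetV2(
        input_shape=input_shape, include_top=False, weights=weights
    )
    backbone.trainable = False  # freeze every pretrained layer

    inputs = keras.Input(shape=input_shape)
    # MobileNetV2 expects pixels in [-1, 1]; assumes raw inputs in [0, 255].
    x = layers.Rescaling(1.0 / 127.5, offset=-1.0)(inputs)
    x = backbone(x, training=False)  # keep batch-norm statistics fixed
    x = layers.GlobalAveragePooling2D()(x)
    # Only this layer is trained: x_min, y_min, x_max, y_max in [0, 1].
    outputs = layers.Dense(4, activation="sigmoid")(x)

    model = keras.Model(inputs, outputs)
    model.compile(optimizer="adam", loss="mse")
    return model
```

Calling `build_transfer_localiser()` downloads the ImageNet weights on first use; passing `weights=None` builds the same graph with random weights, which is handy for a quick shape check.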

I hope that once you have gone through the entire notebook, you will have an intuition for how to extend the last layer with a classification head alongside the localisation head to obtain a multi-class detection head. If not, feel free to write to me with your queries at hrishabhsuraj52@gmail.com and I’ll explain it to you.
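One possible way to picture this extension is a model with two output layers that share the same features: one regresses the box and one classifies the object. The sketch below uses the Keras functional API and handles a single object per image; a full detector such as SSD predicts many boxes per image and needs more machinery. The layer sizes, the name `build_detector` and the choice of losses are assumptions for illustration.

```python
from tensorflow import keras
from tensorflow.keras import layers


def build_detector(num_classes, input_shape=(224, 224, 3)):
    """Shared feature extractor with a box-regression head and a class head."""
    inputs = keras.Input(shape=input_shape)
    x = layers.Conv2D(16, 3, strides=2, activation="relu")(inputs)
    x = layers.Conv2D(32, 3, strides=2, activation="relu")(x)
    x = layers.GlobalAveragePooling2D()(x)

    # Localisation head: four box coordinates, assumed normalised to [0, 1].
    box = layers.Dense(4, activation="sigmoid", name="box")(x)
    # Classification head: one probability per class, summing to 1.
    label = layers.Dense(num_classes, activation="softmax", name="label")(x)

    model = keras.Model(inputs, [box, label])
    # One loss per output, in the same order as the outputs list.
    model.compile(
        optimizer="adam",
        loss=["mse", "sparse_categorical_crossentropy"],
    )
    return model


model = build_detector(num_classes=3)
```

Both heads are trained together, so each training example needs a box label and an integer class label.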

Until next time!

Suraj

Seasoned machine learning engineer/data scientist well versed in the entire life cycle of a data science project.