This was my 3-week project: working on a new Kaggle competition and deploying a web application that predicts benign or malignant skin lesions from images.

A few weeks ago, a Kaggle competition started: SIIM-ISIC Melanoma Classification. And the best part for me... it was an image classification project related to computer vision, which I absolutely love making projects on. So I started!

And wait, what is Melanoma?

Melanoma is a skin cancer responsible for 75% of skin cancer deaths, despite being the least common type of skin cancer.

And if the disease is detected early, the chance of death is very low.

The traditional process: take image samples, send them to doctors, and wait while the doctors analyse the samples and report whether it is cancer. That takes time, and remember, a single doctor has to analyse hundreds of images. It's a tedious process.

And that's where AI comes in: imagine we input an image to a web app and a machine learning model predicts within seconds whether it is cancer or not. It can save days compared to the traditional process.

So, yeah, after seeing the competition, I thought of something —

To go all the way from exploring the dataset to deploying the model in the cloud.

The 18-day project: Let's do it

Well, if you take a look at my journal, I didn't plan a 3-week project, but in the end, that's just what happened.

There were a whole lot of new things I learned on this journey, from using tf.data to training machine learning models on a TPU (to be truthful, TPUs are the best thing I have seen for training ML models; they feel infinitely powerful).

Using EfficientNet, focal loss, class weights, and getting started with Weights & Biases. And much more...

And not to forget the main thing — deploying the model on AWS.

This was a fun one!

So, I started working on this project on 29 May 2020, writing a daily journal in Notion alongside my Kaggle kernel.

Breaking Everything Down

These were all of the steps —

  • Data Exploration
  • Processing Data for the Model (using tf.data)
  • Making the Model (EfficientNet Transfer Learning)
  • Training the Model (Using TPU)
  • Making the Web Application (Using Streamlit)
  • Deploying the Web Application (Using AWS EC2)

Data Exploration

It went pretty well; actually, I noticed many things, but the main thing I saw was

IMBALANCED Dataset!

A highly IMBALANCED dataset, just take a look at this

But I researched this and found that there are a whole lot of techniques to deal with it, some of which I implemented in our code, including class weights and focal loss.
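My exact notebook numbers aren't shown here, but a minimal sketch of those two techniques (assuming the competition's train.csv with a binary 0/1 target column; the alpha/gamma values are common defaults, not necessarily mine) looks like this:

import numpy as np
import pandas as pd
import tensorflow_addons as tfa

train_df = pd.read_csv('train.csv')    # hypothetical path to the competition CSV
labels = train_df['target'].values     # 0 = benign, 1 = malignant

# Class weights: weight each class inversely to its frequency, so the
# rare malignant class contributes more to the loss during training
neg, pos = np.bincount(labels)
total = neg + pos
class_weight = {0: total / (2.0 * neg), 1: total / (2.0 * pos)}

# Focal loss: down-weights easy, well-classified examples so training
# focuses on the hard (often minority-class) ones
focal_loss = tfa.losses.SigmoidFocalCrossEntropy(alpha=0.25, gamma=2.0)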

Image Visualisations

Preprocessing Data

Data preprocessing was done using TensorFlow data pipelines (tf.data), and I also made a nice visualisation of it, take a look —

Maybe it looks a bit confusing, but this was the code.
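Since that's a screenshot, here is a simplified sketch of that kind of tf.data pipeline (the filenames, labels, and image size are placeholders, not the exact code from my notebook):

import tensorflow as tf

AUTO = tf.data.experimental.AUTOTUNE
IMAGE_SIZE = [256, 256]                          # placeholder, not my exact setting

def decode_image(filename, label):
    # Read a JPEG from disk, decode it, resize, and scale pixels to [0, 1]
    image = tf.io.read_file(filename)
    image = tf.image.decode_jpeg(image, channels=3)
    image = tf.image.resize(image, IMAGE_SIZE) / 255.0
    return image, label

# 'filenames' and 'labels' are hypothetical lists built from the competition CSV
train_ds = (
    tf.data.Dataset.from_tensor_slices((filenames, labels))
    .map(decode_image, num_parallel_calls=AUTO)  # decode images in parallel
    .shuffle(2048)
    .batch(32)
    .prefetch(AUTO)                              # overlap input with training
)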

Well, with this, we were done with preprocessing; let's move forward.

Making the Model

I chose EfficientNet for the first time —

I just wanted to implement new things, that's why 😄

It was good, because there is already a Python library for transfer learning with EfficientNet, and it was exactly what I needed —

efn.EfficientNetB5(
    input_shape=(*IMAGE_SIZE, 3),
    weights='imagenet',
    include_top=False
)
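That call only builds the convolutional backbone; with include_top=False, you attach your own classification head on top. A rough sketch (the exact head here is my assumption, not copied from the notebook):

import tensorflow as tf
import efficientnet.tfkeras as efn

IMAGE_SIZE = [256, 256]                              # placeholder, same as the pipeline

base = efn.EfficientNetB5(
    input_shape=(*IMAGE_SIZE, 3),
    weights='imagenet',                              # start from ImageNet features
    include_top=False                                # drop the 1000-class ImageNet head
)

model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),        # feature maps -> single vector
    tf.keras.layers.Dense(1, activation='sigmoid')   # probability of malignant
])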

Setting up Hyperparameters

The model looks something like this —

And these were the loss and optimizer —

And with a model.fit of, I think, 20 epochs, it was time to rest!
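Since the loss and optimizer above are screenshots, here is a hedged sketch of that compile-and-fit step (the Adam learning rate and the AUC metric are my assumptions; the focal loss and class weights come from the earlier snippets):

model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),  # assumed learning rate
    loss=focal_loss,                       # the SigmoidFocalCrossEntropy from earlier
    metrics=[tf.keras.metrics.AUC(name='auc')]  # the competition's evaluation metric
)

history = model.fit(
    train_ds,                   # the tf.data pipeline from earlier
    epochs=20,                  # roughly what I used
    class_weight=class_weight   # the weights computed from the label counts
)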

Model Training
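Training ran on a Kaggle TPU. I won't claim this is character-for-character my notebook, but the standard TPU setup on Kaggle at the time looked like this (build_model is a hypothetical helper wrapping the EfficientNet code above):

import tensorflow as tf

try:
    # Detect and initialise the TPU that Kaggle attaches to the notebook
    tpu = tf.distribute.cluster_resolver.TPUClusterResolver()
    tf.config.experimental_connect_to_cluster(tpu)
    tf.tpu.experimental.initialize_tpu_system(tpu)
    strategy = tf.distribute.experimental.TPUStrategy(tpu)
except ValueError:
    # No TPU available: fall back to the default CPU/GPU strategy
    strategy = tf.distribute.get_strategy()

# The model has to be created (and compiled) inside the strategy scope
with strategy.scope():
    model = build_model()  # hypothetical helper wrapping the EfficientNet code above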

Testing

Results were good, very good.

Making the Web Application

Before deploying the ML application, we have to make it. Streamlit is the thing!

I first heard about Streamlit from my best friend Daniel! It was so easy to get started with; to be truthful, Streamlit is my favourite library for its simplicity to use while still being so powerful.
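To give an idea of just how simple it is, a minimal version of such an app (the model path, image size, and preprocessing here are placeholders, not my exact app) could look like this:

import numpy as np
import streamlit as st
import tensorflow as tf
from PIL import Image

st.title("Melanoma Classification")

# 'model.h5' is a placeholder path for the trained EfficientNet weights
model = tf.keras.models.load_model('model.h5', compile=False)

uploaded = st.file_uploader("Upload a skin lesion image", type=['jpg', 'jpeg', 'png'])
if uploaded is not None:
    image = Image.open(uploaded).convert('RGB')
    st.image(image, caption='Input image', use_column_width=True)

    # Same preprocessing as training: resize and scale pixels to [0, 1]
    array = np.array(image.resize((256, 256)), dtype=np.float32) / 255.0
    prob = float(model.predict(array[None, ...])[0][0])

    st.write(f"Probability of malignant: {prob:.2%}")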

Deploying the Web Application

I used Amazon Web Services for deploying the web application.

Well, it was my first time using AWS to deploy a machine learning application; before that, I had just used Heroku (I know, I know, Heroku was never made for ML applications).

But I already had some experience with EC2, because at the company I am currently working with, I learned to connect to EC2 instances, transfer files, and do the pip install -r requirements.txt thing; I had already gone through all that.

The new thing was starting from scratch: choosing the machine, OS, number of CPUs, RAM, stuff like that, and figuring out how to connect.

Turns out, it was pretty easy. YES! I am saying it truthfully (well yeah, I got some errors along the way), but everything worked!

And also everything worked for FREE! 😂

The application was deployed at http://100.25.201.90:8501/

But if you open the link, it will not work, because I will surely have turned off the instance to not waste energy. The app still needs more CPU to run correctly, because I used a very, very big model that can't be handled by a single CPU.

Key Takeaways


This project is actually not perfect; there is still big room for improvement, such as —

  • Using Ensemble Learning

Ensemble learning can actually be very powerful, and I have seen it in one of my previous projects.

  • More Hyperparameter Tuning

I actually haven't tried hyperparameter tuning in this project. Surely the model performance could increase, but model training takes a lot of time for a good hyperparameter search.

  • Using Google BiT

I actually heard about Google BiT for the first time, again from my best friend Daniel, in one of the Machine Learning Monthly newsletters.

Machine Learning Monthly 💻🤖 May 2020 | Zero To Mastery
5th issue of Machine Learning Monthly! Keeping you up to date with the industry, keeping your skills sharp, without wasting your valuable time.

And actually, I can see that it's really the ultimate computer vision model for transfer learning, and it has broken some records.


With that being said:

It wasn't about success or failure in this project, it was about learning new things!

There were a whole lot of new things I learned that I had never even heard of before.

References & Code

This is the daily journal of the project —

Notion – The all-in-one workspace for your notes, tasks, wikis, and databases.
A new tool that blends your everyday work apps into one. It’s the all-in-one workspace for you and your team

This is the Jupyter notebook on my Fastai blog –

SIIM-ISIC Melanoma Classification
The collection of Shubhamai Machine Learning & Data Science Blogs & Notebooks.

The Kaggle notebook & trained models are here —

Melanoma Classification
Explore and run machine learning code with Kaggle Notebooks | Using data from multiple data sources

GitHub Repo —

Shubhamai/melanoma-classification
This was Shubhamai's 3-week project: working on a Kaggle competition and deploying a web application that predicts benign or malignant skin lesions from images. - Shubhamai/melanoma-classification