Medical Image Recognition for the Kaggle Data Science Bowl 2017 with CNTK and LightGBM

February 1, 2017

In this notebook, we explain how to detect lung cancer in medical images using the deep learning library CNTK and the boosted trees library LightGBM. We recommend running this notebook on a Data Science VM with the Deep Learning toolkit.
In this notebook we explain how to quickly start competing in the [Data Science Bowl 2017](https://www.kaggle.com/c/data-science-bowl-2017) and create a first submission. This year's challenge is lung cancer detection: participants have to determine whether a scan contains cancerous lesions. All the information about the competition can be found on the [competition web page](https://www.kaggle.com/c/data-science-bowl-2017/rules). We provide an example of how to detect cancerous scans using the deep learning library [CNTK](https://github.com/Microsoft/CNTK) and the gradient boosting library [LightGBM](https://github.com/Microsoft/LightGBM/), both open-sourced by Microsoft.

In the notebook we generate features automatically from the images with a pretrained Convolutional Neural Network (CNN) in CNTK. CNNs are great automatic feature generators, which is the secret sauce of deep learning. We can therefore take the output of the penultimate layer of the pretrained network, rather than its final class predictions, and use it as an image featurizer. Once we have the features, and after performing some basic feature engineering, we feed them to a boosted decision tree model using the LightGBM library. The resulting classifier obtained a **score of 0.55979** on the competition leaderboard.
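To make the data flow concrete, here is a minimal sketch of the featurize-then-boost idea. It is not the notebook's code. Every function name here is hypothetical. The grouping of three consecutive grayscale slices into one 3-channel image and the mean pooling into one vector per scan are assumptions about what the preprocessing and "basic feature engineering" might look like. `placeholder_featurizer` is a stand-in for the pretrained CNN's penultimate-layer output, so the sketch runs without CNTK or model weights.

```python
import numpy as np


def slices_to_batches(scan, channels=3):
    """Group consecutive grayscale slices into pseudo-RGB images.

    scan: array of shape (n_slices, height, width).
    Returns an array of shape (n_batches, channels, height, width).
    Leftover slices that do not fill a complete group are dropped.
    (Assumption: pretrained image CNNs expect 3-channel input.)
    """
    n_batches = scan.shape[0] // channels
    if n_batches == 0:
        raise ValueError("scan has fewer slices than channels")
    usable = scan[: n_batches * channels]
    return usable.reshape(n_batches, channels, scan.shape[1], scan.shape[2])


def placeholder_featurizer(batches, n_features=8, seed=0):
    """Stand-in for the pretrained CNN (NOT a real network).

    Applies a fixed random projection so that the output has the same
    layout a real featurizer would produce: (n_batches, n_features).
    """
    rng = np.random.default_rng(seed)
    flat = batches.reshape(batches.shape[0], -1)
    weights = rng.standard_normal((flat.shape[1], n_features))
    return flat @ weights


def pool_features(batch_features):
    """Average per-batch feature vectors into one fixed-length vector."""
    return np.asarray(batch_features, dtype=float).mean(axis=0)


def scan_to_vector(scan, featurizer):
    """Full featurization of one scan: group slices, featurize, pool."""
    return pool_features(featurizer(slices_to_batches(scan)))


def train_classifier(X, y):
    """Fit a LightGBM classifier on one feature row per scan.

    lightgbm is imported here so the helpers above work without it.
    The hyperparameters are illustrative, not the notebook's settings.
    """
    import lightgbm as lgb

    clf = lgb.LGBMClassifier(n_estimators=100, learning_rate=0.05)
    clf.fit(X, y)
    return clf
```

With real data, one would stack `scan_to_vector(scan, featurizer)` for every patient into a matrix `X`, pass it with the cancer labels `y` to `train_classifier`, and use the classifier's `predict_proba` on the test scans to build the submission.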
