Auto-featurization: Churn Prediction on KDDCup2015 Dataset

January 14, 2016


Algorithms

This is a sample auto-featurization experiment that uses the logs from the KDD Cup 2015 dataset. It achieves AUC = 0.9006 with 5-fold cross-validation on the training data (#trees = 1200, #leaves = 100, minimum instances per leaf = 50, learning rate = 0.01).

The main component of the experiment is the Python script module that runs the auto-featurizer. The idea behind auto-featurization is to have a single featurizer that performs well on many different types of log data. You can adjust the configuration of the auto-featurizer (see the documentation inside the Python script) to fit your dataset.

The experiment takes ~3 hours to run on the full 1.8 GB of joined logs from the KDD Cup data. For a quicker run, point the reader module at one of the sampled datasets below.

1,000 rows only: https://simplexgallerystorage.blob.core.windows.net/autofeaturization/kddcup2015/JoinedLogs_TrainTest_Sample1K.tsv
1 million rows only: https://simplexgallerystorage.blob.core.windows.net/autofeaturization/kddcup2015/JoinedLogs_TrainTest_Sample1M.tsv
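To give a rough feel for what featurizing log data can involve, here is a minimal sketch that collapses raw log rows into one feature vector per entity. It is not the experiment's auto-featurizer, whose logic and configuration live in the Python script module. The function name `featurize_logs`, the column names (`enrollment_id`, `time`, `source`, `event`), the timestamp format, and the inline sample rows are all assumptions made for illustration and may not match the joined TSV files.

```python
import csv
import io
from collections import Counter, defaultdict
from datetime import datetime


def featurize_logs(rows, id_col="enrollment_id", time_col="time",
                   categorical_cols=("source", "event"),
                   time_format="%Y-%m-%dT%H:%M:%S"):
    """Aggregate log rows into one feature dict per entity (hypothetical helper).

    Column names and the timestamp format are assumptions, not the
    schema of the gallery's joined log files.
    """
    counts = defaultdict(Counter)   # per-entity event counts
    stamps = defaultdict(list)      # per-entity parsed timestamps

    for row in rows:
        key = row[id_col]
        counts[key]["n_events"] += 1
        # One count feature per observed value of each categorical column.
        for col in categorical_cols:
            counts[key][f"{col}={row[col]}"] += 1
        stamps[key].append(datetime.strptime(row[time_col], time_format))

    features = {}
    for key, counter in counts.items():
        feats = dict(counter)
        times = stamps[key]
        # Simple temporal features: distinct active days and activity span.
        feats["active_days"] = len({t.date() for t in times})
        feats["span_hours"] = (max(times) - min(times)).total_seconds() / 3600.0
        features[key] = feats
    return features


# Made-up toy rows for illustration only; not taken from the KDD Cup data.
SAMPLE_TSV = (
    "enrollment_id\ttime\tsource\tevent\n"
    "1\t2014-06-14T09:38:29\tserver\tnavigate\n"
    "1\t2014-06-14T09:38:39\tserver\taccess\n"
    "1\t2014-06-15T10:00:00\tbrowser\tvideo\n"
    "2\t2014-06-20T08:00:00\tbrowser\tproblem\n"
)

rows = csv.DictReader(io.StringIO(SAMPLE_TSV), delimiter="\t")
features = featurize_logs(rows)
```

To try the same sketch on one of the sampled files, download it and pass `csv.DictReader(open(path, newline=""), delimiter="\t")` as `rows`, after adjusting the column arguments to whatever header the file actually has.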