Binary Classification: Customer relationship prediction

September 2, 2014
This experiment shows how to make predictions related to **Customer Relationship Management (CRM)** using binary classifiers.

## Data

The data used for this experiment is from KDD Cup 2009. The dataset has 50,000 rows and 230 feature columns. The task is to predict **churn**, **appetency** and **up-selling** using these features. Please refer to the [**KDD Cup 2009 website**](http://www.sigkdd.org/kdd-cup-2009-customer-relationship-prediction) for further details about the data and the task.

## Model

The complete experiment graph is given below.

![][image1]

First we do some simple data processing; code sketches approximating each step appear at the end of this write-up.

* The raw dataset contains many missing values. We use the **Clean Missing Data** module to replace the missing values with 0.

![][image2]

![][image3]

* The customer features and the corresponding **churn**, **appetency** and **up-selling** labels are in different datasets. We use the **Add Columns** module to append the label columns to the feature columns. The first column *Col1* is the label column and the rest of the columns *Var1, Var2, ...* are the feature columns.

![][image4]

* We split the dataset into train and test sets using the **Split** module. Then we use the **Two-Class Boosted Decision Tree** binary classifier with default parameters to build the prediction models. We build one model per task, i.e. one each to predict **up-selling**, **appetency** and **churn**.

## Results

The performance of the models on the test set can be seen by visualizing the output of the **Evaluate Model** module. For the up-selling task, the **ROC** curve shows that the model does better than a random model, and the **area under the curve (AUC)** is 0.857. At threshold 0.5, the **precision** is 0.663, the **recall** is 0.463 and the **F1 score** is 0.545.

![][image5]

We can move the threshold slider and see how the different metrics change for the binary classification task. The following figure shows the metrics at threshold 0.7.

![][image6]

We can make similar observations for the other tasks.

<!-- Images -->
[image1]:http://az712634.vo.msecnd.net/samplesimg/v1/2/expt_graph.PNG
[image2]:http://az712634.vo.msecnd.net/samplesimg/v1/2/raw_features.PNG
[image3]:http://az712634.vo.msecnd.net/samplesimg/v1/2/scrubbed_features.PNG
[image4]:http://az712634.vo.msecnd.net/samplesimg/v1/2/label_and_features.PNG
[image5]:http://az712634.vo.msecnd.net/samplesimg/v1/2/upselling_evaluation.PNG
[image6]:http://az712634.vo.msecnd.net/samplesimg/v1/2/upselling_evaluation_0.7.PNG
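
## Code sketches

The modules above are Azure ML Studio building blocks, but the same steps can be approximated outside the designer. The sketches below use Python with pandas and scikit-learn; the file names (`features.csv`, `upselling_labels.csv`) are hypothetical placeholders, not part of the original experiment.

The **Clean Missing Data** step, replacing missing values with 0, is a single `fillna` call in pandas:

```python
import pandas as pd

# Hypothetical file holding the 230 feature columns (Var1, Var2, ...).
features = pd.read_csv("features.csv")

# Mirror the Clean Missing Data module: replace every missing value with 0.
features = features.fillna(0)
```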
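
The **Add Columns** step appends the label column to the feature table. A minimal pandas equivalent, continuing from the sketch above and assuming the labels sit in a one-column file aligned row-by-row with the features:

```python
import pandas as pd

# Hypothetical file with one -1/1 label per row, aligned with the feature table.
labels = pd.read_csv("upselling_labels.csv", header=None, names=["Col1"])

# Mirror the Add Columns module: label column Col1 first, then Var1, Var2, ...
dataset = pd.concat([labels, features], axis=1)
```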
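
The **Split** module and the **Two-Class Boosted Decision Tree** learner map roughly to scikit-learn's `train_test_split` and `GradientBoostingClassifier`. This is only an approximation; Studio's learner has its own implementation and defaults, so it will not reproduce the reported metrics exactly. One such model is trained per task (churn, appetency, up-selling):

```python
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

# Keep numeric columns only; the categorical KDD Cup features would need encoding first.
X = dataset.drop(columns=["Col1"]).select_dtypes("number")
y = dataset["Col1"]

# Mirror the Split module: hold out part of the data for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

# Stand-in for the Two-Class Boosted Decision Tree learner, with default parameters.
model = GradientBoostingClassifier().fit(X_train, y_train)
```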
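
Finally, the **Evaluate Model** view corresponds to computing the ROC AUC from the predicted scores and then re-thresholding those scores, here at 0.5 and 0.7 as in the figures above. This sketch assumes the -1/1 label encoding of the raw KDD Cup files; the resulting numbers will differ from the ones reported above.

```python
from sklearn.metrics import f1_score, precision_score, recall_score, roc_auc_score

# Score of the positive class (label 1) for each test row.
scores = model.predict_proba(X_test)[:, 1]
print("AUC:", roc_auc_score(y_test, scores))

# Equivalent of the threshold slider: re-threshold the scores and recompute the metrics.
for threshold in (0.5, 0.7):
    pred = (scores >= threshold) * 2 - 1  # back to the -1/1 label encoding
    print(f"threshold={threshold}: "
          f"precision={precision_score(y_test, pred):.3f} "
          f"recall={recall_score(y_test, pred):.3f} "
          f"F1={f1_score(y_test, pred):.3f}")
```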