This experiment demonstrates how to use the "Create R Model" module to train and score a model, and use "Execute Python Script" to evaluate a model using breast cancer classification as an example.
# Create R Model #

This experiment demonstrates how to use the **Create R Model** module to train and score a Naive Bayes classification model on the breast cancer dataset, and how to use **Execute Python Script** to calculate performance metrics and plot the performance curve.

Like the **Execute R Script** module, the **Create R Model** module lets a user train a model using R scripts. In addition, it lets the user save the trained model in the same way that other built-in machine learning models can be saved. The saved model, together with its R scoring script, can then be applied in a scoring experiment.

## Data ##

We use "Breast cancer data", which is one of three cancer-related datasets provided by the Oncology Institute and is available from the UCI data repository. We use this data to classify the two types of cancer based on the 9 attributes present in the data.

![complete experiment image](https://az712634.vo.msecnd.net/samplesimg/v1/37/completeExp.PNG)

## Experiment ##

### Train and Score the Model ###

The **Create R Model** module can be used when the user wants to train a model available in an R package. The user needs to provide two scripts: the "Trainer R script" (used for training the model) and the "Scorer R script" (used for scoring the model). In the "Trainer R script" box, the label and feature columns can be extracted from the input dataset using the get.label.column() and get.feature.columns() functions. In this sample experiment, we train a binary classifier using the Naive Bayes classification method. In the "Scorer R script" box, the user should generate a data frame that contains the predicted class labels and their corresponding probabilities for the input dataset, using the input model.

![Create R Model details](https://az712634.vo.msecnd.net/samplesimg/v1/37/CreateR.PNG)

After the **Create R Model** module is configured, it can be connected to the **Train Model** module and then to the **Score Model** module, as shown in the experiment graph.
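
The module itself takes R scripts, and the actual scripts appear only in the screenshot. As a rough illustration of how the work divides between a trainer (fit and return a model) and a scorer (return predicted labels with probabilities), here is a minimal Python sketch of a categorical Naive Bayes classifier with Laplace smoothing. The function names, the model layout, and the toy data are all hypothetical and are not taken from the experiment.

```python
import math
from collections import Counter


def train_naive_bayes(rows, labels, alpha=1.0):
    """Trainer step: fit a categorical Naive Bayes model and return it as a dict."""
    n_features = len(rows[0])
    class_counts = Counter(labels)
    # value_counts[c][j][v] = how often feature j took value v within class c
    value_counts = {c: [Counter() for _ in range(n_features)] for c in class_counts}
    # Distinct values seen per feature, needed for Laplace smoothing
    vocab = [set() for _ in range(n_features)]
    for row, label in zip(rows, labels):
        for j, value in enumerate(row):
            value_counts[label][j][value] += 1
            vocab[j].add(value)
    return {
        "class_counts": class_counts,
        "value_counts": value_counts,
        "vocab_sizes": [len(v) for v in vocab],
        "alpha": alpha,
        "n": len(labels),
    }


def score_naive_bayes(model, rows):
    """Scorer step: return the predicted label and class probabilities for each row."""
    alpha = model["alpha"]
    classes = sorted(model["class_counts"])
    results = []
    for row in rows:
        log_scores = {}
        for c in classes:
            n_c = model["class_counts"][c]
            log_p = math.log(n_c / model["n"])  # class prior
            for j, value in enumerate(row):
                count = model["value_counts"][c][j][value]
                denom = n_c + alpha * model["vocab_sizes"][j]
                log_p += math.log((count + alpha) / denom)  # smoothed likelihood
            log_scores[c] = log_p
        # Convert log scores to probabilities that sum to 1 (shifted for stability)
        top = max(log_scores.values())
        unnormalized = {c: math.exp(s - top) for c, s in log_scores.items()}
        total = sum(unnormalized.values())
        probabilities = {c: u / total for c, u in unnormalized.items()}
        label = max(probabilities, key=probabilities.get)
        results.append({"label": label, "probabilities": probabilities})
    return results


# Hypothetical toy data, not rows from the breast cancer dataset
train_rows = [("small", "no"), ("small", "no"), ("large", "yes"), ("large", "yes")]
train_labels = ["class_a", "class_a", "class_b", "class_b"]

model = train_naive_bayes(train_rows, train_labels)
scored = score_naive_bayes(model, [("large", "yes")])
print(scored[0])
```

The scorer output mirrors what the "Scorer R script" is expected to produce: one predicted label per input row, plus the probability assigned to each class.
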
The **Score Model** module generates the scored dataset, which includes the predicted class labels and the corresponding predicted probabilities.

The user can save the trained model in the same way as models trained by other built-in machine learning modules. The following snapshot shows how to save the trained model.

![Create R Model details](https://az712634.vo.msecnd.net/samplesimg/v1/37/saveTrainedModel.PNG)

### Evaluate the Model ###

In this experiment, we evaluate the model with a custom Python script in the **Execute Python Script** module. The script plots the ROC curve and outputs the performance statistics. The Python script is shown in the snapshot below.

![Create R Model details](https://az712634.vo.msecnd.net/samplesimg/v1/37/evaluation1.PNG)

The output ROC curve and the evaluation metrics (accuracy, precision, recall, AUC) are shown in the following snapshots.

![Create R Model details](https://az712634.vo.msecnd.net/samplesimg/v1/37/ROC.PNG)

![Create R Model details](https://az712634.vo.msecnd.net/samplesimg/v1/37/metrics.PNG)
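
The evaluation script itself is only visible in the screenshot, so the following is not the experiment's code. It is a minimal, dependency-free Python sketch of how accuracy, precision, recall, the ROC points, and AUC can be computed from true labels and predicted probabilities. The function names, the 0.5 threshold, and the sample values are assumptions made for illustration.

```python
def basic_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision and recall for a binary problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


def roc_points(y_true, scores, positive=1):
    """(false positive rate, true positive rate) points from a threshold sweep.

    Assumes both classes occur at least once in y_true.
    """
    n_pos = sum(1 for t in y_true if t == positive)
    n_neg = len(y_true) - n_pos
    # Rank examples from the highest to the lowest predicted probability
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = 0
    for i, (score, truth) in enumerate(ranked):
        if truth == positive:
            tp += 1
        else:
            fp += 1
        # Emit a point only after the last example of a group of tied scores
        if i == len(ranked) - 1 or ranked[i + 1][0] != score:
            points.append((fp / n_neg, tp / n_pos))
    return points


def auc(points):
    """Area under the ROC curve using the trapezoidal rule."""
    return sum(
        (x1 - x0) * (y0 + y1) / 2
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    )


# Hypothetical scored output: true labels and predicted probability of class 1
y_true = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

metrics = basic_metrics(y_true, y_pred)
curve = roc_points(y_true, scores)
metrics["auc"] = auc(curve)
print(metrics)
```

The `curve` list holds the points of the ROC curve in plotting order, so it can be handed to any plotting library to draw a figure like the one in the snapshot.
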