Compare Multi-class Classifiers: Letter recognition

September 2, 2014
This sample demonstrates how to compare multiple multi-class classifiers using the letter recognition dataset.
## Data

For this experiment, we use the letter image recognition data from the UCI repository. The first column is the label, which identifies each row as one of the 26 letters, A-Z. The remaining 16 columns are feature columns. The dataset contains 20,000 instances. A description of the data and other details can be found on the dataset's page in the UCI repository.

![][image1]

We used the **Split** module to randomly divide the dataset, using an 80-20 ratio of training data to test data. Then we trained multiple models using the **Train Model** module with the training dataset as input.

![][image2]

![][image3]

![][image4]

## Models

We decided to compare four _multi-class classification algorithms_ provided in Azure ML Studio: **Multiclass Neural Network**, **Multiclass Decision Jungle**, **Multiclass Logistic Regression**, and **Multiclass Decision Forest**.

![][image5]

Azure ML Studio also provides a module called **One-vs-All Multiclass**, which can take any binary classifier as input and solve a multi-class classification problem using the one-vs-all method. Therefore, as the fifth model for comparison in our letter recognition task, we used a binary classification module, **Two-Class Support Vector Machine**, and connected it to the **One-vs-All Multiclass** module.

![][image6]

## Results

We used the **Score Model** module on each combination of a trained model and the test data, and then used the **Evaluate Model** module to compute the confusion matrix of the results. To view the confusion matrix, just right-click the output port of the **Evaluate Model** module and select **Visualize**. Because **Evaluate Model** has two input ports, you can compare the confusion matrices of two different models side by side by connecting the optional second port to the output of another **Score Model** module.
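The one-vs-all scheme that the **One-vs-All Multiclass** module implements can be sketched in a few lines: train one binary model per class (that class against all the others), then predict the class whose binary model scores highest. The sketch below is illustrative only — it uses a toy centroid-based binary scorer and made-up points, not the SVM or the letter data from the experiment.

```python
# Illustrative one-vs-all (one-vs-rest) sketch. The binary "classifier"
# here is a toy: it scores a point by its negated squared distance to
# the centroid of the positive class. Azure ML Studio would plug in a
# real binary learner such as Two-Class Support Vector Machine.

def train_binary(positives):
    # Fit the toy binary scorer: remember the positive-class centroid.
    n, dims = len(positives), len(positives[0])
    centroid = [sum(x[d] for x in positives) / n for d in range(dims)]
    def score(x):
        return -sum((x[d] - centroid[d]) ** 2 for d in range(dims))
    return score

def train_one_vs_all(X, y):
    # One binary model per class, each separating that class from the rest.
    scorers = {}
    for label in set(y):
        positives = [x for x, lbl in zip(X, y) if lbl == label]
        scorers[label] = train_binary(positives)
    return scorers

def predict(scorers, x):
    # The class whose binary model is most confident wins.
    return max(scorers, key=lambda label: scorers[label](x))

# Hypothetical 2-D points standing in for the 16 letter features.
X = [[0, 0], [0, 1], [5, 5], [5, 6], [9, 0], [9, 1]]
y = ["A", "A", "B", "B", "C", "C"]
models = train_one_vs_all(X, y)
print(predict(models, [5, 5.4]))  # -> B
```

With 26 letters, this scheme trains 26 binary models behind the scenes, which is why a single binary module can be connected to the one-vs-all module to produce a multi-class classifier.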
![][image7]

Next, we used custom R code in an **Execute R Script** module to compute the _macro precision_ and _macro recall_ for the individual models, and a series of **Add Rows** modules to combine those results. The output of the final **Add Rows** module shows the completed results dataset. From these results, we can see that the model created by using **Multiclass Decision Forest** has the best macro precision, while the model created by using **Multiclass Neural Network** has the best macro recall (although only marginally better than the decision forest model).

![][image8]

We used **Visualize** on the output of **Evaluate Model** to review the confusion matrix. However, if we want to work with the raw prediction results, we can use an **Execute R Script** module to pass the same data through without modification and visualize it using the module's output port. In the experiment, we did this only for the branch containing the **Multiclass Neural Network** model, to illustrate how you can see the prediction results as counts. These counts can then be used to compute other metrics, such as _micro precision_ and _micro recall_.

![][image9]

If we scroll to the right side of the visualization panel, we can also see these metrics for each class: **Average Log Loss**, **Precision**, and **Recall**.

![][image10]

<!-- Images -->
[image1]:
[image2]:
[image3]:
[image4]:
[image5]:
[image6]:
[image7]:
[image8]:
[image9]:
[image10]:
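The sample does not show the R code from the **Execute R Script** module, but the macro metrics it computes (and the micro variants mentioned above) can be sketched from a confusion matrix as follows. This is written in Python for illustration, with a made-up 3x3 confusion matrix rather than the experiment's actual 26x26 results.

```python
# Sketch of macro vs. micro precision/recall from a confusion matrix.
# conf[i][j] = count of instances of true class i predicted as class j.
# The matrix below is invented example data, not the experiment's output.

def per_class_stats(conf):
    k = len(conf)
    stats = []
    for c in range(k):
        tp = conf[c][c]
        fp = sum(conf[r][c] for r in range(k)) - tp  # column total minus diagonal
        fn = sum(conf[c]) - tp                       # row total minus diagonal
        stats.append((tp, fp, fn))
    return stats

def macro_precision(conf):
    # Macro: average per-class precision, weighting every class equally.
    stats = per_class_stats(conf)
    return sum(tp / (tp + fp) for tp, fp, _ in stats) / len(stats)

def macro_recall(conf):
    stats = per_class_stats(conf)
    return sum(tp / (tp + fn) for tp, _, fn in stats) / len(stats)

def micro_precision(conf):
    # Micro: pool the raw counts first, so frequent classes dominate.
    stats = per_class_stats(conf)
    tp = sum(s[0] for s in stats)
    fp = sum(s[1] for s in stats)
    return tp / (tp + fp)

conf = [[50, 2, 3],
        [4, 40, 6],
        [1, 5, 44]]
print(round(macro_precision(conf), 4))  # -> 0.8634
print(round(macro_recall(conf), 4))     # -> 0.863
```

Note that in a single-label task like this one, every misclassification is simultaneously a false positive for one class and a false negative for another, so micro precision, micro recall, and overall accuracy all coincide; the macro versions differ because they weight rare and frequent letters equally.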