Select columns transform

August 28, 2015

This sample demonstrates how to save a column selection operation as a transform and apply it to a new dataset, using the Select Columns Transform module.
# Select Columns Transform

This sample demonstrates how to save a column selection operation as a transform and apply it to a new dataset, using the **Select Columns Transform** module together with the **Apply Transformation** module in Azure Machine Learning. The [**Apply Transformation**](https://msdn.microsoft.com/en-us/library/azure/dn913055.aspx) module modifies the input dataset based on a previously computed transform. As a result, scoring keeps the same set of feature columns that was computed during training.

## Description

Two experiments demonstrate how the column selection transform works. The first experiment computes the transformation; the second experiment uses the transformation computed in the first experiment to modify the input dataset.

## Experiment 1: Select columns transform

The first experiment uses [**Filter Based Feature Selection**](https://msdn.microsoft.com/en-us/library/azure/dn913071.aspx) (count based) to reduce the number of columns in the incoming sample dataset. The **Select Columns Transform** module records the names of the selected columns, and the **Apply Transformation** module prunes the feature columns of the incoming dataset. The flow of the first experiment is shown below.

![][image_exp1]

The steps in this experiment are as follows.

1. Import the automobile price dataset. The original input dataset has 205 rows and 26 columns.

   ![][image_dataset]

2. Use the **Filter Based Feature Selection** module to score all features from the original data. Here the feature scoring method is set to _Count Based_. For the complete list of feature scoring methods, see [https://msdn.microsoft.com/en-us/library/azure/dn913071.aspx](https://msdn.microsoft.com/en-us/library/azure/dn913071.aspx)

   ![][image_filter]

3. Add the **Select Columns Transform** module.
4. Add the **Apply Transformation** module and connect the ports of the modules.

   ![][image_select_column]

5. View the output data. You will see 205 rows and 16 columns.

   ![][image_output]

## Experiment 2: Apply the select columns transform in a scoring (prediction) experiment

After successfully running experiment 1, follow the steps below to create a scoring experiment. The scoring experiment uses the saved transform to prune the feature columns of the input dataset.

1. Click the output port of **Select Columns Transform** and save the transformation.

   ![][image_save]

2. Use the "Save As" command to start the scoring experiment. Add the saved transformation to the scoring experiment.
3. Connect the Web Service input and the saved transformation to the input ports of the **Apply Transformation** module.
4. Delete the **Filter Based Feature Selection** and **Select Columns Transform** modules.
5. Add the Web Service output module, then run the experiment and deploy it as a Web Service.

The finished scoring experiment workflow is shown below.

![][image_exp2]

## Results

The scoring experiment prunes the input dataset based on the saved transformation. In this case, the 26 columns in the original data are reduced to 16 columns. You can test the web service of the scoring experiment. It accepts the same set of feature columns as the input to the original filtering module and produces 16 output columns.

<!-- Images -->
[image_exp1]:https://az712634.vo.msecnd.net/samplesimg/v1/43/Exp1.PNG
[image_dataset]:https://az712634.vo.msecnd.net/samplesimg/v1/43/Data_Set.PNG
[image_filter]:https://az712634.vo.msecnd.net/samplesimg/v1/43/Filter_FS.PNG
[image_select_column]:https://az712634.vo.msecnd.net/samplesimg/v1/43/Select_Column.PNG
[image_output]:https://az712634.vo.msecnd.net/samplesimg/v1/43/Output.PNG
[image_save]:https://az712634.vo.msecnd.net/samplesimg/v1/43/Save.png
[image_exp2]:https://az712634.vo.msecnd.net/samplesimg/v1/43/Exp2.PNG
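
The idea behind the two experiments can also be pictured outside the designer. The following is a minimal plain-Python sketch, not the Azure Machine Learning API: the function names `score_columns`, `fit_select_columns`, and `apply_transformation`, the toy column names, and the toy values are all hypothetical. The score used here (the number of non-missing values per column) is only an assumed stand-in and may differ from how the _Count Based_ method actually scores features.

```python
from typing import Dict, List, Optional

Row = Dict[str, Optional[float]]


def score_columns(rows: List[Row]) -> Dict[str, int]:
    """Stand-in score (an assumption): count of non-missing values per column."""
    scores: Dict[str, int] = {}
    for row in rows:
        for name, value in row.items():
            scores[name] = scores.get(name, 0) + (value is not None)
    return scores


def fit_select_columns(rows: List[Row], keep: int) -> List[str]:
    """Record the names of the `keep` highest-scoring columns.

    The returned list of names plays the role of the saved transform.
    """
    scores = score_columns(rows)
    # sorted() is stable, so tied columns keep their original order.
    ranked = sorted(scores, key=lambda name: scores[name], reverse=True)
    return ranked[:keep]


def apply_transformation(selected: List[str], rows: List[Row]) -> List[Row]:
    """Prune every row to the recorded columns, in the recorded order."""
    return [{name: row.get(name) for name in selected} for row in rows]


# "Experiment 1": compute the column selection on toy training data.
training = [
    {"col_a": 1.0, "col_b": 2.0, "col_c": None},
    {"col_a": 3.0, "col_b": 4.0, "col_c": None},
    {"col_a": 5.0, "col_b": 6.0, "col_c": 7.0},
]
selected = fit_select_columns(training, keep=2)

# "Experiment 2": reuse the saved selection on new data without rescoring.
new_data = [{"col_a": 4.0, "col_b": 5.0, "col_c": 6.0}]
pruned = apply_transformation(selected, new_data)

print(selected)
print(pruned)
```

The point of the design is that the scoring path never recomputes feature scores. It only replays the recorded column names, so new data is always pruned to the same columns that were chosen during training.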