Step 3. Operationalize a VW model

September 27, 2016

This example shows how to operationalize a trained VW model.
This is step 3 in the [Vowpal Wabbit Samples Collection](https://gallery.cortanaintelligence.com/Collection/Vowpal-Wabbit-Samples-2). To see how the trained model is created, please see [step 2](https://gallery.cortanaintelligence.com/Experiment/Train-a-VW-Model-with-Small-Dataset-1).

In the previous experiment, we trained a VW model using the Adult Income Census data. Right-click the output port of the Train Vowpal Wabbit module and save the trained model with a unique name. The model is saved in your workspace on the Trained Model page and becomes visible in the Trained Models category in the modules palette. Now we are ready to create a predictive experiment using that saved trained model.

![Predictive Experiment Screenshot](http://az754797.vo.msecnd.net/docs/vw/VWO16N.png)

This predictive experiment differs from a regular predictive experiment in that you will probably need to convert the dataset to be scored into the VW format. Here again you can use a bit of Python script to do that.

    # convert a dataframe into VW format
    import pandas as pd
    import numpy as np

    def azureml_main(inputDF):
        colsToExclude = ['workclass', 'occupation', 'native-country']
        numericCols = ['fnlwgt']
        output = convertDataFrameToVWFormat(inputDF, colsToExclude, numericCols)
        return output

    def convertDataFrameToVWFormat(inputDF, colsToExclude, numericCols):
        # remove whitespace, and '|' and ':' that are special characters in VW
        def clean(s):
            return "".join(s.split()).replace("|", "").replace(":", "")

        def parseRow(row):
            line = []
            # set all labels to 1 since the label is not used in scoring
            line.append("1 |")
            for colName in featureCols:
                if (colName in numericCols):
                    # format numeric features
                    line.append("{}:{}".format(colName, row[colName]))
                else:
                    # format string features
                    line.append(clean(str(row[colName])))
            vw_line = " ".join(line)
            return vw_line

        # drop columns we don't need (drop returns a new dataframe)
        inputDF = inputDF.drop(colsToExclude, axis = 1)
        # select feature columns
        featureCols = inputDF.columns
        # parse each row
        output = inputDF.apply(parseRow, axis = 1).to_frame()
        return output

Again, this approach is fine if you plan to call the web service in real-time request-response style, or in batch mode with a fairly small batch size. If the batch is large, I recommend that you do the VW format conversion outside of Azure ML to avoid running out of memory in the Execute Python Script module.

In the Score Vowpal Wabbit module, the VW arguments are specified as:

    --link logistic

This tells the VW engine to apply the logistic function to the scored result, mapping it into the [0, 1] range as a probability.

And finally, after scoring is done, we use a little R code to set the threshold to 0.5 and calculate the prediction.

    # read input data
    dataset <- maml.mapInputPort(1)
    # set the scoring threshold to 0.5
    threshold <- 0.5
    # set negative class
    dataset$MyScoredLabels[dataset$Results < threshold] <- -1
    # set positive class
    dataset$MyScoredLabels[dataset$Results >= threshold] <- 1
    # Results is the probability when "--link logistic" is on
    dataset$MyScoredProbabilities <- dataset$Results
    # drop unused columns
    dataset$Results <- NULL
    # set output data
    maml.mapOutputPort("dataset")

After a successful run, you can then [deploy this predictive experiment as a web service API](https://channel9.msdn.com/Blogs/Windows-Azure/Deploying-a-predictive-model-as-a-service-part-I-).
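
For large batches, the conversion outside of Azure ML recommended earlier could look roughly like the sketch below. It is only an illustration: it assumes the data to be scored is available as a CSV file with a header row, it reuses the column choices from the scoring script above, and the function names (`clean`, `row_to_vw`, `convert_csv_to_vw`) are hypothetical. It uses only the Python standard library and processes one row at a time, so memory use stays flat regardless of batch size.

```python
import csv
import io

# Assumed column choices, copied from the scoring script above.
COLS_TO_EXCLUDE = {"workclass", "occupation", "native-country"}
NUMERIC_COLS = {"fnlwgt"}


def clean(value):
    """Strip whitespace, '|' and ':' so the value is a single VW token."""
    return "".join(str(value).split()).replace("|", "").replace(":", "")


def row_to_vw(row):
    """Format one CSV row (a dict) as a VW line with a dummy label of 1."""
    parts = ["1 |"]
    for name, value in row.items():
        if name in COLS_TO_EXCLUDE:
            continue
        if name in NUMERIC_COLS:
            # numeric features are written as name:value
            parts.append("{}:{}".format(name, value))
        else:
            # string features are written as a cleaned token
            parts.append(clean(value))
    return " ".join(parts)


def convert_csv_to_vw(src, dst):
    """Stream rows from src to dst one at a time; return the row count."""
    count = 0
    for row in csv.DictReader(src):
        dst.write(row_to_vw(row) + "\n")
        count += 1
    return count


# Tiny in-memory demonstration; real use would pass open file handles.
sample = io.StringIO(
    "age,workclass,fnlwgt,education\n"
    "39,State-gov,77516,Some college\n"
)
out = io.StringIO()
convert_csv_to_vw(sample, out)
print(out.getvalue(), end="")  # prints: 1 | 39 fnlwgt:77516 Somecollege
```

With real data you would pass two open file handles instead of the in-memory buffers, then upload the resulting VW-formatted file as the dataset to be scored.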