Sample 8: Apply SQL transformation

By for February 19, 2015

2422 views
2276 downloads


Report Abuse
This experiment demonstrates the use of Apply SQL Transformation module.
# Apply SQL Transformation In this article, we'll demonstrate how we can perform common SQL operations such as join, union, and aggregation in Microsoft Azure Machine Learning using the **Apply SQL Transformation** module. Using this module, we can do a variety of data transformations with SQL. ## Data We use three datasets (Restaurant customer, Restaurant feature and Restaurant ratings) in this experiment. The datasets include both numerical and categorical features. Excerpt of each of the dataset is shown below. Restaurant Customer data has 19 attributes for 138 customers with each customer having a unique userID. ![][image_customer] Restaurant data has 21 attributes for 130 restaurants with each restaurant having a unique placeID. ![][image_restaurant] Restaurant Ratings data has 1161 customer ratings that customers (userID) provided for various restaurants (placeID). ![][image_ratings] ## Experiment The following diagram shows the overall workflow of the experiment: ![][image_experiment] ###Apply SQL Transformation Module The **Apply SQL Transformation** module takes up to three datasets as input and outputs a dataset generated as a result of executing the SQL Script. By default, t1 denotes the data from port one; t2 denotes the data from port two; and t3 denotes the data from port three. ![][image_ports] Next, we explain how we transformed the data with three **Apply SQL Transformation** modules in our experiment. ###Explore Rating Counts First, we did some simple exploration on the restaurant ratings dataset and obtained the number of restaurants for each rating category. In this case, only one input dataset was provided to the Apply SQL Transformation module. ![][image_sql_2] The output is shown below: ![][image_result_histogram] ###Join Datasets Second, we did an inner join of all the three datasets in SQL based on _userID_ and _placeID_. Further, we derived attributes such as _Age_ (in 2015) based on _birth\_year_ from the restaurant customer data. ![][image_sql_1] The output is shown below: ![][image_result_join] ###Obtain Top Restaurants Finally, we used the joined output of the 3 tables (from above) and computed metrics such as average rating and total ratings received for each restaurant. ![][image_sql_3] The top ten restaurants sorted by average rating in descending order (as output from the Apply SQL Transformation module) are listed below. ![][image_result_final] <!-- Images --> [image_experiment]:https://az712634.vo.msecnd.net/samplesimg/v1/S8/Experiment.PNG [image_customer]:https://az712634.vo.msecnd.net/samplesimg/v1/S8/Customer-1.PNG [image_restaurant]:https://az712634.vo.msecnd.net/samplesimg/v1/S8/Restaurant-1.PNG [image_ratings]:https://az712634.vo.msecnd.net/samplesimg/v1/S8/Ratings.PNG [image_ports]:https://az712634.vo.msecnd.net/samplesimg/v1/S8/SQL-Ports.PNG [image_sql_1]:https://az712634.vo.msecnd.net/samplesimg/v1/S8/SQL-1.PNG [image_sql_2]:https://az712634.vo.msecnd.net/samplesimg/v1/S8/SQL-2.PNG [image_sql_3]:https://az712634.vo.msecnd.net/samplesimg/v1/S8/SQL-3.PNG [image_result_join]:https://az712634.vo.msecnd.net/samplesimg/v1/S8/Result-Join.PNG [image_result_final]:https://az712634.vo.msecnd.net/samplesimg/v1/S8/Result-Final.PNG [image_result_histogram]:https://az712634.vo.msecnd.net/samplesimg/v1/S8/Result-Histogram.PNG