Walmart Sales Forecasting Using Regression Analysis

January 20, 2017


The objective is to predict the weekly sales of 45 Walmart stores.
Walmart is a renowned retail corporation that operates several types of stores, including hypermarkets, department stores and grocery stores, as well as a garment buying house. As one of the largest retail companies in the world, it often releases datasets to the public for forecasting and analysis, with the aim of making better decisions about its sales. In this experiment, we use Walmart's dataset from Kaggle (link: …). Of the several datasets provided there, we use three: train.csv, stores.csv and features.csv. They contain the following information:

stores.csv: This file contains anonymized information about the 45 stores, indicating the type and size of each store.

train.csv: This is the historical training data, which covers 2010-02-05 to 2012-11-01. It contains the following fields:
- Store - the store number
- Dept - the department number
- Date - the week
- Weekly_Sales - sales for the given department in the given store
- IsHoliday - whether the week is a special holiday week

features.csv: This file contains additional data related to the store, department, and regional activity for the given dates. It contains the following fields:
- Store - the store number
- Date - the week
- Temperature - average temperature in the region
- Fuel_Price - cost of fuel in the region
- MarkDown1-5 - anonymized data related to promotional markdowns that Walmart is running. MarkDown data is only available after Nov 2011 and is not available for all stores all the time. Any missing value is marked with an NA.
- CPI - the consumer price index
- Unemployment - the unemployment rate
- IsHoliday - whether the week is a special holiday week

The task is to build a predictive model for the weekly sales of the 45 Walmart stores.

LOADING DATASETS:
In Azure Machine Learning Studio, we uploaded the three datasets. We then created an empty workspace and dragged the datasets into the experiment.
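The file descriptions above can be mirrored in code. The following is a hedged sketch in Python with pandas, not part of the original Azure ML Studio experiment. It assumes the three files sit in one local directory under the names train.csv, features.csv and stores.csv; the helper names `load_tables` and `markdown_coverage` and the 2011-11-01 cutoff are hypothetical. It reads the tables and checks how often the MarkDown columns are populated before and after the cutoff, which makes the "only available after Nov 2011" note easy to verify.

```python
from pathlib import Path

import pandas as pd

# MarkDown1..MarkDown5, the anonymized promotional markdown columns in features.csv.
MARKDOWN_COLS = [f"MarkDown{i}" for i in range(1, 6)]


def load_tables(data_dir):
    """Read train.csv, features.csv and stores.csv from one directory (hypothetical layout).

    pandas' default na_values already include the string "NA",
    so missing markdowns load as NaN without extra options.
    """
    data_dir = Path(data_dir)
    train = pd.read_csv(data_dir / "train.csv", parse_dates=["Date"])
    features = pd.read_csv(data_dir / "features.csv", parse_dates=["Date"])
    stores = pd.read_csv(data_dir / "stores.csv")
    return train, features, stores


def markdown_coverage(features, cutoff="2011-11-01"):
    """Fraction of non-missing MarkDown cells before and on/after `cutoff` (assumed date)."""
    after = features["Date"] >= pd.Timestamp(cutoff)
    present = features[MARKDOWN_COLS].notna()
    return {
        "before": float(present[~after].to_numpy().mean()),
        "after": float(present[after].to_numpy().mean()),
    }
```

Usage, with a hypothetical directory name: `train, features, stores = load_tables("walmart_data")` followed by `markdown_coverage(features)`.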
DATA PREPARATION:
To build a model, we first need to merge the three datasets into one. We remove some unwanted columns from features.csv and join it with train.csv, then join the resulting dataset with stores.csv.

BUILD AND EVALUATE MODEL:
To build and evaluate the model, we first convert some features to categorical type using the Edit Metadata module. We then split the dataset with the Split Data module, setting the random seed to 12345. Next, we train a Linear Regression model with the Train Model module to predict weekly sales. Because linear regression did not give the expected results, we switched to a boosted decision tree regression model (the Boosted Decision Tree Regression module), which did.
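The preparation and modeling steps above were done with Azure ML Studio modules. As a rough Python equivalent, here is a sketch using pandas and scikit-learn. Several details are assumptions because the text does not state them: IsHoliday is taken as the column dropped from features.csv (train.csv already has it), both joins are left joins on (Store, Date) and Store, missing MarkDown values are filled with 0, Store, Dept, Type and IsHoliday are the categorical features, and 70% of rows go to training. scikit-learn's `GradientBoostingRegressor` stands in for Azure's boosted decision tree module; it is a different implementation, so results will not match. All function names are hypothetical.

```python
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

MARKDOWN_COLS = [f"MarkDown{i}" for i in range(1, 6)]
# Hypothetical: the original does not say which features were made categorical.
CATEGORICAL_COLS = ["Store", "Dept", "Type", "IsHoliday"]


def merge_tables(train, features, stores):
    """Join features onto train by (Store, Date), then stores by Store."""
    # Assumption: drop the duplicate IsHoliday so the merge yields no _x/_y columns.
    features = features.drop(columns=["IsHoliday"])
    df = train.merge(features, on=["Store", "Date"], how="left")
    return df.merge(stores, on="Store", how="left")


def prepare(df):
    """Fill missing markdowns, derive calendar features, one-hot encode categoricals."""
    df = df.copy()
    # Assumption: a missing markdown means no promotion that week.
    df[MARKDOWN_COLS] = df[MARKDOWN_COLS].fillna(0.0)
    df["Year"] = df["Date"].dt.year
    df["Week"] = df["Date"].dt.isocalendar().week.astype(int)
    df = df.drop(columns=["Date"]).dropna()
    X = pd.get_dummies(
        df.drop(columns=["Weekly_Sales"]), columns=CATEGORICAL_COLS, dtype=float
    )
    return X, df["Weekly_Sales"]


def compare_models(X, y, seed=12345, train_fraction=0.7):
    """Split with a fixed seed, fit both regressors, return test-set MAE per model."""
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, train_size=train_fraction, random_state=seed
    )
    models = {
        "linear_regression": LinearRegression(),
        # Stand-in for Azure ML's Boosted Decision Tree Regression module.
        "boosted_trees": GradientBoostingRegressor(random_state=seed),
    }
    return {
        name: mean_absolute_error(y_te, model.fit(X_tr, y_tr).predict(X_te))
        for name, model in models.items()
    }
```

With the loaded tables, `compare_models(*prepare(merge_tables(train, features, stores)))` returns the test-set mean absolute error for each model. A random split like this mixes weeks from different years, so a time-based split would be a stricter check for forecasting.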