Add mean and deviation

October 19, 2016

This module adds moving average and standard deviation features computed over a sliding window.
This experiment shows how to generate moving average and standard deviation features for time series data, based on a given window size. The new module, **add mean and deviation**:

- takes a dataset as input
- adds a new column `a<column number>` containing the average of the selected column over the given window
- adds another new column `sd<column number>` containing the standard deviation of the selected column over the given window
- repeats the above for each of the other selected columns

The module also lets the user provide a "groupby" column index. If this parameter is present, the average and standard deviation are calculated separately within each value of that column.

This feature is useful in cases such as predictive maintenance, where the average and standard deviation of sensor data need to be used as features. In that scenario it also helps to compute the average and standard deviation per device or location, which is where the groupby column index comes in handy.

### Module Configuration ###

This module takes the following parameters:

- The set of columns for which the average and standard deviation need to be calculated
- The index of the groupby column, within which the average and standard deviation are calculated. A value of 0 means no groupby column is used
- The window size used for computing the average and standard deviation

![](http://neerajkh.blob.core.windows.net/images/StatModuleConfigCapture.PNG)

### Parameter Restrictions ###

- Only works on numeric data
- Only accepts a single groupby column index

### Sample Input Dataset ###

![](http://neerajkh.blob.core.windows.net/images/StatInputCapture.PNG)

### Sample Output Dataset ###

#### No groupby filter ####

In this case there is no groupby filter, so the average is computed over all of the data for each moving window, without any boundaries. As seen in the figures below, the data starts with weekday 0 and the moving average is calculated over the past five rows.
Once weekday 1 is reached, the moving average continues to be calculated over the past five rows, without resetting when the weekday changes from 0 to 1.

![](http://neerajkh.blob.core.windows.net/images/WithoutWeekday0Capture.PNG)

![](http://neerajkh.blob.core.windows.net/images/WithoutWeekday1Capture.PNG)

#### With groupby filter ####

In this case the groupby filter is the weekday column, so the average is computed within each weekday separately. As seen in the figures below, the data starts with weekday 0 and the moving average is calculated over the past five rows. Once weekday 1 is reached, the moving average resets and is calculated over the past five periods, with this row counted as the first period.

![](http://neerajkh.blob.core.windows.net/images/Weekday0Capture.PNG)

![](http://neerajkh.blob.core.windows.net/images/WeekDay1Capture.PNG)

### Overall Sample Experiment Graph ###

![](http://neerajkh.blob.core.windows.net/images/OverallExperimentStatCapture.PNG)

### Source Code ###

The source code for this module is located at [https://gist.github.com/nk773/47f9c1e9512b09203b7d32dbb5167da0](https://gist.github.com/nk773/47f9c1e9512b09203b7d32dbb5167da0)
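For readers who want to see the windowing behaviour in code, the following is a minimal Python sketch of the logic described on this page. It is an illustration only, not the module's source (that is in the gist linked above). The function name `add_mean_and_deviation`, the 1-based column indexes, the use of partial windows for the first rows, and the choice of sample standard deviation (reported as 0.0 when the window holds a single value) are all assumptions made for the example.

```python
from collections import defaultdict, deque
from statistics import mean, stdev


def add_mean_and_deviation(header, rows, columns, window, group_by=0):
    """Append a<n> and sd<n> columns for each 1-based column index n.

    group_by is a 1-based column index; 0 means "no groupby column".
    Hypothetical helper for illustration, not the module's real API.
    """
    out_header = list(header)
    out_rows = [list(row) for row in rows]
    for n in columns:
        out_header += [f"a{n}", f"sd{n}"]
        # One sliding window per group value. Without a groupby column,
        # every row shares the single key None, so the window never resets.
        windows = defaultdict(lambda: deque(maxlen=window))
        for src, dst in zip(rows, out_rows):
            key = src[group_by - 1] if group_by else None
            win = windows[key]
            win.append(src[n - 1])
            values = list(win)
            # Assumption: sample standard deviation, 0.0 for a single value.
            sd = stdev(values) if len(values) > 1 else 0.0
            dst.extend([mean(values), sd])
    return out_header, out_rows


header = ["weekday", "value"]
rows = [[0, 10.0], [0, 20.0], [0, 30.0], [1, 100.0], [1, 200.0]]

# No groupby filter: the window runs straight across the weekday change.
_, plain = add_mean_and_deviation(header, rows, columns=[2], window=2)
print([row[2] for row in plain])    # [10.0, 15.0, 25.0, 65.0, 150.0]

# Groupby on column 1 (weekday): the window restarts at weekday 1.
_, grouped = add_mean_and_deviation(header, rows, columns=[2], window=2, group_by=1)
print([row[2] for row in grouped])  # [10.0, 15.0, 25.0, 100.0, 150.0]
```

The fourth row shows the difference between the two modes: without a groupby column its average mixes the last weekday 0 value with the first weekday 1 value (65.0), while with the groupby column it starts a fresh window (100.0).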