Encoding time series as interpretable features #
A common machine learning approach to time series forecasting is to reduce it to a standard supervised regression problem. A regression task takes as input a \(d\) -dimensional feature vector \(\mathbf{x}\in\mathbb{R}^d\) and predicts a scalar \(y \in \mathbb{R}\) . The regressor \(y = f(\mathbf{x})\) is learnt based on a labelled training dataset \(\left(\mathbf{x}_i,y_i\right)\) , for \(i=1,..,n\) samples. However there is do direct concept of input features ( \(\mathbf{x}\) ) and output target ( \(y\) ) for a time series. Instead, we must choose the time series values to be forecasted as the variable to be predicted and use various feature engineering to construct the features that will be used to make predictions for future time steps. For each time point \(t\) we generate a feature vector \(\mathbf{x}(t) \in\mathbb{R}^d\) based on which we need to predict the observed time series value \(y(t) \in\mathbb{R}\) . Here we describe some of the commonly used methods to transform a time series to feature matrix.
The feature vector \(\mathbf{x}(t)\) needs to be constructed only based on the time step \(t\) and the historical values of the time series \(y(1),...,y(t-1)\) and should not use the current time series value \(y(t)\) .
Lag features #
The value of the time series at previous time steps. Lag features are the classical way that time series forecasting problems are transformed into supervised learning problems.
from aix360ts.transformers import LagFeatures
transformer = LagFeatures(lags=3)
             sales
date
2017-01-01    21.0
2017-01-02    18.0
2017-01-03     9.0
2017-01-04    18.0
2017-01-05    15.0
...            ...
2019-12-27   923.0
2019-12-28  1194.0
2019-12-29  1341.0
2019-12-30   920.0
2019-12-31   748.0
[1095 rows x 1 columns]
            sales(t-3)  sales(t-2)  sales(t-1)
date
2017-01-01         NaN         NaN         NaN
2017-01-02         NaN         NaN        21.0
2017-01-03         NaN        21.0        18.0
2017-01-04        21.0        18.0         9.0
2017-01-05        18.0         9.0        18.0
...                ...         ...         ...
2019-12-27       796.0      1178.0       852.0
2019-12-28      1178.0       852.0       923.0
2019-12-29       852.0       923.0      1194.0
2019-12-30       923.0      1194.0      1341.0
2019-12-31      1194.0      1341.0       920.0
[1095 rows x 3 columns]
| description | type | |
|---|---|---|
| sales(t-3) | The value of the time series (sales) at the (t-3) previous time step. | continuous | 
| sales(t-2) | The value of the time series (sales) at the (t-2) previous time step. | continuous | 
| sales(t-1) | The value of the time series (sales) at the (t-1) previous time step. | continuous | 
Seasonal lag features #
The value of the time series at time steps for the previous seasons. For example, with monthly data, the feature for February is equal to the last observed February value.
from aix360ts.transformers import SeasonalLagFeatures
transformer = SeasonalLagFeatures(lags=2, m=365)
             sales
date
2017-01-01    21.0
2017-01-02    18.0
2017-01-03     9.0
2017-01-04    18.0
2017-01-05    15.0
...            ...
2019-12-27   923.0
2019-12-28  1194.0
2019-12-29  1341.0
2019-12-30   920.0
2019-12-31   748.0
[1095 rows x 1 columns]
            sales(t-2*365)  sales(t-1*365)
date
2017-01-01             NaN             NaN
2017-01-02             NaN             NaN
2017-01-03             NaN             NaN
2017-01-04             NaN             NaN
2017-01-05             NaN             NaN
...                    ...             ...
2019-12-27           428.0           463.0
2019-12-28           440.0           607.0
2019-12-29           700.0           778.0
2019-12-30           894.0          1038.0
2019-12-31           828.0           531.0
[1095 rows x 2 columns]
| description | type | |
|---|---|---|
| sales(t-2*365) | The value of the time series (sales) at the (t-2*365) previous time step. | continuous | 
| sales(t-1*365) | The value of the time series (sales) at the (t-1*365) previous time step. | continuous | 
Rolling window features #
Rolling window statistics (mean,max,min).
from aix360ts.transformers import RollingWindowFeatures
transformer = RollingWindowFeatures(window=3)
             sales
date
2017-01-01    21.0
2017-01-02    18.0
2017-01-03     9.0
2017-01-04    18.0
2017-01-05    15.0
...            ...
2019-12-27   923.0
2019-12-28  1194.0
2019-12-29  1341.0
2019-12-30   920.0
2019-12-31   748.0
[1095 rows x 1 columns]
            sales_min(t-1,t-3)  sales_mean(t-1,t-3)  sales_max(t-1,t-3)
date
2017-01-01                 NaN                  NaN                 NaN
2017-01-02                 NaN                  NaN                 NaN
2017-01-03                 NaN                  NaN                 NaN
2017-01-04                 9.0            16.000000                21.0
2017-01-05                 9.0            15.000000                18.0
...                        ...                  ...                 ...
2019-12-27               796.0           942.000000              1178.0
2019-12-28               852.0           984.333333              1178.0
2019-12-29               852.0           989.666667              1194.0
2019-12-30               923.0          1152.666667              1341.0
2019-12-31               920.0          1151.666667              1341.0
[1095 rows x 3 columns]
| description | type | |
|---|---|---|
| sales_min(t-1,t-3) | The min of the past 3 values in the sales time series. | continuous | 
| sales_mean(t-1,t-3) | The mean of the past 3 values in the sales time series. | continuous | 
| sales_max(t-1,t-3) | The max of the past 3 values in the sales time series. | continuous | 
Expanding window features #
Expanding window statistics (mean,max,min).
from aix360ts.transformers import ExpandingWindowFeatures
transformer = ExpandingWindowFeatures()
             sales
date
2017-01-01    21.0
2017-01-02    18.0
2017-01-03     9.0
2017-01-04    18.0
2017-01-05    15.0
...            ...
2019-12-27   923.0
2019-12-28  1194.0
2019-12-29  1341.0
2019-12-30   920.0
2019-12-31   748.0
[1095 rows x 1 columns]
            sales_min(0,t-1)  sales_mean(0,t-1)  sales_max(0,t-1)
date
2017-01-01              21.0          21.000000              21.0
2017-01-02              18.0          19.500000              21.0
2017-01-03               9.0          16.000000              21.0
2017-01-04               9.0          16.500000              21.0
2017-01-05               9.0          16.200000              21.0
...                      ...                ...               ...
2019-12-27               9.0         364.901008            1907.0
2019-12-28               9.0         365.660256            1907.0
2019-12-29               9.0         366.552608            1907.0
2019-12-30               9.0         367.058501            1907.0
2019-12-31               9.0         367.406393            1907.0
[1095 rows x 3 columns]
| description | type | |
|---|---|---|
| sales_min(0,t-1) | The min of all the values so far in the sales time series. | continuous | 
| sales_mean(0,t-1) | The mean of all the values so far in the sales time series. | continuous | 
| sales_max(0,t-1) | The max of all the values so far in the sales time series. | continuous | 
Date features #
Date related features.
from aix360ts.transformers import DateFeatures
transformer = DateFeatures(encode_cyclical_features=False)
             sales
date
2017-01-01    21.0
2017-01-02    18.0
2017-01-03     9.0
2017-01-04    18.0
2017-01-05    15.0
...            ...
2019-12-27   923.0
2019-12-28  1194.0
2019-12-29  1341.0
2019-12-30   920.0
2019-12-31   748.0
[1095 rows x 1 columns]
            year     month  day_of_year  day_of_month  week_of_year  week_of_month  ... is_month_end is_quarter_start  is_quarter_end is_year_start is_year_end is_leap_year
date                                                                                ...
2017-01-01  2017   January            1             1            52              1  ...           no              yes              no           yes          no           no
2017-01-02  2017   January            2             2             1              1  ...           no               no              no            no          no           no
2017-01-03  2017   January            3             3             1              1  ...           no               no              no            no          no           no
2017-01-04  2017   January            4             4             1              1  ...           no               no              no            no          no           no
2017-01-05  2017   January            5             5             1              1  ...           no               no              no            no          no           no
...          ...       ...          ...           ...           ...            ...  ...          ...              ...             ...           ...         ...          ...
2019-12-27  2019  December          361            27            52              4  ...           no               no              no            no          no           no
2019-12-28  2019  December          362            28            52              4  ...           no               no              no            no          no           no
2019-12-29  2019  December          363            29            52              5  ...           no               no              no            no          no           no
2019-12-30  2019  December          364            30             1              5  ...           no               no              no            no          no           no
2019-12-31  2019  December          365            31             1              5  ...          yes               no             yes            no         yes           no
[1095 rows x 18 columns]
| description | type | |
|---|---|---|
| year | The year. | ordinal | 
| month | The month name of the year from January to December. | cyclical | 
| day_of_year | The ordinal day of the year from 1 to 365. | cyclical | 
| day_of_month | The ordinal day of the month from 1 to 31. | cyclical | 
| week_of_year | The ordinal week of the year from 1 to 52. | cyclical | 
| week_of_month | The ordinal week of the month from 1 to 4. | cyclical | 
| day_of_week | The day of the week from Monday to Sunday. | cyclical | 
| is_weekend | Indicates whether the date is a weekend or not. | binary | 
| quarter | The ordinal quarter of the date from 1 to 4. | cyclical | 
| season | The season Spring/Summer/Fall/Winter. | categorical | 
| fashion_season | The fashion season Spring/Summer (January to June) or Fall/Winter (July to December). | categorical | 
| is_month_start | Indicates whether the date is the first day of the month. | binary | 
| is_month_end | Indicates whether the date is the last day of the month. | binary | 
| is_quarter_start | Indicates whether the date is the first day of the quarter. | binary | 
| is_quarter_end | Indicates whether the date is the last day of the quarter. | binary | 
| is_year_start | Indicates whether the date is the first day of the year. | binary | 
| is_year_end | Indicates whether the date is the last day of the year. | binary | 
| is_leap_year | Indicates whether the date belongs to a leap year. | binary | 
Time features #
Time related features.
from aix360ts.transformers import TimeFeatures
transformer = TimeFeatures(encode_cyclical_features=False)
             sales
date
2017-01-01    21.0
2017-01-02    18.0
2017-01-03     9.0
2017-01-04    18.0
2017-01-05    15.0
...            ...
2019-12-27   923.0
2019-12-28  1194.0
2019-12-29  1341.0
2019-12-30   920.0
2019-12-31   748.0
[1095 rows x 1 columns]
            hour  minute  second
date
2017-01-01     0       0       0
2017-01-02     0       0       0
2017-01-03     0       0       0
2017-01-04     0       0       0
2017-01-05     0       0       0
...          ...     ...     ...
2019-12-27     0       0       0
2019-12-28     0       0       0
2019-12-29     0       0       0
2019-12-30     0       0       0
2019-12-31     0       0       0
[1095 rows x 3 columns]
| description | type | |
|---|---|---|
| hour | The hours of the day. | cyclical | 
| minute | The minutes of the hour. | cyclical | 
| second | The seconds of the minute. | cyclical | 
Encoding Cyclical Features #
May time attributes like month, day_of_year, hour etc. all occur in specific cycles and are refered to as cyclical features. One way to encode cyclical features is via an ordinal scale. For example, month is typically encoded via an ordinal scale from 1(January) to 12(December).
The main problem with ordinal scale is that the distance between two feature values does not reflect the true cyclical nature of the data. For example, November and January are equidistant to December, while in the ordinal scale the absolute distance between November and December is 1 while that between December and January if 11. While this may work reasonably well for certain algorithms sometime it is benefical to encode the cyclical feature to reflect the cyclical nature of the attribute.
One method commonly used for encoding a cyclical feature is to perform a sine and cosine transformation of the feature. For each feature \(x\) which takes ordinal values from \(1,...,K\) we use a pair of transformed features. \[ x_{sin} = \sin\left(\frac{2\pi (x-1)}{K}\right)\quad x_{cos} = \cos\left(\frac{2\pi (x-1)}{K}\right)\quad\text{for}\quad x=1,...,K \] Note that is essentially maps the values around a circle. As an added benefit, it is also scaled to the range [-1, 1] which will also aid convergence for neural networks.
Holiday features #
Encode country specific holidays as features. We use the python holidays package.
A buffer can also be specified before and after the holiday using a tapering triangular window.
from aix360ts.transformers import HolidayFeatures
transformer = HolidayFeatures(country="IN",
                              buffer=2,
                              include_holiday_name=True)
             sales
date
2017-01-01    21.0
2017-01-02    18.0
2017-01-03     9.0
2017-01-04    18.0
2017-01-05    15.0
...            ...
2019-12-27   923.0
2019-12-28  1194.0
2019-12-29  1341.0
2019-12-30   920.0
2019-12-31   748.0
[1095 rows x 1 columns]
            holiday-IN           holiday-IN-name
date
2017-01-01    0.000000                        no
2017-01-02    0.000000                        no
2017-01-03    0.000000                        no
2017-01-04    0.000000                        no
2017-01-05    0.000000                        no
2017-01-06    0.000000                        no
2017-01-07    0.000000                        no
2017-01-08    0.000000                        no
2017-01-09    0.000000                        no
2017-01-10    0.000000                        no
2017-01-11    0.000000                        no
2017-01-12    0.333333                        no
2017-01-13    0.666667                        no
2017-01-14    1.000000  Makar Sankranti / Pongal
2017-01-15    0.666667                        no
2017-01-16    0.333333                        no
2017-01-17    0.000000                        no
2017-01-18    0.000000                        no
2017-01-19    0.000000                        no
2017-01-20    0.000000                        no
2017-01-21    0.000000                        no
2017-01-22    0.000000                        no
2017-01-23    0.000000                        no
2017-01-24    0.333333                        no
2017-01-25    0.666667                        no
2017-01-26    1.000000              Republic Day
2017-01-27    0.666667                        no
2017-01-28    0.333333                        no
2017-01-29    0.000000                        no
2017-01-30    0.000000                        no
2017-01-31    0.000000                        no
2017-02-01    0.000000                        no
2017-02-02    0.000000                        no
2017-02-03    0.000000                        no
2017-02-04    0.000000                        no
2017-02-05    0.000000                        no
2017-02-06    0.000000                        no
2017-02-07    0.000000                        no
2017-02-08    0.000000                        no
2017-02-09    0.000000                        no
2017-02-10    0.000000                        no
2017-02-11    0.000000                        no
2017-02-12    0.000000                        no
2017-02-13    0.000000                        no
2017-02-14    0.000000                        no
2017-02-15    0.000000                        no
2017-02-16    0.000000                        no
2017-02-17    0.000000                        no
2017-02-18    0.000000                        no
2017-02-19    0.000000                        no
| description | type | |
|---|---|---|
| holiday-IN | Indicates whether the date is a IN holiday or not. | continuous | 
| holiday-IN-name | The holiday name. | categorical | 
Trend features #
Features to model simple polynomial trend. Adds features of the form \(t,t^2,..\) . High degrees can cause overfitting, do not go above two unless needed.
from aix360ts.transformers import TrendFeatures
transformer = TrendFeatures(degree=3)
             sales
date
2017-01-01    21.0
2017-01-02    18.0
2017-01-03     9.0
2017-01-04    18.0
2017-01-05    15.0
...            ...
2019-12-27   923.0
2019-12-28  1194.0
2019-12-29  1341.0
2019-12-30   920.0
2019-12-31   748.0
[1095 rows x 1 columns]
            sales_trend_linear  sales_trend_quadratic  sales_trend_cubic
date
2017-01-01                   0                      0                  0
2017-01-02                   1                      1                  1
2017-01-03                   2                      4                  8
2017-01-04                   3                      9                 27
2017-01-05                   4                     16                 64
...                        ...                    ...                ...
2019-12-27                1090                1188100         1295029000
2019-12-28                1091                1190281         1298596571
2019-12-29                1092                1192464         1302170688
2019-12-30                1093                1194649         1305751357
2019-12-31                1094                1196836         1309338584
[1095 rows x 3 columns]
| description | type | |
|---|---|---|
| sales_trend_linear | Feature to model simple polynomial (of degree 1) trend in sales. | continuous | 
| sales_trend_quadratic | Feature to model simple polynomial (of degree 2) trend in sales. | continuous | 
| sales_trend_cubic | Feature to model simple polynomial (of degree 3) trend in sales. | continuous |