Interpretable features

Encoding time series as interpretable features #

A common machine learning approach to time series forecasting is to reduce it to a standard supervised regression problem. A regression task takes as input a \(d\) -dimensional feature vector \(\mathbf{x}\in\mathbb{R}^d\) and predicts a scalar \(y \in \mathbb{R}\) . The regressor \(y = f(\mathbf{x})\) is learnt based on a labelled training dataset \(\left(\mathbf{x}_i,y_i\right)\) , for \(i=1,..,n\) samples. However there is do direct concept of input features ( \(\mathbf{x}\) ) and output target ( \(y\) ) for a time series. Instead, we must choose the time series values to be forecasted as the variable to be predicted and use various feature engineering to construct the features that will be used to make predictions for future time steps. For each time point \(t\) we generate a feature vector \(\mathbf{x}(t) \in\mathbb{R}^d\) based on which we need to predict the observed time series value \(y(t) \in\mathbb{R}\) . Here we describe some of the commonly used methods to transform a time series to feature matrix.

The feature vector \(\mathbf{x}(t)\) needs to be constructed only based on the time step \(t\) and the historical values of the time series \(y(1),...,y(t-1)\) and should not use the current time series value \(y(t)\) .

Lag features #

code example

The value of the time series at previous time steps. Lag features are the classical way that time series forecasting problems are transformed into supervised learning problems.

from aix360ts.transformers import LagFeatures
transformer = LagFeatures(lags=3)
             sales
date
2017-01-01    21.0
2017-01-02    18.0
2017-01-03     9.0
2017-01-04    18.0
2017-01-05    15.0
...            ...
2019-12-27   923.0
2019-12-28  1194.0
2019-12-29  1341.0
2019-12-30   920.0
2019-12-31   748.0

[1095 rows x 1 columns]
            sales(t-3)  sales(t-2)  sales(t-1)
date
2017-01-01         NaN         NaN         NaN
2017-01-02         NaN         NaN        21.0
2017-01-03         NaN        21.0        18.0
2017-01-04        21.0        18.0         9.0
2017-01-05        18.0         9.0        18.0
...                ...         ...         ...
2019-12-27       796.0      1178.0       852.0
2019-12-28      1178.0       852.0       923.0
2019-12-29       852.0       923.0      1194.0
2019-12-30       923.0      1194.0      1341.0
2019-12-31      1194.0      1341.0       920.0

[1095 rows x 3 columns]
descriptiontype
sales(t-3)The value of the time series (sales) at the (t-3) previous time step.continuous
sales(t-2)The value of the time series (sales) at the (t-2) previous time step.continuous
sales(t-1)The value of the time series (sales) at the (t-1) previous time step.continuous

Seasonal lag features #

code example

The value of the time series at time steps for the previous seasons. For example, with monthly data, the feature for February is equal to the last observed February value.

from aix360ts.transformers import SeasonalLagFeatures
transformer = SeasonalLagFeatures(lags=2, m=365)
             sales
date
2017-01-01    21.0
2017-01-02    18.0
2017-01-03     9.0
2017-01-04    18.0
2017-01-05    15.0
...            ...
2019-12-27   923.0
2019-12-28  1194.0
2019-12-29  1341.0
2019-12-30   920.0
2019-12-31   748.0

[1095 rows x 1 columns]
            sales(t-2*365)  sales(t-1*365)
date
2017-01-01             NaN             NaN
2017-01-02             NaN             NaN
2017-01-03             NaN             NaN
2017-01-04             NaN             NaN
2017-01-05             NaN             NaN
...                    ...             ...
2019-12-27           428.0           463.0
2019-12-28           440.0           607.0
2019-12-29           700.0           778.0
2019-12-30           894.0          1038.0
2019-12-31           828.0           531.0

[1095 rows x 2 columns]
descriptiontype
sales(t-2*365)The value of the time series (sales) at the (t-2*365) previous time step.continuous
sales(t-1*365)The value of the time series (sales) at the (t-1*365) previous time step.continuous

Rolling window features #

code example

Rolling window statistics (mean,max,min).

from aix360ts.transformers import RollingWindowFeatures
transformer = RollingWindowFeatures(window=3)
             sales
date
2017-01-01    21.0
2017-01-02    18.0
2017-01-03     9.0
2017-01-04    18.0
2017-01-05    15.0
...            ...
2019-12-27   923.0
2019-12-28  1194.0
2019-12-29  1341.0
2019-12-30   920.0
2019-12-31   748.0

[1095 rows x 1 columns]
            sales_min(t-1,t-3)  sales_mean(t-1,t-3)  sales_max(t-1,t-3)
date
2017-01-01                 NaN                  NaN                 NaN
2017-01-02                 NaN                  NaN                 NaN
2017-01-03                 NaN                  NaN                 NaN
2017-01-04                 9.0            16.000000                21.0
2017-01-05                 9.0            15.000000                18.0
...                        ...                  ...                 ...
2019-12-27               796.0           942.000000              1178.0
2019-12-28               852.0           984.333333              1178.0
2019-12-29               852.0           989.666667              1194.0
2019-12-30               923.0          1152.666667              1341.0
2019-12-31               920.0          1151.666667              1341.0

[1095 rows x 3 columns]
descriptiontype
sales_min(t-1,t-3)The min of the past 3 values in the sales time series.continuous
sales_mean(t-1,t-3)The mean of the past 3 values in the sales time series.continuous
sales_max(t-1,t-3)The max of the past 3 values in the sales time series.continuous

Expanding window features #

code example

Expanding window statistics (mean,max,min).

from aix360ts.transformers import ExpandingWindowFeatures
transformer = ExpandingWindowFeatures()
             sales
date
2017-01-01    21.0
2017-01-02    18.0
2017-01-03     9.0
2017-01-04    18.0
2017-01-05    15.0
...            ...
2019-12-27   923.0
2019-12-28  1194.0
2019-12-29  1341.0
2019-12-30   920.0
2019-12-31   748.0

[1095 rows x 1 columns]
            sales_min(0,t-1)  sales_mean(0,t-1)  sales_max(0,t-1)
date
2017-01-01              21.0          21.000000              21.0
2017-01-02              18.0          19.500000              21.0
2017-01-03               9.0          16.000000              21.0
2017-01-04               9.0          16.500000              21.0
2017-01-05               9.0          16.200000              21.0
...                      ...                ...               ...
2019-12-27               9.0         364.901008            1907.0
2019-12-28               9.0         365.660256            1907.0
2019-12-29               9.0         366.552608            1907.0
2019-12-30               9.0         367.058501            1907.0
2019-12-31               9.0         367.406393            1907.0

[1095 rows x 3 columns]
descriptiontype
sales_min(0,t-1)The min of all the values so far in the sales time series.continuous
sales_mean(0,t-1)The mean of all the values so far in the sales time series.continuous
sales_max(0,t-1)The max of all the values so far in the sales time series.continuous

Date features #

code example

Date related features.

from aix360ts.transformers import DateFeatures
transformer = DateFeatures(encode_cyclical_features=False)
             sales
date
2017-01-01    21.0
2017-01-02    18.0
2017-01-03     9.0
2017-01-04    18.0
2017-01-05    15.0
...            ...
2019-12-27   923.0
2019-12-28  1194.0
2019-12-29  1341.0
2019-12-30   920.0
2019-12-31   748.0

[1095 rows x 1 columns]
            year     month  day_of_year  day_of_month  week_of_year  week_of_month  ... is_month_end is_quarter_start  is_quarter_end is_year_start is_year_end is_leap_year
date                                                                                ...
2017-01-01  2017   January            1             1            52              1  ...           no              yes              no           yes          no           no
2017-01-02  2017   January            2             2             1              1  ...           no               no              no            no          no           no
2017-01-03  2017   January            3             3             1              1  ...           no               no              no            no          no           no
2017-01-04  2017   January            4             4             1              1  ...           no               no              no            no          no           no
2017-01-05  2017   January            5             5             1              1  ...           no               no              no            no          no           no
...          ...       ...          ...           ...           ...            ...  ...          ...              ...             ...           ...         ...          ...
2019-12-27  2019  December          361            27            52              4  ...           no               no              no            no          no           no
2019-12-28  2019  December          362            28            52              4  ...           no               no              no            no          no           no
2019-12-29  2019  December          363            29            52              5  ...           no               no              no            no          no           no
2019-12-30  2019  December          364            30             1              5  ...           no               no              no            no          no           no
2019-12-31  2019  December          365            31             1              5  ...          yes               no             yes            no         yes           no

[1095 rows x 18 columns]
descriptiontype
yearThe year.ordinal
monthThe month name of the year from January to December.cyclical
day_of_yearThe ordinal day of the year from 1 to 365.cyclical
day_of_monthThe ordinal day of the month from 1 to 31.cyclical
week_of_yearThe ordinal week of the year from 1 to 52.cyclical
week_of_monthThe ordinal week of the month from 1 to 4.cyclical
day_of_weekThe day of the week from Monday to Sunday.cyclical
is_weekendIndicates whether the date is a weekend or not.binary
quarterThe ordinal quarter of the date from 1 to 4.cyclical
seasonThe season Spring/Summer/Fall/Winter.categorical
fashion_seasonThe fashion season Spring/Summer (January to June) or Fall/Winter (July to December).categorical
is_month_startIndicates whether the date is the first day of the month.binary
is_month_endIndicates whether the date is the last day of the month.binary
is_quarter_startIndicates whether the date is the first day of the quarter.binary
is_quarter_endIndicates whether the date is the last day of the quarter.binary
is_year_startIndicates whether the date is the first day of the year.binary
is_year_endIndicates whether the date is the last day of the year.binary
is_leap_yearIndicates whether the date belongs to a leap year.binary

Time features #

code example

Time related features.

from aix360ts.transformers import TimeFeatures
transformer = TimeFeatures(encode_cyclical_features=False)
             sales
date
2017-01-01    21.0
2017-01-02    18.0
2017-01-03     9.0
2017-01-04    18.0
2017-01-05    15.0
...            ...
2019-12-27   923.0
2019-12-28  1194.0
2019-12-29  1341.0
2019-12-30   920.0
2019-12-31   748.0

[1095 rows x 1 columns]
            hour  minute  second
date
2017-01-01     0       0       0
2017-01-02     0       0       0
2017-01-03     0       0       0
2017-01-04     0       0       0
2017-01-05     0       0       0
...          ...     ...     ...
2019-12-27     0       0       0
2019-12-28     0       0       0
2019-12-29     0       0       0
2019-12-30     0       0       0
2019-12-31     0       0       0

[1095 rows x 3 columns]
descriptiontype
hourThe hours of the day.cyclical
minuteThe minutes of the hour.cyclical
secondThe seconds of the minute.cyclical

Encoding Cyclical Features #

May time attributes like month, day_of_year, hour etc. all occur in specific cycles and are refered to as cyclical features. One way to encode cyclical features is via an ordinal scale. For example, month is typically encoded via an ordinal scale from 1(January) to 12(December).

The main problem with ordinal scale is that the distance between two feature values does not reflect the true cyclical nature of the data. For example, November and January are equidistant to December, while in the ordinal scale the absolute distance between November and December is 1 while that between December and January if 11. While this may work reasonably well for certain algorithms sometime it is benefical to encode the cyclical feature to reflect the cyclical nature of the attribute.

One method commonly used for encoding a cyclical feature is to perform a sine and cosine transformation of the feature. For each feature \(x\) which takes ordinal values from \(1,...,K\) we use a pair of transformed features. \[ x_{sin} = \sin\left(\frac{2\pi (x-1)}{K}\right)\quad x_{cos} = \cos\left(\frac{2\pi (x-1)}{K}\right)\quad\text{for}\quad x=1,...,K \] Note that is essentially maps the values around a circle. As an added benefit, it is also scaled to the range [-1, 1] which will also aid convergence for neural networks.

Holiday features #

code example

Encode country specific holidays as features. We use the python holidays package.

A buffer can also be specified before and after the holiday using a tapering triangular window.
from aix360ts.transformers import HolidayFeatures
transformer = HolidayFeatures(country="IN",
                              buffer=2,
                              include_holiday_name=True)
             sales
date
2017-01-01    21.0
2017-01-02    18.0
2017-01-03     9.0
2017-01-04    18.0
2017-01-05    15.0
...            ...
2019-12-27   923.0
2019-12-28  1194.0
2019-12-29  1341.0
2019-12-30   920.0
2019-12-31   748.0

[1095 rows x 1 columns]
            holiday-IN           holiday-IN-name
date
2017-01-01    0.000000                        no
2017-01-02    0.000000                        no
2017-01-03    0.000000                        no
2017-01-04    0.000000                        no
2017-01-05    0.000000                        no
2017-01-06    0.000000                        no
2017-01-07    0.000000                        no
2017-01-08    0.000000                        no
2017-01-09    0.000000                        no
2017-01-10    0.000000                        no
2017-01-11    0.000000                        no
2017-01-12    0.333333                        no
2017-01-13    0.666667                        no
2017-01-14    1.000000  Makar Sankranti / Pongal
2017-01-15    0.666667                        no
2017-01-16    0.333333                        no
2017-01-17    0.000000                        no
2017-01-18    0.000000                        no
2017-01-19    0.000000                        no
2017-01-20    0.000000                        no
2017-01-21    0.000000                        no
2017-01-22    0.000000                        no
2017-01-23    0.000000                        no
2017-01-24    0.333333                        no
2017-01-25    0.666667                        no
2017-01-26    1.000000              Republic Day
2017-01-27    0.666667                        no
2017-01-28    0.333333                        no
2017-01-29    0.000000                        no
2017-01-30    0.000000                        no
2017-01-31    0.000000                        no
2017-02-01    0.000000                        no
2017-02-02    0.000000                        no
2017-02-03    0.000000                        no
2017-02-04    0.000000                        no
2017-02-05    0.000000                        no
2017-02-06    0.000000                        no
2017-02-07    0.000000                        no
2017-02-08    0.000000                        no
2017-02-09    0.000000                        no
2017-02-10    0.000000                        no
2017-02-11    0.000000                        no
2017-02-12    0.000000                        no
2017-02-13    0.000000                        no
2017-02-14    0.000000                        no
2017-02-15    0.000000                        no
2017-02-16    0.000000                        no
2017-02-17    0.000000                        no
2017-02-18    0.000000                        no
2017-02-19    0.000000                        no
descriptiontype
holiday-INIndicates whether the date is a IN holiday or not.continuous
holiday-IN-nameThe holiday name.categorical

Trend features #

code example

Features to model simple polynomial trend. Adds features of the form \(t,t^2,..\) . High degrees can cause overfitting, do not go above two unless needed.

from aix360ts.transformers import TrendFeatures
transformer = TrendFeatures(degree=3)
             sales
date
2017-01-01    21.0
2017-01-02    18.0
2017-01-03     9.0
2017-01-04    18.0
2017-01-05    15.0
...            ...
2019-12-27   923.0
2019-12-28  1194.0
2019-12-29  1341.0
2019-12-30   920.0
2019-12-31   748.0

[1095 rows x 1 columns]
            sales_trend_linear  sales_trend_quadratic  sales_trend_cubic
date
2017-01-01                   0                      0                  0
2017-01-02                   1                      1                  1
2017-01-03                   2                      4                  8
2017-01-04                   3                      9                 27
2017-01-05                   4                     16                 64
...                        ...                    ...                ...
2019-12-27                1090                1188100         1295029000
2019-12-28                1091                1190281         1298596571
2019-12-29                1092                1192464         1302170688
2019-12-30                1093                1194649         1305751357
2019-12-31                1094                1196836         1309338584

[1095 rows x 3 columns]
descriptiontype
sales_trend_linearFeature to model simple polynomial (of degree 1) trend in sales.continuous
sales_trend_quadraticFeature to model simple polynomial (of degree 2) trend in sales.continuous
sales_trend_cubicFeature to model simple polynomial (of degree 3) trend in sales.continuous

References #