Time Series Feature Engineering | Time Series Models in DataRobot #3

Learn how to interpret and use the outputs of the DataRobot Time Series automation, specifically features, feature lists, and Leaderboard experiments. This is the third of six videos in the Getting Started with Time Series playlist.

Access the dataset used in the video:


Content
In this video, we'll look at the results of the project automation, which is the key source of time savings for Time Series projects in DataRobot. Specifically, we'll look at the time series features and feature lists, and at how to organize and interpret the experiments on the Leaderboard. We'll do this first in the graphical user interface and then repeat most of the same tasks in code, using a DataRobot notebook and the Python API. This matters because the automation is where most of the work gets done for you.

Let's begin by looking at features and feature lists. I'm on the Data tab, and AutoPilot has finished running in this project. What you're seeing on screen is the Original Time Series Data. Now let me switch to the Derived Modeling Data. None of these derived features existed in the original dataset; they were all created by the automation.

Let's try to understand what's here. I'm going to filter on the feature list named 'No Differencing' and search for our target variable, Sales_Volume. Here we see very standard time series operations, such as lags of the target and aggregations over windows of either 11 months or six months. The variable names are generated automatically and are easy to interpret. The No Differencing list also contains all of the lags and aggregations for our other five features. Next we'll look at some of the feature lists that do use differencing; if differencing is unfamiliar to you, consult the documentation.
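To make the lag and aggregation features concrete, here is a minimal sketch in plain Python, assuming a single monthly series and a simple one-step-ahead view. It is not DataRobot's implementation: the real derivation depends on the project's feature derivation window and forecast distances, and the column names below are hypothetical, chosen only for illustration. The windows of 6 and 11 months mirror the ones mentioned above.

```python
from statistics import mean


def lag(series, k):
    """Value observed k steps earlier; None where history is too short."""
    return [series[i - k] if i - k >= 0 else None for i in range(len(series))]


def rolling_mean(series, window):
    """Mean of the `window` values strictly before each row (no leakage)."""
    return [mean(series[i - window:i]) if i >= window else None
            for i in range(len(series))]


def derive_target_features(name, series, lags=(1, 2, 3), windows=(6, 11)):
    """Build lag and rolling-mean columns for one series.

    The column names are hypothetical; DataRobot generates its own names.
    """
    features = {}
    for k in lags:
        features[f"{name} (lag {k})"] = lag(series, k)
    for w in windows:
        features[f"{name} ({w} month mean)"] = rolling_mean(series, w)
    return features


# Toy monthly series (made-up values) standing in for Sales_Volume.
sales_volume = [float(v) for v in range(1, 13)]
derived = derive_target_features("Sales_Volume", sales_volume)
print(sorted(derived))
```

The aggregations deliberately use only values before each row, so a feature never sees the value it is meant to help predict.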

Among the feature lists you'll see several that do use differencing: differencing against the latest value, against the value from 12 months ago, or against an average baseline. All of these lists are created for you automatically.
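The three differencing strategies can be sketched as "target minus a baseline", where the baseline is the latest value, the value 12 months earlier, or a recent average. This is a simplified illustration under stated assumptions, not DataRobot's exact computation: the 6-month window for the average baseline is an assumption, and the series is made-up data with a steady upward trend.

```python
from statistics import mean


def latest_value(series, i):
    """Baseline: the most recent known value."""
    return series[i - 1] if i >= 1 else None


def same_period_last_year(series, i, period=12):
    """Baseline: the value one seasonal period (12 months) earlier."""
    return series[i - period] if i >= period else None


def average_baseline(series, i, window=6):
    """Baseline: mean of the previous `window` values (window size assumed)."""
    return mean(series[i - window:i]) if i >= window else None


def difference(series, baseline, **kwargs):
    """Target minus baseline; None where the baseline is undefined."""
    out = []
    for i, y in enumerate(series):
        b = baseline(series, i, **kwargs)
        out.append(None if b is None else y - b)
    return out


# Made-up monthly series rising by 1 each month.
series = [100.0 + m for m in range(24)]
latest_diff = difference(series, latest_value)
seasonal_diff = difference(series, same_period_last_year)
average_diff = difference(series, average_baseline, window=6)

# A model trained on a differenced target predicts the difference; adding
# the baseline back gives a forecast on the original scale.
i = 20
predicted_diff = latest_diff[i]  # pretend the model predicted this exactly
forecast = latest_value(series, i) + predicted_diff
```

On this trending series the latest-value difference is a constant 1, which shows why differencing can make a target easier to model: the trend is removed and only the change remains.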

Each of these features has an associated feature lineage and can be plotted over time or against the target. The exploratory data analysis, the visualizations, and the lineage tracking are all automatic.

Now let's look at the machine learning models that have been built from all of these features and feature lists. We have 28 experiments to review in total. Notice in this column the different feature lists that were used: each experiment is the combination of a blueprint with a specific feature list.

From the Leaderboard, you can do a few different things. You can sort the list by a different optimization metric. You can rerun a particular experiment with a different feature list. You can sort all of the experiments that share the same blueprint.
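The Leaderboard ideas above (experiments as blueprint and feature list pairs, re-sorting by another metric, grouping by blueprint) can be sketched with plain Python records. Everything here is hypothetical: the blueprint names, the feature list names other than 'No Differencing', and all scores are made up for illustration, and both metrics are assumed to be "lower is better".

```python
from collections import defaultdict
from itertools import product

# Hypothetical blueprint and feature list names.
blueprints = ["Blueprint A", "Blueprint B"]
featurelists = ["No Differencing", "Latest value differencing"]

# Each experiment pairs one blueprint with one feature list.
experiments = [{"blueprint": b, "featurelist": f}
               for b, f in product(blueprints, featurelists)]

# Made-up scores for three experiments (lower is better for both metrics).
leaderboard = [
    {"blueprint": "Blueprint A", "featurelist": "No Differencing", "RMSE": 12.0, "MAE": 9.0},
    {"blueprint": "Blueprint A", "featurelist": "Latest value differencing", "RMSE": 10.0, "MAE": 9.5},
    {"blueprint": "Blueprint B", "featurelist": "No Differencing", "RMSE": 11.0, "MAE": 8.0},
]


def sort_by_metric(rows, metric, lower_is_better=True):
    """Sort experiments by any scored metric, skipping rows without a score."""
    scored = [r for r in rows if r.get(metric) is not None]
    return sorted(scored, key=lambda r: r[metric], reverse=not lower_is_better)


def group_by_blueprint(rows):
    """Map each blueprint to the feature lists it was run with."""
    groups = defaultdict(list)
    for r in rows:
        groups[r["blueprint"]].append(r["featurelist"])
    return dict(groups)


by_rmse = sort_by_metric(leaderboard, "RMSE")
by_mae = sort_by_metric(leaderboard, "MAE")
groups = group_by_blueprint(leaderboard)
```

Switching the sort metric can change which experiment comes first (here Blueprint A leads on RMSE and Blueprint B on MAE), which is why it is worth checking more than one metric.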

Now let's do some of that same work in a Notebook. After AutoPilot completes, we can request all of the new features that have been created and inspect a few of them by name. We can also look at all of the new feature lists that have been created.
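A minimal sketch of inspecting features and feature lists by name once you have them in a notebook. The DataRobot calls appear only in a comment and the method names there are assumptions to check against your client version. The runnable part uses stand-in objects that carry only a `name` (and, for feature lists, a `features` list of names), and all feature names are hypothetical.

```python
from types import SimpleNamespace

# With the DataRobot Python client you might obtain these with something like
#   project = dr.Project.get("<project id>")
#   features = project.get_modeling_features()
#   featurelists = project.get_modeling_featurelists()
# (method names assumed; verify against your client version). Stand-ins below
# carry only the attributes the helper functions read.
features = [SimpleNamespace(name=n) for n in [
    "Sales_Volume", "Sales_Volume (lag 1)", "Sales_Volume (6 month mean)",
    "Price (lag 1)",
]]
featurelists = [
    SimpleNamespace(name="No Differencing",
                    features=["Sales_Volume (lag 1)", "Price (lag 1)"]),
    SimpleNamespace(name="Latest value differencing",
                    features=["Sales_Volume (6 month mean)", "Price (lag 1)"]),
]


def derived_from(features, column):
    """Names of derived features built from one original column."""
    return [f.name for f in features if f.name.startswith(column + " ")]


def lists_containing(featurelists, feature_name):
    """Names of the feature lists that include a given feature."""
    return [fl.name for fl in featurelists if feature_name in fl.features]


print(derived_from(features, "Sales_Volume"))
print(lists_containing(featurelists, "Price (lag 1)"))
```

Matching on the original column name works here only because the hypothetical derived names start with it; adjust the filter to whatever naming your project actually produces.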

A common request is to replicate the Leaderboard as a DataFrame. In this cell we retrieve all of the models, and here we define exactly which model characteristics we'd like in the DataFrame. We loop through the models, retrieve those characteristics, then convert the result to a pandas DataFrame, sort it, and reset the index. That gives us a DataFrame that looks very much like the Leaderboard. Here's the model name, here are all of the processing steps it went through, and here are the feature list used, the error metric for backtest one, and the error metric across all backtests. This should look very familiar from the UI.
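The loop described above can be sketched as follows, under stated assumptions. The attribute names (`model_type`, `processes`, `featurelist_name`, `metrics`) are read from DataRobot model objects, and the per-metric keys `"validation"` (backtest one) and `"backtesting"` (all backtests) are assumed from the client's documentation for datetime models; confirm them for your client version. The stand-in models and their scores are made up so the sketch runs without a DataRobot connection.

```python
import pandas as pd
from types import SimpleNamespace


def leaderboard_frame(models, metric):
    """One row per model: name, processing steps, feature list, scores."""
    rows = []
    for m in models:
        scores = (m.metrics or {}).get(metric) or {}
        rows.append({
            "model": m.model_type,
            "processes": m.processes,
            "featurelist": m.featurelist_name,
            "backtest_1": scores.get("validation"),      # assumed key
            "all_backtests": scores.get("backtesting"),  # assumed key
        })
    df = pd.DataFrame(rows)
    # Ascending sort assumes an error metric where lower is better.
    return df.sort_values("backtest_1", na_position="last").reset_index(drop=True)


# In a notebook the models would come from the project, for example
#   models = dr.Project.get("<project id>").get_datetime_models()
# (method name assumed). Stand-ins with made-up scores are used here.
models = [
    SimpleNamespace(model_type="Model A", processes=["Step 1"],
                    featurelist_name="No Differencing",
                    metrics={"RMSE": {"validation": 12.0, "backtesting": 13.0}}),
    SimpleNamespace(model_type="Model B", processes=["Step 1", "Step 2"],
                    featurelist_name="No Differencing",
                    metrics={"RMSE": {"validation": 9.0, "backtesting": None}}),
    SimpleNamespace(model_type="Model C", processes=[],
                    featurelist_name="Latest value differencing",
                    metrics={"MAE": {"validation": 5.0}}),
]
lb = leaderboard_frame(models, "RMSE")
print(lb)
```

Models that lack a score for the chosen metric are kept but sorted to the bottom, so the frame still lists every experiment, just as the Leaderboard does.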

In this video we focused on understanding the outputs of the time series automation: features, feature lists, and Leaderboard experiments.
