Time Series Feature Engineering | Time Series Models in DataRobot #3

Learn how to interpret and use the outputs of DataRobot's time series automation: features, feature lists, and Leaderboard experiments. This is the third of six videos in the Getting Started with Time Series playlist.

Access the dataset used in the video:

Content
In this video, we'll look at the results of the project automation, which is the main source of time savings for Time Series projects in DataRobot. Specifically, we'll look at the time series features and feature lists, and at how to organize and understand the experiments on the Leaderboard. We'll do this first in the graphical user interface and then repeat most of the same tasks in code, using a DataRobot notebook and the Python API. This matters because the automation is where the magic happens.

Let's begin by looking at features and feature lists. I'm on the Data tab, and Autopilot has finished running in this project. What you're seeing on screen is the Original Time Series Data. Now let me switch to the Derived Modeling Data. None of these derived features existed in the original dataset; they were all created by the automation.

Let's try to understand what's here. I'm going to filter on the feature list named 'No Differencing' and search for our target variable, Sales_Volume. Here we see very standard time series operations, such as lags of the target and aggregations over either 11-month or six-month windows. The variable names are generated automatically and are easy to interpret. The No Differencing list also contains all of the lags and aggregations for our other five features. Next we'll look at some of the feature lists that do use differencing; if differencing is unfamiliar to you, consult the documentation.
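As a rough illustration of the lag and window aggregations described above, here is a minimal pandas sketch. It is not DataRobot's implementation: the lag orders (1 to 3), the column-naming scheme, and the one-step-ahead setup are assumptions made for illustration; only the 6- and 11-month windows come from the video. The window means are computed on the shifted series, so each row only sees values from earlier periods.

```python
import pandas as pd


def add_lag_and_window_features(df, target, lags=(1, 2, 3), windows=(6, 11)):
    """Add lagged copies and trailing-window means of `target` to a copy of `df`."""
    out = df.copy()
    series = out[target]
    for k in lags:
        # Value of the target k periods earlier
        out[f"{target}_lag{k}"] = series.shift(k)
    # Shift by one so each window ends at the previous period (excludes the current value)
    past = series.shift(1)
    for w in windows:
        out[f"{target}_mean{w}"] = past.rolling(window=w).mean()
    return out


# Toy monthly series, invented for illustration
months = pd.date_range("2020-01-01", periods=24, freq="MS")
sales = pd.DataFrame({"Sales_Volume": range(100, 124)}, index=months)
features = add_lag_and_window_features(sales, "Sales_Volume")
print(features.tail(3))
```

The first rows of each derived column are empty (NaN) because there is not yet enough history to fill the lag or window, which is one reason a feature derivation window needs enough past data.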

Among the feature lists, you'll see several that do use differencing: differencing against the latest value, against the value from 12 months ago, or against the average baseline. All of these lists are created for you automatically.
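The three differencing baselines can be sketched the same way. This sketch assumes monthly data, a 12-period seasonal lag, and a 6-period trailing mean as the "average baseline"; DataRobot's own baseline definitions may differ, and the column names are hypothetical. A model trained on a differenced column predicts the gap from its baseline, and the forecast is recovered by adding the baseline back.

```python
import pandas as pd


def differenced_targets(series, season=12, baseline_window=6):
    """Express a target series relative to three simple baselines."""
    past = series.shift(1)  # values strictly before each row
    return pd.DataFrame({
        "target": series,
        # Difference from the latest (previous-period) value
        "vs_latest": series - past,
        # Difference from the value one season (here 12 periods) earlier
        "vs_seasonal": series - series.shift(season),
        # Difference from a trailing mean of earlier values
        "vs_average": series - past.rolling(window=baseline_window).mean(),
    })


months = pd.date_range("2020-01-01", periods=24, freq="MS")
sales = pd.Series(range(100, 124), index=months, name="Sales_Volume")
diffs = differenced_targets(sales)

# Adding the baseline back to the difference recovers the original value
recovered = diffs["vs_latest"] + sales.shift(1)
print(diffs.tail(3))
```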

Each of these features has an associated feature lineage, and each can be plotted over time or against the target. The exploratory data analysis, visualizations, and lineage tracking are all automatic.

Now let's look at the machine learning models that have been built from all of these features and feature lists. We have 28 experiments in total to review. Notice the feature list column, which shows the different feature lists that were used. Each experiment represents the combination of a blueprint with a specific feature list.

From the Leaderboard, you can do a few different things: sort the list by a different optimization metric, rerun a particular experiment with a different feature list, or group together all of the experiments that share the same blueprint.
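The sorting and grouping on the Leaderboard map naturally onto ordinary DataFrame operations. The sketch below uses a hypothetical four-row leaderboard: the blueprint names, the feature list names (apart from No Differencing), and all scores are invented for illustration, and it assumes lower is better, as for error metrics such as RMSE and MAE.

```python
import pandas as pd

# Hypothetical Leaderboard rows; blueprint names and scores are invented
leaderboard = pd.DataFrame({
    "blueprint": ["Blueprint A", "Blueprint A", "Blueprint B", "Blueprint B"],
    "featurelist": ["No Differencing", "Diff vs latest", "No Differencing", "Diff vs latest"],
    "RMSE": [12.0, 10.5, 11.0, 13.2],
    "MAE": [9.1, 8.7, 8.0, 9.9],
})


def sort_by_metric(lb, metric):
    """Order experiments by an error metric, best (lowest) first."""
    return lb.sort_values(metric).reset_index(drop=True)


def experiments_for_blueprint(lb, blueprint):
    """Keep only the experiments built from one blueprint."""
    return lb[lb["blueprint"] == blueprint].reset_index(drop=True)


print(sort_by_metric(leaderboard, "MAE"))
print(experiments_for_blueprint(leaderboard, "Blueprint A"))

# Best feature list for each blueprint, judged by RMSE
best_per_blueprint = leaderboard.loc[leaderboard.groupby("blueprint")["RMSE"].idxmin()]
print(best_per_blueprint)
```

Because each row is one blueprint paired with one feature list, grouping by blueprint answers the question "which feature list works best for this model type?"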

Now let's do some of the same work in a notebook. After Autopilot completes, we can request all of the new features that were created and inspect a few of them by name. We can also list all of the new feature lists that were created.
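A minimal sketch of this notebook step, assuming the DataRobot Python client's `Project.get_modeling_features()` and `Project.get_modeling_featurelists()` methods, which return the derived time series features and feature lists (check the client documentation for your version). The helper `describe_derived_features` is hypothetical, and a stand-in object with hypothetical feature names lets it run without a DataRobot connection.

```python
from types import SimpleNamespace


def describe_derived_features(project, target):
    """Return derived features built from `target` and all modeling feature lists.

    `project` is assumed to behave like a datarobot.Project for a time series
    project, i.e. to provide get_modeling_features() and
    get_modeling_featurelists().
    """
    names = [feature.name for feature in project.get_modeling_features()]
    target_features = sorted(name for name in names if name.startswith(target))
    featurelists = {
        fl.name: list(fl.features) for fl in project.get_modeling_featurelists()
    }
    return target_features, featurelists


# Offline stand-in with hypothetical feature names, so the helper runs without a connection
stub_project = SimpleNamespace(
    get_modeling_features=lambda: [
        SimpleNamespace(name=n)
        for n in ["Sales_Volume (6 month mean)", "Price (1st lag)", "Sales_Volume (1st lag)"]
    ],
    get_modeling_featurelists=lambda: [
        SimpleNamespace(
            name="No Differencing",
            features=["Sales_Volume (1st lag)", "Sales_Volume (6 month mean)", "Price (1st lag)"],
        ),
    ],
)
target_features, featurelists = describe_derived_features(stub_project, "Sales_Volume")
print(target_features)
print(list(featurelists))

# Against a live project (requires the datarobot package and credentials):
# import datarobot as dr
# project = dr.Project.get("<project-id>")
# describe_derived_features(project, "Sales_Volume")
```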

A common request is to replicate the Leaderboard as a data frame. In this cell we retrieve all of the models, and here we define exactly which model characteristics we'd like in the data frame. We loop through all of the models, retrieve those characteristics, then convert the result to a pandas DataFrame, sort it, and reset the index. That gives us a data frame that looks very much like what we saw on the Leaderboard: the model name, all of the processing steps it went through, the feature list used, and the error metric for Backtest 1 and for all backtests. This should look very familiar from the UI.
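The Leaderboard-to-DataFrame loop might look like the sketch below. It assumes the DataRobot client's datetime model objects expose `model_type`, `processes`, `featurelist_name`, and a `metrics` dict keyed by metric name whose entries hold a `validation` score (Backtest 1) and a `backtesting` score (all backtests); verify these names against the client documentation. The helper `leaderboard_frame` and the stub models are hypothetical, and the stub scores are invented.

```python
from types import SimpleNamespace

import pandas as pd


def leaderboard_frame(models, metric):
    """Build a Leaderboard-like DataFrame from datetime model objects."""
    rows = []
    for model in models:
        scores = model.metrics.get(metric) or {}
        rows.append({
            "model_type": model.model_type,
            "processes": " | ".join(model.processes),
            "featurelist": model.featurelist_name,
            "backtest_1": scores.get("validation"),
            "all_backtests": scores.get("backtesting"),
        })
    frame = pd.DataFrame(rows)
    # Ascending sort assumes an error metric where lower is better (e.g. RMSE)
    return frame.sort_values("backtest_1", na_position="last").reset_index(drop=True)


# Hypothetical stand-ins for model objects; scores are invented
stub_models = [
    SimpleNamespace(model_type="Model X", processes=["Missing Values Imputed", "Model X"],
                    featurelist_name="No Differencing",
                    metrics={"RMSE": {"validation": 14.2, "backtesting": 15.0}}),
    SimpleNamespace(model_type="Model Y", processes=["Standardize", "Model Y"],
                    featurelist_name="Diff vs latest",
                    metrics={"RMSE": {"validation": 11.8, "backtesting": None}}),
]
lb = leaderboard_frame(stub_models, "RMSE")
print(lb)

# Against a live project (requires the datarobot package and credentials):
# import datarobot as dr
# project = dr.Project.get("<project-id>")
# leaderboard_frame(project.get_datetime_models(), project.metric)
```

The `None` score for Model Y mirrors a model whose all-backtests score has not been computed yet; keeping it as NaN rather than dropping the row preserves the full Leaderboard.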

In this video, we focused on understanding the outputs of the time series automation: features, feature lists, and Leaderboard experiments.

DataRobot

Comments

Can we see similar videos for multi series time series project?

sks_DS

Related videos

Kishan Manani - Feature Engineering for Time Series Forecasting | PyData London 2022

Feature Engineering for Time Series Forecasting - Kishan Manani

Feature Engineering Secret From A Kaggle Grandmaster

Automated Feature Engineering of Time Series Data - Binary Classification

Time Series Forecasting with XGBoost - Use python and machine learning to predict energy consumption

Automated Feature Engineering with Large Scale Time Series Data with tsfresh & Dask | Arnab Bisw...

Automated Feature Engineering of Time Series Data - Forecasting

What is Time Series Analysis?

Feature Engineering Part 2 Observed and Unobserved variables || By Vikash Shakya

Modern Time Series Analysis | SciPy 2019 Tutorial | Aileen Nielsen

Automatic Feature Engineering on Large Scale Time Series Data using tsfresh & Dask Arnab Biswa...

Automated Feature Engineering of Time Series Data - Multiclass Classification

Feature Engineering for Time Series Analysis in Spark

Felix Wick - ML Based Time Series Regression| PyData Global 2020

Time Series Data Preparation for Deep Learning (LSTM, RNN) models

PyCon.DE 2017 Nils Braun - Time series feature extraction with tsfresh - “get rich or die..

Automated feature extraction and selection for challenging time-series prediction problems

Live-Feature Engineering- Forecasting Time Series Using Facebook Fbprophet-Day 5

How to use Feature Engineering for Machine Learning, Equations

Sales Forecasting Machine Learning Project using Python | Feature Engineering | DataHour

Time Series Forecasting by Unit8's Krzysztof Styrc

Detrending and deseasonalizing data with fourier series

getML - Automated Feature Engineering on Relational Data and Time Series