Multivariate Time Series Data Preprocessing with Pandas in Python | Machine Learning Tutorial

Learn how to prepare data for time series forecasting. We'll convert minute-by-minute Bitcoin trading data (stored in a CSV file) into sequences. We'll scale the data and split it into training and test sets.

#TimeSeries #LSTM #PyTorch #Python #Transformer
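
A minimal sketch of the split and scale steps from the description, assuming hypothetical column names (`date`, `close`, `volume`) and synthetic data in place of the CSV file. The min-max scaling to [-1, 1] is written by hand with pandas, and the statistics come from the training rows only. This is an illustration, not the notebook's actual code.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the minute-by-minute CSV; column names are assumed.
rng = np.random.default_rng(42)
df = pd.DataFrame({
    "date": pd.date_range("2021-01-01", periods=1000, freq="min"),
    "close": 30000 + rng.normal(0, 10, 1000).cumsum(),
    "volume": rng.uniform(1, 5, 1000),
})

# Chronological split: no shuffling, so the test set is strictly later in time.
train_size = int(len(df) * 0.9)
train_df, test_df = df.iloc[:train_size], df.iloc[train_size:]

# Take min-max statistics from the training rows only.
feature_cols = ["close", "volume"]
col_min = train_df[feature_cols].min()
col_max = train_df[feature_cols].max()


def scale(frame):
    """Map the feature columns to [-1, 1] using the training min and max."""
    return 2 * (frame[feature_cols] - col_min) / (col_max - col_min) - 1


train_scaled = scale(train_df)
test_scaled = scale(test_df)
```

Fitting the statistics on the training rows keeps information from the test period out of the preprocessing step.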
Comments

This video is a gold mine for multivariate time series data. After searching for hours online, you were the ONLY person that was capable of explaining everything in a simple way.

Thank you!

MaximeAntoine

This is by far one of the best videos I have seen about data preprocessing for Time Series Data. Keep up the good work please!

Asparuh.Emilov

8:32 - iterating over rows in pandas is usually much slower than doing a column-wise operation.
Instead of this:
df["close_change"] = df.progress_apply(
lambda row: 0 if np.isnan(row.prev_close) else row.close - row.prev_close,
axis = 'columns'
)

Try this:

df["close_change"] = (df["close"] - df["prev_close"]).fillna(0)

AlistairWalsh
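
A small sketch of the column-wise approach suggested in the comment above, using made-up prices. The `shortcut` variable and the use of `diff()` are additions for illustration and are not from the video.

```python
import pandas as pd

df = pd.DataFrame({"close": [100.0, 101.5, 101.0, 103.0]})

# Column-wise version: the first row has no previous close, so it becomes 0.
df["prev_close"] = df["close"].shift(1)
df["close_change"] = (df["close"] - df["prev_close"]).fillna(0)

# Same result without the helper column.
shortcut = df["close"].diff().fillna(0)
```

Both forms operate on whole columns at once, which avoids the per-row Python call that makes `apply(..., axis="columns")` slow.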

Great job, Venelin! Waiting for a video on fine-tuning a Transformer-based recommender :)

Deepakkumar-sntr

Thank you for this wonderful video showcasing PyTorch for LSTM time series.

paulntalo

probably the best video on time series.

ephi

Love the sweet song you shared in your notebook. Been vibin to Common's music while going through the code. Great stuff thanks for sharing!

MayssaRekik

Great content! Thanks for your efforts!

antonbozhinov

Thanks for the video, Venelin. It is really good for learning the coding side of things. To those who want to do real-life projects, I suggest not scaling every feature the same way. I might be wrong, but I don't think it is a good idea to scale day of week (0-6 range), month, etc. with a min-max scaler to (-1, 1). They are not numerical features like the price or volume. They are categorical data, if I am not wrong, and scaling them this way will confuse the algorithms.

mehmetnaml
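
One way to act on the comment above, sketched under the assumption that the calendar features are stored as integers such as `day_of_week`. Neither encoding below comes from the video; one-hot and sine/cosine encoding are shown as common alternatives to min-max scaling for such features.

```python
import numpy as np
import pandas as pd

dates = pd.date_range("2021-01-04", periods=7, freq="D")  # Monday to Sunday
features = pd.DataFrame({"day_of_week": dates.dayofweek})

# Option 1: one-hot encoding, one indicator column per weekday.
one_hot = pd.get_dummies(features["day_of_week"], prefix="dow")

# Option 2: cyclical encoding, so Sunday (6) ends up next to Monday (0).
angle = 2 * np.pi * features["day_of_week"] / 7
features["dow_sin"] = np.sin(angle)
features["dow_cos"] = np.cos(angle)
```

With the cyclical encoding, the distance between Sunday and Monday is the same as between any other pair of consecutive days, which a plain 0-6 scale does not give.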

The scaler part is a huge weakness in the model; by using a min-max scaler you are assuming that the historical ATH (all-time high) price will never be exceeded, which is a fundamental mistake as (asset) prices are continuous. Therefore, the model is unlikely to be able to predict a resistance.

kadourkadouri
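
The concern above can be illustrated with made-up numbers: when the scaler's minimum and maximum come from the training period, any later price above the training ATH maps outside [-1, 1]. Modeling relative changes instead is shown as one common alternative; that choice is an assumption here and is not from the video.

```python
import pandas as pd

train_close = pd.Series([100.0, 120.0, 150.0, 200.0])  # training ATH is 200
test_close = pd.Series([190.0, 210.0, 250.0])          # later prices break the ATH

train_min, train_max = train_close.min(), train_close.max()
test_scaled = 2 * (test_close - train_min) / (train_max - train_min) - 1

# Values above 1 fall outside the range the model saw during training.
above_range = test_scaled > 1

# Alternative (assumed, not from the video): relative changes do not
# depend on the absolute price level.
test_returns = test_close.pct_change().dropna()
```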

This is a very high-quality video, thanks!!
Have you done any anomaly detection on a multivariate time series?

mohammadfadel

Thank you for this great video. Very helpful.

alteshaus

Great tutorial! Thanks! One comment in the preprocessing step. Iterating over each row to create a dictionary and appending those dictionaries to a list is much much slower than copying the dataframe and creating the columns you need like so:

features_df = df.copy()
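features_df['day_of_week'] = features_df['date'].dt.dayofweek  # right-hand side missing in the original comment; .dt.dayofweek is assumed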
features_df['day_of_month'] = features_df['date'].dt.day
features_df['week_of_year'] = features_df['date'].dt.isocalendar().week  # .dt.week was removed in pandas 2.0
features_df['month'] = features_df['date'].dt.month

gregjuva

Hey man, you are awesome. Thank you so much for your easy and understandable video, this is the best of the best, thank you so much 👍👍👍👍👍👍👍👍👍👍

marlonlopezpereyra

Venelin, hey! Very nice video! Can you comment on why you picked the range from -1 to 1 for scaling?

gju

Thank you for the very informative video. May I ask why we need to convert our pandas DataFrame into sequences?

Rody

Great video! What is the meaning behind creating the sequences?

mp
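
On the questions above about sequences: a sequence model such as an LSTM takes a window of consecutive time steps as one input, so the flat table has to be cut into overlapping windows, each paired with the value to predict. The sketch below shows the idea. The function name `create_sequences` matches the one mentioned for the video, but the body, parameter names, and toy data are assumptions.

```python
import numpy as np
import pandas as pd


def create_sequences(data, target_column, sequence_length):
    """Slide a fixed-length window over the rows.

    Each item pairs `sequence_length` consecutive rows (all columns)
    with the target value of the row right after the window.
    """
    sequences = []
    for start in range(len(data) - sequence_length):
        window = data.iloc[start:start + sequence_length]
        label = data.iloc[start + sequence_length][target_column]
        sequences.append((window.to_numpy(), label))
    return sequences


# Toy data: 10 time steps, 2 features.
df = pd.DataFrame({"close": np.arange(10.0), "volume": np.arange(10.0) * 2})
sequences = create_sequences(df, target_column="close", sequence_length=3)
```

Here each window has shape `(3, 2)`: three time steps by two features, which is the per-sample layout a multivariate sequence model consumes.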

Thank you for your kindness, it's a nice video.

piramid

Thank you very much for this video. I have a question, please: how do we prepare our data for a multivariate analysis when the dates are repeated, for example when the variable Symbol has different values (BTC, ETH, ?), so we don't have a unique key?

Wissam-rktv
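
A possible way to handle the repeated-dates case asked about above, sketched with made-up rows and assumed column names (`date`, `symbol`, `close`). The two options shown, per-symbol grouping and pivoting to a wide table, are general pandas techniques and are not covered in the video.

```python
import pandas as pd

df = pd.DataFrame({
    "date": pd.to_datetime(["2021-01-01", "2021-01-01", "2021-01-02", "2021-01-02"]),
    "symbol": ["BTC", "ETH", "BTC", "ETH"],
    "close": [29000.0, 730.0, 32000.0, 775.0],
})

# Option 1: compute features per symbol so rows never mix across assets.
df = df.sort_values(["symbol", "date"])
df["close_change"] = df.groupby("symbol")["close"].diff().fillna(0)

# Option 2: pivot to one row per date and one column per symbol,
# which restores a unique date key.
wide = df.pivot(index="date", columns="symbol", values="close")
```

With option 1, sequences would be built separately for each symbol. With option 2, each symbol becomes an extra feature column of a single multivariate series.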

20:14 is where we write the create_sequences function

SaudBako