Professional Preprocessing with Pipelines in Python

In this video, we learn about preprocessing pipelines and how to professionally prepare data for machine learning.

Comments

Rather than creating a class for each step, another much easier approach is to use sklearn's FunctionTransformer. It lets you write a custom function and turn it into a transformer object, which can then be used as a pipeline step as normal.

vzinko
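
A minimal sketch of the FunctionTransformer approach described above. The column names (`Name`, `Age`, `Job`), the sample rows, and the two helper functions are assumptions made for illustration, not code from the video. Both functions are stateless, which is the case FunctionTransformer suits best; a step that must learn something from the training data (such as an imputer) still needs a fitted transformer.

```python
import pandas as pd
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer


def drop_name(df):
    # Hypothetical step: remove an identifier column that carries no signal.
    return df.drop(columns=["Name"])


def lowercase_job(df):
    # Hypothetical step: normalize the casing of a text column.
    return df.assign(Job=df["Job"].str.lower())


# Made-up sample data.
df = pd.DataFrame(
    {"Name": ["Anna", "Bob"], "Age": [31, 45], "Job": ["Writer", "Programmer"]}
)

# Each plain function becomes a pipeline step without writing a class.
pipe = Pipeline(
    [
        ("dropper", FunctionTransformer(drop_name)),
        ("lower", FunctionTransformer(lowercase_job)),
    ]
)

result = pipe.fit_transform(df)
print(result)
```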

This was awesome and very informative. Many thanks from a machine learning novice!

isaacandrewdixon

Fantastic video, I always wondered about the reasoning behind using classes in ML, thank you!!!

dmitriidavs

Hey man, great channel! Love the topic based tutorials ❤️
Video Suggestion: Can I suggest you attempt making a video on: Using Python and the Tree Algorithm to make an autocomplete Python CLI program.

Haven’t seen this anywhere and I guess it’s a great way to understand why the Tree algorithm might be the best solution for an autocomplete program.
Thanks! Sure we all appreciate what you do for the community ♥️ 🌻

onecarry

For those who noticed that the encoder seems to sort the values alphabetically and messes up the job column names, instead of manually typing column names you can do:

matrix =
column_names = sorted([i for i in df['Job'].unique()])

This also works if new jobs and values are added: it makes a column for each unique value while keeping the order.

Good tutorial in any case!

randomfinn
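
As a rough illustration of the ordering point above: a fitted `OneHotEncoder` stores the sorted categories it found in `categories_`, so the column names can also be read back from the encoder itself. The sample data below is made up (it only mirrors the job counts mentioned elsewhere in these comments), and this is a sketch rather than the code used in the video.

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

# Hypothetical data: 1 writer, 3 programmers, 1 teacher.
df = pd.DataFrame(
    {"Job": ["writer", "programmer", "programmer", "teacher", "programmer"]}
)

encoder = OneHotEncoder()
# The default output is a sparse matrix, so convert it to a dense array.
matrix = encoder.fit_transform(df[["Job"]]).toarray()

# categories_ holds one array per encoded column, in sorted order.
column_names = list(encoder.categories_[0])

encoded = pd.DataFrame(matrix, columns=column_names, index=df.index)
print(encoded)
```

Because the names come from the fitted encoder, they stay aligned with the matrix columns even when the set of jobs changes.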

Nice. For this example I might use the ColumnTransformer class; it's perfect for dropping columns and applying imputers and scalers to selected features.

nathanhaynes
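
A short sketch of how the ColumnTransformer suggestion above might look. The DataFrame, its column names, and the choice of mean imputation plus standard scaling are assumptions for illustration only.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Made-up sample data with one missing age.
df = pd.DataFrame(
    {
        "Name": ["Anna", "Bob", "Cara"],
        "Age": [31.0, np.nan, 45.0],
        "Job": ["writer", "programmer", "teacher"],
    }
)

# Numeric columns: fill missing values, then scale.
numeric = Pipeline(
    [("impute", SimpleImputer(strategy="mean")), ("scale", StandardScaler())]
)

preprocess = ColumnTransformer(
    transformers=[
        ("num", numeric, ["Age"]),
        ("job", OneHotEncoder(handle_unknown="ignore"), ["Job"]),
    ],
    remainder="drop",  # columns not listed (here "Name") are dropped
)

features = preprocess.fit_transform(df)
print(features.shape)
```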

This video is pure gold. Thank you so much!

Deacc

I remember taking ML courses on Udemy that took more time than this video. Please keep creating more videos on this subject.

niv_syt

Wow, this technique is amazing. Thanks for sharing this brilliant knowledge with us.

manyes

Thank you so much, nicely explained.
Following what you showed, I created a pipeline and dumped it as a pickle file, but when I try to load that model and use it, I get an error: AttributeError: Can't get attribute 'NullEncoder' on <module '__main__'>

nikulnayi

I find using FunctionTransformer much easier. It turns each of your custom functions into a transformer and you don't need to write a class, but just a function.

vlplbl

I would really like to find a tutorial on how to pass arguments to a pipeline step you created yourself, like the name dropper, so I can use grid search to try out dropping different features.

jelcroospockt
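
One possible way to do what the comment above asks, sketched under assumptions: the `ColumnDropper` class, the synthetic data, and the step names are hypothetical. The idea is that a transformer inheriting from `BaseEstimator` exposes its constructor arguments through `get_params`/`set_params`, so `GridSearchCV` can vary them using the `<step name>__<parameter>` naming convention.

```python
import numpy as np
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline


class ColumnDropper(BaseEstimator, TransformerMixin):
    """Hypothetical transformer that drops a configurable list of columns."""

    def __init__(self, columns=None):
        # Store the argument unchanged so that sklearn can clone the step.
        self.columns = columns

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        return X.drop(columns=list(self.columns or []))


# Synthetic data: only column "a" is related to the target.
rng = np.random.default_rng(0)
X = pd.DataFrame(
    {"a": rng.normal(size=40), "b": rng.normal(size=40), "c": rng.normal(size=40)}
)
y = (X["a"] > 0).astype(int)

pipe = Pipeline([("dropper", ColumnDropper()), ("model", LogisticRegression())])

# "<step name>__<parameter>" addresses the constructor argument of a step.
param_grid = {"dropper__columns": [[], ["b"], ["c"], ["b", "c"]]}

search = GridSearchCV(pipe, param_grid, cv=3)
search.fit(X, y)
print(search.best_params_)
```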

What is the name of this video's opening song?

rohscx

I think your feature encoder has some faulty logic for the "Job" column. The df2 for example shows 1 x writer, 3 x programmer and 1 x teacher, but afterwards there isn't even a "teacher" column. And if you were to recreate the single columns using 1 or 0 from the features you created you wouldn't get the same dataframe.

__wouks__

With an eye towards the love that programming has gotten from the ml community lately, it occurs to me that perhaps ml could also be used more in the data preprocessing role.
For example: Choosing encoding types, handling missing values, flattening, etc could all be automated.
Just a thought.
2nd random thought. I know random noise has been added to features in an attempt to get the models to generalize better but did not fare well.
However I have not seen that anyone has tried simply using noise generators (normal, gaussian, etc) as individual features and allowing the model itself to choose when and where noise might be effective.

thomasgoodwin

16:42 I think it's wrong to use fit_transform in the transform method, because it causes data leakage: after you split the data into train/test parts, calling transform on the test dataset will refit the imputer on the test data.

I have one big question: what is the difference between building a machine learning application with a Pipeline and building one with an OOP approach? They look the same to me.

nachoeigu
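
The leakage concern raised above can be shown with a small sketch using sklearn's `SimpleImputer`. The numbers are made up; the point is only the difference between fitting on the training data and refitting on the test data.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Made-up train/test split with missing values.
X_train = np.array([[1.0], [3.0], [np.nan]])
X_test = np.array([[np.nan], [100.0]])

# Intended usage: learn the fill value from the training data only,
# then reuse it unchanged on the test data.
imputer = SimpleImputer(strategy="mean")
imputer.fit(X_train)
test_ok = imputer.transform(X_test)

# Calling fit_transform on the test data recomputes the mean from the
# test set, so the missing value is filled with test-set information.
test_refit = SimpleImputer(strategy="mean").fit_transform(X_test)

print(test_ok[0, 0])     # filled with the training mean
print(test_refit[0, 0])  # filled with the test mean
```

A custom transformer avoids this by learning its statistics in `fit` and only applying them in `transform`.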

Could you use the pandas get_dummies function for the one-hot encoding?

MalcombBrown
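
For reference, a minimal sketch of what `pd.get_dummies` produces on a made-up `Job` column. One caveat worth keeping in mind: `get_dummies` does not remember the categories it saw, so data containing different job values yields different columns, whereas a fitted encoder inside a pipeline keeps the same columns.

```python
import pandas as pd

# Hypothetical data for illustration.
df = pd.DataFrame({"Job": ["writer", "programmer", "teacher", "programmer"]})

# One indicator column per unique value, in sorted order.
dummies = pd.get_dummies(df["Job"])
print(dummies)

# New data with a different set of jobs produces a different set of columns.
new_df = pd.DataFrame({"Job": ["programmer", "designer"]})
new_dummies = pd.get_dummies(new_df["Job"])
print(list(new_dummies.columns))
```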
