Tutorial 5-End To End ML Project-Data Transformation Implementation Using Pipelines

In this video we implement the data transformation step: handling categorical values, handling missing values, and standard scaling using pipelines, then saving the resulting pickle file in the artifact folder.

Join iNeuron's Data Science Masters Course with Job Guaranteed Starting From April 3rd 2023

Timelines
---------------------------------------------------------------------
00:00:01 Agenda
00:01:44 Import necessary libraries
00:05:06 Data Transformation Config
00:08:16 Create Data Transformer using Pipeline
00:18:10 Initiate Data Transformation
00:27:33 Test Data Transformation and Ingestion
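
The steps in the timeline above can be sketched roughly as follows. This is a minimal sketch, not the code from the video: the column names (`reading_score`, `writing_score`, `gender`), the toy data, and the imputation strategies are assumptions chosen for illustration. It builds one pipeline for numerical columns and one for categorical columns, combines them with `ColumnTransformer`, and round-trips the fitted object through `pickle`.

```python
import pickle

import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical column names; substitute the ones in your own dataset.
numerical_columns = ["reading_score", "writing_score"]
categorical_columns = ["gender"]

# Numerical columns: fill missing values with the median, then standardize.
num_pipeline = Pipeline(steps=[
    ("imputer", SimpleImputer(strategy="median")),
    ("scaler", StandardScaler()),
])

# Categorical columns: fill missing values with the mode, then one-hot encode.
cat_pipeline = Pipeline(steps=[
    ("imputer", SimpleImputer(strategy="most_frequent")),
    ("one_hot_encoder", OneHotEncoder(handle_unknown="ignore")),
])

# Apply each pipeline to its own set of columns.
preprocessor = ColumnTransformer(transformers=[
    ("num_pipeline", num_pipeline, numerical_columns),
    ("cat_pipeline", cat_pipeline, categorical_columns),
])

# Toy data with one missing value per column (made up for this sketch).
df = pd.DataFrame({
    "reading_score": [72.0, np.nan, 90.0, 65.0],
    "writing_score": [70.0, 88.0, np.nan, 60.0],
    "gender": pd.Series(["female", "male", np.nan, "female"], dtype=object),
})

transformed = preprocessor.fit_transform(df)

# Serialize and restore the fitted preprocessor, as one would when saving
# it to a .pkl file and loading it again at prediction time.
restored = pickle.loads(pickle.dumps(preprocessor))
restored_output = restored.transform(df)
```

Keeping the imputers, encoder, and scaler inside one fitted object means the exact same transformation can be reapplied to test data or to new inputs later.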

Join this channel membership to get access to materials:

check out the end to end project playlist

Check Out My Other Playlist
Python Playlist:

Stats Playlist:

Complete Deep Learning Playlist:

Comments

Join this channel membership to get access to materials:
check out the complete end to end project playlist

krishnaik

Thank you Krish for your helpful and valuable projects.
In answer to your question, and based on what you told us last session: we use the @dataclass decorator because in a traditional class you define the attributes inside __init__, whereas @dataclass lets us declare them directly in the class body as annotated fields and generates __init__ for us.

_Ahmed_O
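
The point about `@dataclass` in the comment above can be illustrated with a short comparison. This is a sketch only; the class and field names (`PlainConfig`, `DataclassConfig`, `file_path`, `test_size`) are hypothetical and do not come from the video.

```python
from dataclasses import dataclass


class PlainConfig:
    """Traditional class: attributes are assigned by hand inside __init__."""

    def __init__(self, file_path: str = "data.csv", test_size: float = 0.2):
        self.file_path = file_path
        self.test_size = test_size


@dataclass
class DataclassConfig:
    """Dataclass: annotated fields are declared directly in the class body.

    The decorator generates __init__, __repr__ and __eq__ from these fields.
    """

    file_path: str = "data.csv"
    test_size: float = 0.2


plain = PlainConfig()
generated = DataclassConfig()

# Both objects expose the same attributes with the same defaults.
same_defaults = (plain.file_path, plain.test_size) == (
    generated.file_path,
    generated.test_size,
)

# The generated __init__ accepts the fields as keyword arguments.
custom = DataclassConfig(test_size=0.3)
```

For classes that mostly hold configuration values, the dataclass form removes the repetitive assignments while behaving the same way for callers.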

ALERT: replace all StandardScaler() with
The reason that adding with_mean=False resolved my error is most likely that OneHotEncoder returns a sparse matrix by default, and StandardScaler cannot center a sparse matrix: subtracting the mean would turn the zeros into non-zero values and make the matrix dense, so scikit-learn raises an error instead. StandardScaler itself does not assume that the features are positive.
By setting with_mean=False, the StandardScaler skips the mean subtraction and only divides each feature by its standard deviation, which keeps the matrix sparse and avoids the error.

DJ-jfqg
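
The behavior discussed in the comment above can be reproduced with a small example. This is a sketch under assumptions: the toy category values are made up, and it relies on `OneHotEncoder` returning a SciPy sparse matrix by default.

```python
import numpy as np
from scipy import sparse
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Toy categorical column (values invented for this sketch).
categories = np.array([["red"], ["green"], ["red"], ["blue"]])

# By default OneHotEncoder returns a sparse matrix.
encoded = OneHotEncoder().fit_transform(categories)
is_sparse = sparse.issparse(encoded)

# Centering (subtracting the mean) would destroy sparsity, so the default
# StandardScaler refuses to process sparse input.
try:
    StandardScaler().fit_transform(encoded)
    centering_error = None
except ValueError as exc:
    centering_error = str(exc)

# With with_mean=False the scaler only divides by the standard deviation,
# which works on sparse input and keeps the result sparse.
scaled = StandardScaler(with_mean=False).fit_transform(encoded)
```

An alternative is to leave the one-hot columns unscaled, which avoids the question entirely.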

This playlist is amazing, and the explanation is clear and to the point. I have never tried implementing an end-to-end ML project this way. Thank you, Krish Naik. From now on I will follow these steps to complete my ML/DL projects.

akshaypaunikar

Krish sir, this is an amazing effort by you. I am a student of the iNeuron FSDS May 2022 batch. There the project was taught by another tutor, so I am revising here. You have taken the explanation to the next level; it is so easy to understand.

And the answer to the question you asked about dataclass is that, with a dataclass, one can define variables and their data types directly in a class.

NiyatiVyas-yuuz

Great Effort. Thanks for helping others. Stay happy and be blessed!!!

kumaronlineplay

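StandardScaler is used to standardize numerical features to zero mean and unit variance so that they share a comparable scale (it does not bound them to a fixed range), and OneHotEncoder is used to create a new binary column for each category of a categorical feature.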

hm

Thanks for this amazing content, I am learning a lot. Also, can you check the video at 21:44? I feel some part of it is missing. Sorry, I see you have already covered the missing part in the next video.

akash_thing

By using @dataclass without writing the constructor __init__(), the class (DataTransformationConfig) accepts the value and assigns it to the given variable, so the path to the 'preprocessor.pkl' file in the 'artifacts' folder is set by default; the file itself is created there when the preprocessor object is saved... Thank you <3

uditdas
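
A minimal sketch of the idea in the comment above: the dataclass only stores the default path, and the file appears once something is written to it. The field name `preprocessor_obj_file_path` and the helper `save_object` are assumptions made for illustration, and the sketch writes into a temporary directory so that it leaves nothing behind.

```python
import os
import pickle
import tempfile
from dataclasses import dataclass


@dataclass
class DataTransformationConfig:
    # Hypothetical field name; it holds the default location of the pickle.
    preprocessor_obj_file_path: str = os.path.join("artifacts", "preprocessor.pkl")


def save_object(file_path: str, obj) -> None:
    """Hypothetical helper: create the parent folder and pickle obj into it."""
    os.makedirs(os.path.dirname(file_path), exist_ok=True)
    with open(file_path, "wb") as file_obj:
        pickle.dump(obj, file_obj)


config = DataTransformationConfig()

with tempfile.TemporaryDirectory() as tmp_dir:
    target_path = os.path.join(tmp_dir, config.preprocessor_obj_file_path)

    # Creating the config does not create the file.
    existed_before_save = os.path.exists(target_path)

    # A placeholder stands in for the fitted preprocessor object.
    save_object(target_path, {"placeholder": "preprocessor"})
    existed_after_save = os.path.exists(target_path)
```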

In love with this playlist. A guide that can always be referred to. Thank you, Krish Sir.

im_tanmay_g

Thanks a lot, Krish, for your efforts 🙏 and take a bow. There are not enough words to convey how useful this is, explaining from scratch how to build an end-to-end project.

cpsriram

WARNING: Avoid StandardScaler() in the categorical pipeline. After one-hot encoding, the values are discrete: each one represents a category or a binary indicator.

StandardScaler (also known as z-score normalization) transforms the values with the formula (x - mean) / sd, which turns them into non-discrete values.

The aim of StandardScaler is to put features on a comparable scale, so that optimization algorithms converge faster and training takes less time.

Learned from Andrew Ng's ML Specialization on Coursera.

cadc-pnir
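
The z-score formula from the comment above can be worked through in plain Python. This is a sketch with made-up numbers; it uses the population standard deviation, which is also what scikit-learn's `StandardScaler` uses. The second example shows what the warning is about: a binary one-hot column stops being 0/1 after standardization.

```python
from statistics import fmean, pstdev


def standardize(values):
    """Return z-scores, (x - mean) / sd, using the population standard deviation."""
    mean = fmean(values)
    std = pstdev(values)
    return [(x - mean) / std for x in values]


# A numerical feature: mean is 5 and population standard deviation is 2.
scores = [2, 4, 4, 4, 5, 5, 7, 9]
z_scores = standardize(scores)
# [-1.5, -0.5, -0.5, -0.5, 0.0, 0.0, 1.0, 2.0]

# A one-hot (binary) column: mean is 0.5 and standard deviation is 0.5.
one_hot_column = [1, 0, 0, 1]
z_one_hot = standardize(one_hot_column)
# [1.0, -1.0, -1.0, 1.0]  (no longer a 0/1 indicator)
```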

Great playlist sir, and very beginner friendly

thekarthikbharadwaj

Thanks a lot for your videos. You’ve taught me more than my master’s sir

ShadoWalker

This is awesome; I have never seen a project like this anywhere on YouTube. @Krish sir, I am a big fan of your videos.

nageshdigi

Am I the only one who noticed that the last part of the data transformation was skipped and the video jumped to saving the data transformer object?

DhurFitayMu

Excellent content. Please continue. It’s very helpful.

ishanarya

Thank you so much for sharing this valuable content with us.

MMEELL

OneHotEncoder is for converting categorical variables into binary columns, similar to dummy variables in pandas.

StandardScaler is for scaling the numerical variables to a similar scale; another scaler is MinMaxScaler.

datawithdami
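
The comparisons made in the comment above can be checked on a tiny example. This is a sketch with invented data (the `city` and `age` columns are hypothetical): it confirms that `OneHotEncoder` and `pandas.get_dummies` produce the same indicator columns here, and shows how `StandardScaler` and `MinMaxScaler` differ on the same values.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder, StandardScaler

# Toy data made up for this sketch.
df = pd.DataFrame({"city": ["Pune", "Delhi", "Pune"], "age": [20.0, 30.0, 40.0]})

# pandas dummy variables: one indicator column per category.
dummies = pd.get_dummies(df["city"], dtype=int).to_numpy()

# OneHotEncoder gives the same indicators (converted from sparse to dense here).
encoded = OneHotEncoder().fit_transform(df[["city"]]).toarray()
same_encoding = np.array_equal(dummies, encoded)

# StandardScaler: zero mean and unit variance, with no fixed bounds.
standardized = StandardScaler().fit_transform(df[["age"]]).ravel()

# MinMaxScaler: rescales to the range [0, 1] by default.
min_maxed = MinMaxScaler().fit_transform(df[["age"]]).ravel()
```

One practical difference is that a fitted `OneHotEncoder` remembers the categories it saw, so it can be reused on new data inside a pipeline, whereas `get_dummies` derives its columns from whatever data it is given.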

@krish Naik: I am your student in FSDS2.0, and I have also been following you for 2 years. Many things are making sense to me now, even though the ML part is yet to start in FSDS2.0.

Nirav