What is Data Cleansing? What's Involved?

In this video we discuss the most underappreciated aspect of machine learning: data cleansing.

The link below is to the data cleansing course. Just peruse it to see what's involved.

Comments
Author

Data cleansing steps:

Step 1: "Attribute selection". Decide which attributes / columns from the table are relevant and worth selecting for training the model.

Step 2: "Handling missing values". Data in the real world is often dirty and incomplete. How you handle those missing values is up to you.

Step 3: "Imputing missing values". Imputation is the process of replacing missing data with substituted values. A common choice of substitute is the mean, median, or mode of the column.

Step 4: "Noise removal". Noise is data that has no meaning: values that are faulty, nonsensical, outlying, or corrupted. In this step the noise is either removed or corrected prior to modelling.

Step 5: "Numeric transformation". Most models accept only numeric input. Transforming categorical data into numeric data is the final step before modelling.
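To make the five steps concrete, here is a minimal sketch in plain Python. It is not taken from the video or the course: the toy table, the column names, and the 0 to 120 plausible-age range are all assumptions made up for illustration. It also runs noise removal before imputation, which is a design choice of this sketch so that the outlier does not distort the fill values.

```python
from statistics import mean, median, mode

# Hypothetical toy table; columns and values are invented for illustration.
rows = [
    {"id": 1, "age": 34,   "income": 52000.0, "city": "Lima"},
    {"id": 2, "age": None, "income": 48000.0, "city": "Quito"},
    {"id": 3, "age": 29,   "income": None,    "city": "Lima"},
    {"id": 4, "age": 41,   "income": 61000.0, "city": None},
    {"id": 5, "age": 430,  "income": 55000.0, "city": "Bogota"},  # implausible age
]

# Step 1: attribute selection. Keep the columns assumed to be useful
# and drop "id", which is only a row label.
FEATURES = ["age", "income", "city"]
data = [{k: r[k] for k in FEATURES} for r in rows]

# Step 4: noise removal. Ages outside an assumed plausible range
# are treated as missing so they can be imputed like any other gap.
for r in data:
    if r["age"] is not None and not (0 <= r["age"] <= 120):
        r["age"] = None

# Steps 2 and 3: handle missing values by imputation.
def present(col):
    """Return the non-missing values of a column."""
    return [r[col] for r in data if r[col] is not None]

age_fill = median(present("age"))      # median of the remaining ages
income_fill = mean(present("income"))  # mean of the known incomes
city_fill = mode(present("city"))      # most frequent city

for r in data:
    if r["age"] is None:
        r["age"] = age_fill
    if r["income"] is None:
        r["income"] = income_fill
    if r["city"] is None:
        r["city"] = city_fill

# Step 5: numeric transformation. One-hot encode the categorical column.
categories = sorted({r["city"] for r in data})
matrix = [
    [r["age"], r["income"]] + [1 if r["city"] == c else 0 for c in categories]
    for r in data
]

for row in matrix:
    print(row)
```

Each row of `matrix` is now fully numeric: age, income, then one 0/1 column per city. One-hot encoding is only one possible numeric transformation; ordinal or target encoding may suit other data better.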

swarnimkhosla
Author

If I use Python SQL connectors (like 'pymysql') and practice pulling data from a local database and then analyzing it, will that be good all-round practice for SQL and data analysis?
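As a rough illustration of that workflow, the sketch below pulls rows through a DB-API cursor and analyzes them in Python. It swaps in the standard-library `sqlite3` module with an in-memory database in place of `pymysql`, so that it runs without a MySQL server; the `orders` table and its values are hypothetical. With `pymysql` the cursor pattern is the same, but query placeholders are written `%s` rather than `?`.

```python
import sqlite3
from statistics import mean

# In-memory SQLite database as a stand-in for a local MySQL server.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical table and rows, invented for illustration.
cur.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
cur.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("ann", 20.0), ("bob", 35.0), ("ann", 40.0), ("bob", None)],
)
conn.commit()

# SQL practice: aggregate inside the database. SUM skips NULL amounts.
cur.execute(
    "SELECT customer, SUM(amount) FROM orders "
    "GROUP BY customer ORDER BY customer"
)
totals = dict(cur.fetchall())

# Analysis practice: pull raw rows and clean them in Python,
# dropping the missing amount before averaging.
cur.execute("SELECT amount FROM orders")
amounts = [a for (a,) in cur.fetchall() if a is not None]
avg_amount = mean(amounts)

conn.close()
print(totals, round(avg_amount, 2))
```

Doing the aggregation once in SQL and the cleaning once in Python exercises both skills on the same data, which is the kind of combined practice the question describes.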

SIMONGREYMAN
Author

I just finished your course "Performance Tuning Deep Learning Models Master Class". In section 2, on optimal generalization techniques, you mention "input noise" as one technique to improve generalization, and here you mention "Noise removal" as a data cleansing step. Do I see a contradiction here?

DesertWolf
Author

I'm charging 50 000 dollars as a salary of Data Custodian. No job I have. Staying with god.

luiscarlosgutierrezsosa