Dataset, Training Data, Test Data, Data Normalization | Program 3 III BSc CS | M.S University | Web Academy

Scikit-learn is a machine learning package for Python.
Scikit-learn, also known as sklearn, began as a Google Summer of Code (GSoC) project.
It was first developed by David Cournapeau in 2007 and publicly released in 2010.

What is a Dataset?
A dataset is a collection of data of various types stored in a digital format.
An ML project needs a quality dataset. Without one, the machine cannot
learn properly, so data is the key component of any
Machine Learning project.
Machine learning (ML) uses algorithms to learn from the data in datasets.
Datasets are classified as structured or unstructured.
In a structured dataset the data is in tabular format: each row
corresponds to a record and each column corresponds to a feature.
Unstructured datasets consist of images, text, speech, audio, etc.
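
To make the row/column idea concrete, here is a minimal sketch of a structured dataset held in a pandas DataFrame. The column names and values are invented for illustration and do not come from the video.

```python
import pandas as pd

# Hypothetical sample data, invented purely for illustration.
df = pd.DataFrame({
    "height_cm": [150, 160, 170],
    "weight_kg": [50, 60, 70],
    "label": ["A", "B", "A"],
})

# Each row is one record; each column is one feature (or the label).
n_records, n_columns = df.shape
print("Records:", n_records)
print("Columns:", list(df.columns))
print(df.iloc[0])  # the first record, one value per column
```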

What is Training Data?
In machine learning, training data is the data the machine learns
from in order to make predictions. Good
training data is the backbone of machine learning.
Common ratios for splitting a dataset into training and test sets are 80:20, 70:30,
or 90:10.
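
As a rough sketch of an 80:20 split, scikit-learn's train_test_split can be used as follows. The arrays X and y are made-up placeholder data, and random_state=0 is an arbitrary choice to make the shuffle repeatable.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical data: 10 records with 2 features each, plus one label per record.
X = np.arange(20).reshape(10, 2)
y = np.array([0, 1, 0, 1, 0, 1, 0, 1, 0, 1])

# test_size=0.2 asks for an 80:20 split; random_state fixes the shuffle.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

print("Training records:", len(X_train))
print("Test records:", len(X_test))
```

Changing test_size to 0.3 or 0.1 gives the 70:30 and 90:10 splits.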

What is Test Data?
The test dataset is used to evaluate the performance (accuracy) of the model and
to check that the model generalizes well to new or unseen data.

Once your machine learning model is built (with your training data),
you need unseen data to test it. This data is called testing data.
You can use it to evaluate how well training went
and then adjust or optimize the model for improved results.
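
One possible way to measure accuracy on test data is sketched below. The choice of the built-in iris dataset and of a k-nearest-neighbours classifier is an assumption made only for this example; any dataset and model could be substituted.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Assumed example dataset: the iris data bundled with scikit-learn.
X, y = load_iris(return_X_y=True)

# Hold back 20% of the records as unseen test data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Build the model using the training data only.
model = KNeighborsClassifier(n_neighbors=3)
model.fit(X_train, y_train)

# Evaluate on the test data the model has never seen.
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print("Test accuracy:", accuracy)
```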

What is Data Normalization?
Normalization refers to rescaling real-valued numeric attributes into the range 0 to 1. It takes an array as input and rescales its values to lie between 0 and 1. (This is different from database normalization, which organizes tables to minimize duplicated data.)

Data normalization transforms data measured on different scales to the same scale.
After normalization, all variables carry a similar weight in the model, which can improve the stability and performance of the learning algorithm.
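
The rescaling to the 0 to 1 range can be sketched by hand with the usual min-max formula, (x - min) / (max - min). The sample values below are made up for illustration.

```python
import numpy as np

# Hypothetical sample values, chosen so the arithmetic is easy to check.
x = np.array([10.0, 20.0, 30.0, 50.0])

# Min-max scaling: the smallest value maps to 0 and the largest to 1.
x_scaled = (x - x.min()) / (x.max() - x.min())

print("Original:", x)
print("Scaled:", x_scaled)  # values 0, 0.25, 0.5 and 1
```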

Program 3

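from sklearn import preprocessing
import pandas as pd
import numpy as np
# Assumption: the listing does not show these values; they are sample data.
x_array = np.array([2.0, 3.0, 5.0, 6.0, 7.0, 4.0, 8.0])
# Assumption: minmax_scale is used here to rescale the values to the 0 to 1 range.
normalized_arr = preprocessing.minmax_scale(x_array)
print("Original Data:", x_array)
print("Normalized Data:", normalized_arr)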