How to collect data for Analytics
This podcast is for anyone who is new to analytics. Please listen to the end; you should learn something new.
After defining the variables you want to study, the next step is data collection.
Data collection is an extremely critical task.
If your data is flawed by biases, ambiguities,
or other types of errors, the results you get from using that data will be suspect or simply wrong.
Data collection consists of identifying data sources and
deciding whether the data you collect will come from a population or a sample.
The next step is to check whether the data is clean.
Sometimes we also need to recode variables.
Data Sources
Generally, we collect data from primary or secondary data sources.
Primary data source: you collect your own data for analysis.
Secondary data source: the data for your analysis has been collected by someone else.
Among other sources, your data can come from any of the following:
1. Data distributed by an organization or individual.
2. The outcomes of a designed experiment.
3. The responses from a survey.
4. The results of conducting an observational study.
5. Data collected by ongoing business activities.
Populations and Samples
The data you collect comes from either a population or a sample.
Population: A population consists of all the items or individuals about which you want to reach conclusions.
For example, all the customers of a company.
Sample: A sample is a portion of a population selected for analysis. The results of analyzing a sample are used to estimate the characteristics of the entire population.
Say, for example, we take a random sample of only 1,000 customers for analysis.
Or an audit team audits 100 randomly selected transactions.
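To make the population and sample idea concrete, here is a minimal sketch in Python. The customer figures are made up for illustration (a hypothetical population of 50,000 customers with randomly generated yearly spend), and the names `population` and `sample` are assumptions, not something from a real dataset. It draws a simple random sample of 1,000 customers and uses the sample mean to estimate the population mean.

```python
import random
import statistics

random.seed(42)  # fixed seed so the sketch is repeatable

# Hypothetical population: yearly spend for 50,000 customers (made-up numbers).
population = [random.gauss(500, 120) for _ in range(50_000)]

# Simple random sample of 1,000 customers, drawn without replacement.
sample = random.sample(population, k=1000)

# The sample mean is used as an estimate of the population mean.
population_mean = statistics.mean(population)
sample_mean = statistics.mean(sample)

print(f"population mean: {population_mean:.2f}")
print(f"sample mean:     {sample_mean:.2f}")
```

In practice you usually cannot compute the population mean at all, which is exactly why the sample estimate is useful. Here both are printed only so you can see how close they are.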
Structured versus unstructured data:
Say, for example, you ran a survey and got responses as tick marks, such as a tick next to "bad", and you want to run a logistic regression on this data. For logistic regression, data in this form would be unstructured data.
But when you convert this data to a binary format, it becomes structured data.
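The conversion to a binary format can be sketched like this. It is only an illustration: the answer list is made up, and treating "good" as the other tick option is an assumption, since the survey itself is not described in detail. The helper name `to_binary` is hypothetical.

```python
# Hypothetical raw survey answers: the option each respondent ticked.
# "good" as the alternative to "bad" is an assumption for this sketch.
raw_answers = ["bad", "good", "Bad ", "good", "BAD", "good"]


def to_binary(answer: str) -> int:
    """Encode a ticked answer as 1 for "bad" and 0 for "good"."""
    cleaned = answer.strip().lower()  # ignore stray spaces and letter case
    if cleaned == "bad":
        return 1
    if cleaned == "good":
        return 0
    raise ValueError(f"unexpected answer: {answer!r}")


encoded = [to_binary(a) for a in raw_answers]
print(encoded)  # [1, 0, 1, 0, 1, 0]
```

A column of 0s and 1s like `encoded` is the kind of outcome variable a logistic regression expects.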
Data cleaning
Whichever way you collect data, you may find irregularities in the values you collect,
such as undefined or impossible values,
outliers, and missing values.
Outliers: values that seem excessively different from the rest of the values. These values may or may not be errors, but they demand a second review.
Say, for example, you are analyzing the wealth of each individual in Mumbai. In that data, Mr. Mukesh Ambani's wealth may be an outlier.
Missing value: a value that could not be collected and is not available in the database.
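Here is a minimal sketch of how you might flag missing values and outliers in Python. The wealth figures are made up, `None` is assumed to mark a missing value, and the 1.5 × IQR rule used for outliers is one common rule of thumb chosen for this example, not the only way to do it.

```python
import statistics

# Hypothetical wealth figures (arbitrary units); None marks a missing value.
wealth = [12, 15, 14, None, 13, 16, 15, 14, 90_000, 13, None, 17]

# Missing values: remember where they are, then set them aside.
missing_positions = [i for i, v in enumerate(wealth) if v is None]
observed = [v for v in wealth if v is not None]

# Outliers: flag values more than 1.5 * IQR outside the quartiles.
q1, _, q3 = statistics.quantiles(observed, n=4)
iqr = q3 - q1
low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = [v for v in observed if v < low or v > high]

print("missing at positions:", missing_positions)  # [3, 10]
print("outliers to review:", outliers)             # [90000]
```

The flagged value is not deleted automatically. As with the Mumbai example, an outlier may be a real value, so it only gets marked for a second review.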