Clustering Techniques: Similarity and Dissimilarity Measures

In this video you will learn about clustering techniques and the similarity and dissimilarity measures they rely on.
What is Supervised Learning?
What is Unsupervised Learning?
What is Similarity?
What is Classification?
What is Clustering?
What are Similarity and Dissimilarity Measures?
What are Distance Measures in Big Data Analytics?

Similarity measure:
- is a numerical measure of how alike two data objects are.
- is higher when objects are more alike.
- often falls in the range [0, 1].
- might be used to identify:
  - duplicate data that may have differences due to typos.
  - equivalent instances from different data sets, e.g. names and/or addresses that are the same but have misspellings.
  - groups of data that are very close (clusters).
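
To make the duplicate-detection use case concrete, here is a minimal sketch. It assumes string similarity is measured with `difflib.SequenceMatcher` from the Python standard library, which returns a ratio in [0, 1]. The helper names and the 0.85 threshold are illustrative assumptions, not something taken from the video.

```python
from difflib import SequenceMatcher


def name_similarity(a: str, b: str) -> float:
    """Return a similarity score in [0, 1]; 1.0 means the strings are identical."""
    # Lower-case and strip so trivial formatting differences are ignored.
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


def likely_duplicates(names, threshold=0.85):
    """Return index pairs whose similarity reaches the (assumed) threshold."""
    pairs = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if name_similarity(names[i], names[j]) >= threshold:
                pairs.append((i, j))
    return pairs


# Hypothetical records: the first two differ by a single typo.
records = ["Jonathan Smith", "Jonathon Smith", "Maria Garcia"]
print(likely_duplicates(records))  # [(0, 1)]
```

The threshold would need tuning on real data: too low and unrelated names are merged, too high and genuine typos are missed.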

Dissimilarity measure:
- is a numerical measure of how different two data objects are.
- is lower when objects are more alike.
- has a minimum that is often 0, while the upper limit varies depending on how much variation is possible.
- might be used to identify:
  - outliers.
  - interesting exceptions, e.g. credit card fraud.
  - boundaries to clusters.
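
As a rough illustration of using dissimilarity to spot an outlier, the sketch below scores each point by its average Euclidean distance to all other points and flags the one with the highest score. The data and the scoring rule are assumptions chosen for illustration. Practical outlier detection usually relies on more robust methods.

```python
import math


def euclidean(p, q):
    """Straight-line distance between two points of equal dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def mean_dissimilarity(points):
    """For each point, average its distance to every other point."""
    scores = []
    for i, p in enumerate(points):
        others = [euclidean(p, q) for j, q in enumerate(points) if j != i]
        scores.append(sum(others) / len(others))
    return scores


# Hypothetical data: three points close together and one far away.
points = [(1.0, 1.0), (1.2, 0.9), (0.9, 1.1), (8.0, 8.0)]
scores = mean_dissimilarity(points)

# The point that is most dissimilar to the rest is the outlier candidate.
outlier_index = max(range(len(points)), key=scores.__getitem__)
print(outlier_index)  # 3
```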
Proximity refers to either a similarity or dissimilarity.
Proximity Measures for a Single Attribute
Proximity Calculation
Similarity Measure with Symmetric Binary
Symmetric Binary Coefficient
Similarity Measure with Asymmetric Binary
Jaccard Coefficient
Proximity Measure with Interval Scale
Properties of Distance Metrics: 1. Non-negativity, 2. Symmetry, 3. Triangle inequality
Manhattan Distance (L1 Norm), also called the taxicab metric or city-block metric
Hamming Distance
Euclidean Distance (L2 Norm)
Chebyshev Distance (L-infinity Norm)
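
The binary coefficients and distance measures named in the list above can be sketched in a few lines of plain Python. This is a minimal illustration using the standard textbook definitions, not code from the video. The function names are hypothetical, and returning 0.0 from `jaccard` for two all-zero vectors is an assumed convention.

```python
def simple_matching(x, y):
    """Symmetric binary similarity: share of positions where the bits agree."""
    matches = sum(1 for a, b in zip(x, y) if a == b)
    return matches / len(x)


def jaccard(x, y):
    """Asymmetric binary similarity: 1-1 matches over positions that are not 0-0."""
    both = sum(1 for a, b in zip(x, y) if a == 1 and b == 1)
    either = sum(1 for a, b in zip(x, y) if a == 1 or b == 1)
    return both / either if either else 0.0  # assumed convention for all-zero input


def hamming(x, y):
    """Number of positions at which two equal-length sequences differ."""
    return sum(1 for a, b in zip(x, y) if a != b)


def manhattan(p, q):
    """L1 norm of the difference: sum of absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(p, q))


def euclidean(p, q):
    """L2 norm of the difference: square root of summed squared differences."""
    return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5


def chebyshev(p, q):
    """L-infinity norm of the difference: largest absolute coordinate difference."""
    return max(abs(a - b) for a, b in zip(p, q))


x = [1, 0, 0, 1, 0, 0]
y = [1, 1, 0, 0, 0, 0]
print(simple_matching(x, y))  # 4 of 6 positions agree
print(jaccard(x, y))          # 1 shared 1 among 3 positions holding any 1
print(hamming(x, y))          # 2

p, q = (1, 2), (4, 6)
print(manhattan(p, q), euclidean(p, q), chebyshev(p, q))  # 7 5.0 4
```

Note how the two binary coefficients disagree on the same pair of vectors: simple matching counts the shared zeros as agreement, while Jaccard ignores them, which is why Jaccard suits asymmetric attributes where a 0 carries little information.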

