Clean up similar text in Excel with Power Query fuzzy clustering

preview_player
Показать описание
This brilliant feature allows you to clean up any column with similar text, such as misspelt names, different ways of saying it. It uses Power Query but this video can be used by even Power Query novices. It works in either Excel or Power BI using the brilliant Table.AddFuzzyClusterColumn.
Ideally one should use data validation but if not, this method could save hours of data clean up work. I also show how it can be used in dataflows.

I show how to use the code in a table, see similarity % for each match and set the minimum level, override some matches, even have a transformation table to say for example that England should change to the UK.

00:00 - Introduction
01:13 - Get data into Power Query
02:46 - Fuzzy cluster column function
05:44 - Similarity thresholds, case, spacing
09:07 - Transformation table & manual adj
11:22 - Troubleshooting issues
12:36 - Add fuzzy match to existing table
14:44 - Power Query Online UI
15:28 - Fuzzy grouping
Рекомендации по теме
Комментарии
Автор

You explain everything brilliantly, may I ask 1 stupid question? I have my column of names that everybody typed in the wrong names so I have 3 versions of a name, lets say Prima Meat Group (PTY) Ltd is the correct one, then it has been typed in, Prima meat group, Prime meat . I have a long list of these, next to your original list you have a column on the side that has a description of the the true, false, CorrEct. How must I get it there without it taking up an entire day. I my list there is at least 1 of every name that is correct. How do I go about to find it? Oh and PS. I follow your tutorial with absolute ease. I appreciate that this came along my path.🥳

debrapretorius
Автор

Every once in a while I stumble over a Youtube channel that deserves WAYYYY more subscribers. This is one such occasion. Very useful content!

SebtorDude
Автор

You sir are a certified 100% top class genius - can't find more adjectives but wow.

DhruvDua
Автор

Thank you very much, I have a problem, let's say I want to match software names, some time you say MS Excel, Microsft excel and excel. now maybe this is ok with MS Excel and Excel, but not ok when Microsoft is added. Please do you have any idea how I can go around this without a transformation table?

UE
Автор

Hi! Very good video! However, not exactly matching my need. I have 1000s of rows already matched on a customer number level. I now want a similarity score for address in system A vs. address in system B. Do you know if there are similar functions to do something like this? I could transform the rows to columns but that would try to match every single value to all, also to those not from the same customer.

JustOkko
Автор

I would like a copy of your file please. Thank you

memphisblues
Автор

Can this be used on partial strings? e.g. look for Company Ltd or Co, Ltd or Co., Ltd at the end of a business name and change to Co. Ltd

styx_n_stonz
Автор

Is there some way to say "This" never should match with "That". Its sort of an opposite transformation table eg I have the word Blender and it keeps matching to Bender - both words may occur and are fine as words and I don't want to increase my similarity percent to stop the match as that cuts out other results which are ok. I just don't want it ever to thinking these two words are a match.

chrisaldridge
Автор

Thanks for a very useful video! Is there possible to check the similarities between two exisiting columns using the same/similar function?

carralegend
Автор

Do you need to do those orginal, mispelled charts always?

AlvaroPerez-oxul
Автор

Very useful thank! However I am facing an issue with the transformation table, it doesn't seem to do anything for me, any idea why?

manalehayek
Автор

When I type =Table... in blank query: nothing appear, do I need to install some add in? My version is 2016 profesional plus

deure
Автор

Can you share source file on which you are working in video

hussainmeghani
Автор

What a useful little secret!! I'm just thinking about fuzzy stuff generally. Do you think it works with nonsense strings?

leebecker
Автор

hi
can I have a copy of the file please thanks

fadilyassin