Match by Combining Parts in Fuzzy Matching

You are currently viewing Match by Combining Parts in Fuzzy Matching

During our last session of power bi training in Sydney, we received an overwhelming response by business users who wanted to implement the Power BI training course for their official tasks. Apart from analytics, the attendees found Fuzzy Matching algorithm as a useful tool for spelling correction and categorical data validation. Fuzzy Matching algorithm allows merging data sets that are slightly similar and not exactly same. This blog is in continuation with our series of blogs related to the fuzzy matching algorithm.

  1. Using Fuzzy Matching To Merge Queries And Find Data Similarity In Power BI
  2. Changing Similarity Threshold In Fuzzy Matching
  3. Ignore Case Option In Fuzzy Matching

Fuzzy matching has various applications such as spelling correction, data similarity checking and data cleaning.

Power BI allows you to work with different fuzzy merge options. These options allow you to set the criteria for the matching that the algorithm performs.

In this blog post, we will see how This feature in Power BI allows you to merge queries that contains values in the form of bi-grams. For this we have two tables, one containing the clean data and other contains the unclean data. This blog post looks closely into the option of matching by combining parts. Fuzzy merge comes in with other options which can be tuned.

For this blog post we have two tables, one containing the clean data and the other containing the unclean data.

This table contains a single column with two values: Sydney and Melbourne. These are the target values.

Now let’s see the un-clean data.   

This table contains a City column containing many incorrect variations for Sydney and Melbourne and correct matching counterparts for Perth and Adelaide. Each row is assigned an ID to track its matching result in the ID column.

A bi-gram contains two parts for a single word.

As an example let’s look at the second row with ID:2, Sydney was entered as “Sy dney” which has a space which divides the word into two parts.  

To work with different Fuzzy Merge options:

  1. In the Queries pane, click on Cities_CleanData.
  2. Click on Home > Merge Queries > Merge Queries as New.
  3. Select Cities_CleanData in the first drop-down menu and click on City Column.
  4. Select Cities_UncleanData in the second drop-down menu and click on City Column.

5. Select the check box to Use fuzzy matching to perform merge.

6.Uncheck the box to the left of Ignore Case. Click OK.

Make sure that the Match by Combining text parts is checked.

7.Click on The double headed arrow next to Cities_UncleanData. Click OK.

This is what our merged query looks like:

Notice that only 9 rows have been returned. The match for “sy dney” has been identified correctly as Sydney.

Let us look at the options Power BI provides you to tune your matching algorithm.

Similarity Threshold

This option allows you to choose the Range of unclean data that can be considered as true match.

Ignore Case

This option is used for String Matching where case sensitivity is not required. Using this option sydney and SYDNEY will be considered as similar.

Match by combining Text Parts

Using this option, both the parts of the word are considered while algorithm looks for a match. Using this option, Sydney and Si dney will be considered similar.

Maximum number of Matches

This option allows you to limit the Number of Values that your algorithm returns. The value entered in this option returns results with the first result denoting the highest match and so on. If no value is entered the algorithm returns all rows in the same order.

Transformation Table

Transformation tables act as Mapping Tables for the algorithm when the dataset under consideration is highly dissimilar.

Conclusion:

Power BI’s query editor provides the option for merging queries using the fuzzy matching algorithm. This algorithm merges data sets that are slightly similar. Microsoft Power BI also provides its users with different options to tune the matching criteria.  In this blogpost, the option of matching by combing parts is discussed. This option is useful for data sets that contain unwanted spaces. By checking this option during the merge settings, power bi query editor maps the bi-grams to the original words. Thereby, making data set clean.

Are you a data analyst and want to learn more about Power BI?

Why not sign up for Power BI training in Australia We provide our services in the following regions:

Power BI Training in Sydney

Power BI Training in Melbourne

Power BI Training in Brisbane

Leave a Reply