Using Transformation table in Fuzzy Matching in Power BI

You are currently viewing Using Transformation table in Fuzzy Matching in Power BI

Fuzzy matching is one of the core topics covered in our Power BI Training in Sydney and Melbourne. This algorithm has found its way to the advanced level course because of its applicability. Fuzzy matching can be used for data cleaning, spelling correction and data mapping. Fuzzy Matching is the technique of finding strings that match a pattern approximately (rather than exactly) and allows to create links between data sets that may be less than 100% perfect.  

How is fuzzy Merge different from simple merge?

Simple Merge allows the merging of data sets that have a common key column. This means that the data value in primary key columns should exactly match with a data value in foreign key column.  Fuzzy Merge allows to merge data sets that are Slightly Similar. It uses fuzzy matching algorithm to find the similarity between the selected columns.

Using Fuzzy Matching To Merge Queries And Find Data Similarity In Power BI

Microsoft Power BI provides you with different options to tune your algorithm. Fuzzy Matching can be performed in Power BI by either the Merge Button in the GUI of query editor or the power query functions. 

Let us look at the options Power BI provides you to tune your matching algorithm.

In this blog post, we will walk through the last option: using the Transformation Table.

What is a transformation Table?

Transformation tables consist of a ‘To’ and a ‘From’ field which define how the matching is performed between highly dissimilar data sets. Transformation tables are used to define mapping rules. Transformation tables are designed to bridge the gap between datasets which cannot be matched using simple fuzzy matching. They are broadly used as abbreviation tables.

These tables need to be designed. They act as a template for fuzzy matching algorithm. The algorithm performs self-learning on the table and produces similar results.

Transformation tables are suited for datasets that have been built without any naming rules or their predicted inputs have not been defined.

This blog post assumes that a simple fuzzy merge has been performed and now the now the other fuzzy merge options need to be tweaked.

In this blog post, we will see how This feature in Power BI allows you to merge queries. For this we have two tables, one containing the clean data and other contains the unclean data. This blog post close looks into the Transformation table option. Fuzzy merge comes in with other options which can be tuned.

For this we have two tables, one containing the clean data and other contains the unclean data. 

This table contains a single column with two values: Sydney and Melbourne. These are the target values.

Now let’s see the un-clean data.   

This table contains a City column containing many incorrect variations for the two cities (Sydney and Melbourne) and correct entries for Perth and Adelaide. Each variation is assigned an ID to track its matching result in the ID column.

To work with Transformation Tables in Fuzzy Merge options:

1. In the Home Tab, Click on Enter Data in the External Data

2. Name the first column as To.

3. Click on * at the Right of To.

4. Rename the new column as From

5. In the To column, enter Sydney.

6. In the From column, Enter Syd.

7. Click Load.

Notice that a new Table is created in the Fields Pane.

8. Open the Query Editor

9. In the Queries Pane, click on Cities_CleanData.  

10. Click on Home > Merge Queries > Merge Queries as New. 

11. Select the check box to Use fuzzy matching to perform merge

12. Select City column from drop-down menu for both tables. 

13. Click on Fuzzy Merge Options

14. Select Table from the Transformation Table (Optional) drop-down

 Click OK.  

15. Click on the double headed arrow next to Cities_UncleanData. Click OK.

Notice that Syd is added to the results.

Exploring Fuzzy Merge options in Power BI

Power BI allows you to work with different fuzzy merge options. These options allow you to set the criteria for the matching that the algorithm performs.  

Let us look at the options Power BI provides you to tune your matching algorithm.

Similarity Threshold

This option allows you to choose the Range of unclean data that can be considered as true match.

Ignore Case

This option is used for String Matching where case sensitivity is not required. Using this option sydney and SYDNEY will be considered as similar.

Match by combining Text Parts

Using this option, both the parts of the word are considered while algorithm looks for a match. Using this option, Sydney and Si dney will be considered similar.

Maximum number of Matches

This option allows you to limit the Number of Values that your algorithm returns. The value entered in this option returns results with the first result denoting the highest match and so on. If no value is entered the algorithm returns all rows in the same order.

Transformation Table

Transformation tables act as Mapping Tables for the algorithm when the dataset under consideration is highly dissimilar.

Conclusion:

As the data records evolve over time, some naming conventions might become obsolete and new forms of representing the data are introduced. This creates a gap. To bridge this gap, a transformation table can be introduced which maps the newer data formats to the old ones. It acts like a template for the algorithm to match all valid entries. In this blog post, the option of using transformation table in fuzzy merge algorithm has been explored.

Are you a data analyst and want to learn more about Power BI?

Why not sign up for Power BI training in Australia.  We provide our services in the following regions:

 

This Post Has One Comment

  1. Justus

    For the “Similarity Threshold”, do you know which string distance is used, if any?

Leave a Reply