Combining multiple queries in Power Query Editor is a common activity for data preparation. Merge and Append queries are the two types of techniques used for combining queries. Although data scientists are familiar with the distinction between these two methods, the rest of Power BI users are not. In this blog post, a use case for each of these queries has been presented along with their demonstrated differences.
To master the skill of appending and merging the queries in Power BI, you need to know the basics of data transformation. So, let’s get started!
What is data transformation and why do we need it?
Data transformation is an essential topic of both Power BI basic and Power BI advanced trainings because data must be transformed to a format that is suitable for fast and precise data processing and efficient reporting. Data Transformation involves a wide range of operations such as identifying data sources and data types, cleaning data by removing errors and duplicates along with enriching data, and performing aggregations. There are various advantages to transforming data:
- Data is organized by transforming it. Users as well as systems find it easier to use transformed data.
- Null values, unexpected duplicates, wrong indexing, and mismatched formatting can all be avoided by using properly structured and verified data as it enhances data quality.
However, there are some difficulties in effectively transforming data:
- During the transition, a lack of competence or negligence might cause issues.
- Due to unfamiliarity with the range of valid and allowed formats, data analysts without proper subject matter experience are less likely to spot typos or inaccurate data.
- A company may convert data to a certain format for one application, only to restore it to its original format for another.
Here’s a blog to guide you through the basic data transformation principles: https://powerbitraining.com.au/using-group-by-in-power-bi-for-data-transformation/. The first half of the above-mentioned blog enlists the data transformation principles, and the second half performs basic data transformation using the Group-By method.
How can we transform our queries in Power BI?
Data transformation consists of a series of steps known as queries. A few common queries used in data transformation are:
Duplicate and reference queries have already been discussed in our blogs and now our focus is on appending and merging queries. To learn duplicate and reference queries visit:
Appending Queries is a feature in Power BI that allows you to create a stacked table out of similar data tables. Append means that the results of two or more queries, each of which is a table, will be integrated into a single query. It attaches one table to another in a top-down manner. Appending queries typically increases the row count of tables.
Let’s see this in action.
To completely follow the below steps, download the Append_Vs_Merge.pbix from the following link and open it in Power Query Editor:
A series of steps have been performed on the Sales table and the countries table Query. The steps are listed under the Applied Steps pane in the Query Settings:
Steps applied on Sales Table can be seen in the screenshot above.
Steps applied on Countries Table can be seen in the screenshot above.
To Append a Query:
- Click on Berlin in the Queries Pane.
- Click on Home Tab in the Ribbon Menu.
- Click on Append in the Combine
There are two primary ways of appending queries:
- Append Queries: this procedure adds rows to a table that already exists.
- Append Queries as New: this procedure will combine two (or more) appended tables to generate a new output table. This is the route we are going to take.
- Click on Append as New.
A pop-up appears.
- From the drop-down under the First table, select “Berlin”.
- From the drop-down under Table to append to the “Primary Table”, select “Chicago”.
- Click OK.
Berlin contained three columns whereas Chicago contained four columns. Three columns from Chicago table were the same as Berlin so the data in those columns was simply appended to the rows of Berlin. The Column Retailer Type did not exist in Table Berlin. Therefore, it contains null values where the RetCity is Berlin.
The infographic image above shows the mechanism of append query.
Merging Queries allows you to merge data sets that contain a common column. Merge query adds columns to the table. Merge queries work with dissimilar data sets linked by foreign and primary keys.
To Merge Queries:
- Click on Sales in the Queries Pane.
- Click on Home Tab in the Ribbon Menu.
- Click on Merge in the Combine section.
There are two primary ways of merging queries:
- Merge Queries – There will be no creation of a new table. The data will be compared against the query/table that already exists.
- Merge Queries as New – a new output table will be produced while it will try to find values from a separate table based on a common column between two tables. This is the route we are going to take.
- Click on Merge as New.
A pop-up menu appears.
- From the first drop-down menu, select Sales and click on RetCity
- From the second drop-down menu, select Countries and click on City
- Click OK.
A new column will appear with the header name Country Table.
The Sales and Countries Table contains the same number of rows. The merged table contains all the columns from the Sales table and the Countries Table.
The infographic image above shows the result of the merge query operation.
The merged query appears as a single table column. We can also expand the table and select the required columns as shown below.
To Expand the table:
- Click on the Merge1 Table Query on the left pane.
- Click on the expand icon on the Country Table column header.
A pop-up menu appears.
- Uncheck City from the list of columns that appear.
- Uncheck Use the original column name as a prefix to prevent the original table name from appending to the column name.
- Click OK.
Country names will appear in the column now.
Power BI provides you with six types of merging options. Here’s a blog about the different merging options available in Power BI: https://powerbitraining.com.au/merge-options-in-power-bi/ :
This is the default kind of Merge in Microsoft Power BI. It creates a merge of all rows from the first table and the corresponding rows from the second table.
This type of merger creates a merge of all rows from the second table and the corresponding rows from the first table.
Full Outer Merge creates a set resulting from all the rows from both tables. Null values are substituted in place of non-existing values.
The Inner merge only matches the rows which are present in both tables.
Right Anti finds the rows which are present only in the second table.
Left Anti finds the rows which are present only in the first table.
Append vs. Merge Query:
Append and merge perform different actions on the raw data. Append refers to the stacking of rows of the data. The kind of tables used in the append query are slightly similar if not exactly similar. This means append query increases the row count of structurally similar data. Merge query on the other hand is used to join tables with a common column. The name of the column can be different but the column must hold similar values to define the merge. Merge query, therefore, increases the column count.
When you want to incorporate more rows of data into an existing query, you append it.
You merge the table if you have one or more fields that you would like to insert into another query.
When two tables have the same number of columns with the same name, append will simply match these columns and place their rows beneath each other. Whenever the columns are not identical, the append query will populate the values with null.
Between two queries, there should be joining or matching criteria. (For example, the RetCity Column of Sales Table and City Column of Countries Table were exactly matched due to similar values).
Duplicates would not be removed using Append Queries. To get rid of duplicates, use either Group By or Remove Duplicate Rows.
Ideally merging would not result in duplicates. But when we manually enter data or import data in bulk, duplicate rows can appear.
In SQL, the Append function is identical to UNION ALL.
In SQL, the Merge function is equivalent to JOIN.
Data transformation refers to the process of performing appropriate actions to convert raw data into useful information. The transformed data makes data analysis easier and uncovers useful insights. Data transformation consists of a series of steps known as queries. Understanding the difference between Power BI’s Append and Merge operations is essential for any aspiring data analyst. In this blog post, the append and merge queries have been thoroughly discussed.
Download Power BI desktop files for this blog from the link given below:
Are you a data analyst and want to learn more about Power BI? Why not sign up for Power BI training in Australia. We provide our services in the following regions