Table joins are a common data operation performed by data analytics for joining two or more tables. This may result in an outburst of data which does not carry any significant information. The problem can be thought of as data explosion and can be addressed use code optimization techniques discussed in this blog.
The Cross Join
CROSSJOIN() allows you to join two or more tables by performing cartesian product of all the rows from all the tables used in the argument of the function. The syntax of CROSSJOIN() is presented below.
To see how CROSSJOIN() works:
- Click on the Modeling tab in the Ribbon.
- Click on New Table.
- Type the following expression in the formula bar and press Enter.
In this Expression:
- A new table containing the Cartesian product of Customer and Product table is calculated.
- Click on the Data tab in the left Ribbon.
- Click on Testing Cross Join (CT) in the Fields pan.
- Scroll down to the end of table.
CROSSJOIN() with simple tables returns 7.4M rows and it becomes an expensive operation to perform. We can use VALUES() function with Cross join to get all the possible combination of customer city and product subcategory.
For a complete step by step guide on this click on the link below:
https://powerbitraining.com.au/solving-data-explosion-resulting-from-crossjoin-in-dax/
Another way to solve the data explosion issue associated with CROSSJOIN() is to use it with SUMMARIZE() in an expression.
Let’s see how this works!
SUMMARIZE():
SUMMARIZE() allows you to create a summary of the input table grouped by a specified columns or table. The function’s output is dependent on the requirements which are specified as the input parameters in the function.
Lets use this concept to get some work done !.
In the last blog, we used CROSSJOIN() with VALUES() and calculated a combination of City and SubCategory. However, there were no sales displayed in the table.
To see how to use SUMMARIZE() with CROSSJOIN() and VALUES():
- Click on the Modeling tab in the Ribbon.
- Click on New Table.
- Type the following expression in the formula bar and press Enter.
In this Expression:
- A new virtual table containing the Cartesian product of distinct Customer cities and distinct Product Subcategories is calculated.
- This table is grouped by the product Subcategory and Sales Amount in a new virtual table.
- Click on the Data tab in the left Ribbon.
- Click on City and Category with No Sales (CT) in the Fields pan.
- Click on the Sales
- Click on “$” and type “2” in the formatting section.
- Click on the drop down in City column.
- Click on sort descending.
Notice that no sales for Socks, Bike Racks, Locks, Bib-Shorts and Tights occur in York,
Conclusion:
Data explosion is a common problem when dealing with table functions that perform the table joins. These functions are not memory efficient and therefore, the DAX code must be optimized to get useful information and ensure that the calculation is memory efficient. One of such functions is CROSSJOIN() which performs the cartesian product of the columns or tables used as arguments in the function. This results in a huge data set which is hardly of any value. CROSSJOIN() is used with other functions such as VALUES() and SUMMARIZE() to solve this issue.
Are you looking for an instructor-led Power BI or Power Apps training in Australia? We provide classroom as well as online live trainings for the following courses: