DAX is increasingly gaining popularity as the most interesting topic in Power BI training because DAX solves business problems like no other language. DAX code must be technically optimized to produce desired results. In this blog post, a brief introduction to the vertipaq engine responsible for the compression of data is presented, followed by guidelines on writing technically sound code in DAX.
Vertipaq engine in DAX
Vertipaq is the compression engine of Microsoft Power BI. As soon as the data is loaded in Power BI, Vertipaq engine performs a series of algorithms on the data to compress it as much as possible. This optimizes the memory footprint and DAX query time in Power BI.
There are three algorithms used by vertipaq engine to compress the data:
Value encoding discovers the mathematical relationship between different values in a column and uses this relationship for compression of data. The data is encoded in the first stage and then stored. In the second stage, the data is decoded, and original values are returned. Value encoding is performed only for integers and not for strings and floating-point values.
Dictionary encoding builds a dictionary of the distinct values of a column and then it replaces the column values with indexes of the dictionary. Dictionary encoding therefore converts column values into integers only. This takes lesser memory, and the workflow is optimized.
Run length encoding optimizes the size of the data set by avoiding repetitive values. RLE creates a relatively complex code structure containing the value only once and the number of contiguous rows containing that value.
The model compression depends on various factors.
- Cardinality of a column. It is the number of distinct values define the number of bits used for storage.
- The number of rows in the table.
- The data type of column. It is related to the dictionary size.
- The number of repetitions. This directly determines the distribution of the column.
Let’s look at some technical aspects of writing DAX to optimize the vertipaq compression:
- Either create measures in the most appropriate table or have separate measures-only tables.
- Don’t use bi-directional filters until its necessary.
- Avoid context transition for large iterators.
- Reduce nested calls.
- Use comments to document your code.
- Do not filter whole table, filter selected columns.
- Break the measure code into interim parts.
- Do not break the column code into interim parts.
- Do not hide errors unless you are sure how to deal with the aftermaths.
- Avoid double evaluation of same expression by using variables.
- Avoid using spaces in table names, otherwise you will have to add single quotes to reference that table.
- Do use spaces in column names, it increases readability.
- Do use spaces in measure names, it increases readability.
- Calculated Columns can be expensive (they update when the data is refreshed).
- Measures contain an implicit CALCULATE().
- Avoid using the FILTER() function unnecessarily and avoid using it on your fact table, as it can perform poorly on large tables.
- A standard filter expression in a CALCULATE() function performs better.
DAX is the formula language integrated in Microsoft’s business intelligence tools to enhance data enrichment capabilities of the tool. Writing optimized DAX expression ensures that the vartipaq engine compresses data better and the operational efficiency of the report is enhanced. In this blog, a brief introduction to the vertipaq engine and technical aspects of writing code in DAX is presented.
Learning Power BI from Scratch? Get a complete, in-depth training on Microsoft Power BI by attending our training courses.