In our last Power BI training in Sydney, we spent a lot of time answering questions around the best data modeling schema for creating reports in Power BI. Schema refers to the logical arrangement of tables used for creating reports in Power BI desktop. In this blog post, a brief introduction to star and snowflake schemas is presented.
Before we dig deeper into the different schemas, lets first understand what a data model is:
What is Data Model
Data model refers to the abstract model that demonstrates the logical structure of data and the relationships that exist in the data. In Microsoft Power BI, data model refers to everything that is loaded from query. In terms of data, a good data model consists of the right number of tables with the right relationship between them. Let’s look at some basics before digging into the data modeling concept.
A data model consists of Two or more tables related to each other. For example, let’s consider the data model in the image. The sales table in the center is related to each of the other tables, i.e., Product, Sales Order, Reseller, Customer, Sales Territory and Date.
Schemas in Data Model
The logical structure of the tables that exist in a data model is known as the schema. A good schema follows the following rules:
- No looping exists between three or more tables of a data model.
- Two tables are connected via one active relationship only.
Let’s consider the two types of schemas:
In a star schema, a fact table is surrounded by multiple dimension tables. Power BI engine works best with star schema.
For example, the sales table exists on the many side of the relationship and all the dimension tables exist on the 1 side of the relationship as shown below.
The star schema does not necessarily have to be in the shape of a star. Below is also a star schema and is a preferred arrangement for many experts as in this arrangement is easier to visualize filter propagation from dimension to fact tables.
The star schema does not necessarily need 5-dimension tables to complete the star. Let’s consider the example below of a star schema with two dimensional tables.
Snowflake is a variant of star schema with more things hanging off the end. For example, the dimension product is further divided into category and subcategory which can be seen attached to the product dimension table.
Snowflake schema structure is common in traditional transactional databases as it is the most efficient way to store the data in those systems. However, this is not the best way to structure data in Power BI. Power BI’s engine performs best with the star schema. Here are few reasons why snowflake schema is not the best option for Power BI:
- Star schemas are easier to understand as dimensions can be used to slice and dice data and facts.
- Star schema has lesser joins and shorter paths, which means a better performance is guaranteed.
- Star schema is more scalable. New dimensions can easily be added.
- Every relationship comes at a cost. The extra relationships will potentially have negative performance impacts on the performance.
- The users can have access to the entire database.
- Power BI’s vertipaq engine stores repetitive data very efficiently, particularly in the smaller lookup tables so there is no need to create hanging tables.
A data model consists of the right number of tables with the right relationship between them. The logical structure of the tables used for reporting in Power BI is known as the schema of the data model. Data model can have either a star or a snowflake schema. The star schema is however preferred over the snowflake schema.
Are you a data analyst and want to learn more about Power BI? Why not sign up for Power BI training in Australia. We provide our services in the following regions