The beginner level attendees of our Power BI training in Sydney and Power BI training in Melbourne often find it difficult to make a Data Model out of a Flat File. Flat file database can be used to hold simple data formats like lists. However, if you have more than a few thousand records, they can become inconvenient to handle. They can also become difficult to update and the risk of duplication increases.
In order to start the process of conversion, you should know what’s a Flat File? A flat typically contains alphanumeric data with little to no formatting. A flat file’s architecture is based on a standard format defined by the type and character lengths specified in the columns. A comma-separated values (CSV) file is one of the most common flat file types.
Some common uses of Flat Files:
- Flat files can be used to export data from programs that don’t allow any other options to export data out.
- It is the most commonly used data storage method for single-item lists.
- Flat files are still frequently utilized for a variety of modern applications because of the fact that they provide a simple way of storing, modifying, and accessing data.
- Flat file databases are easy-to-create data files for frequently accessible information such as name and address files, membership lists, and class rosters.
- Cloud warehouses frequently deploy object storage as they can handle large amounts of data. A flat-file database manages the data stored on object systems, retaining basic metadata such as file names and where they are stored.
Why shouldn’t you use a single flat file that has it all!
Flat files are unwanted in a BI process because they increase the processing load on the Power BI’s processing engine and therefore do not support quick analysis. This is particularly true for large data sets. The Processing engine loads the complete data set every time a query is run. This slows the visual loading process as the data set grows in size.
Another data modeling practice is to avoid condensing everything into a single huge table with hundreds of columns. This practice is followed because:
- Maintaining such a large table is always difficult. Combing additional fields into the existing table becomes a hectic task and mostly results in erroneous data.
- The amount of information saved in the database is determined by the fields. If you add new fields, you will need to save more information! This is a maintenance issue as well.
- It’s not always possible to combine! What if you’d like to include transactions data at the same time? Most of the time, this is not possible since you would not have transactions data at the Employee level; instead, you may only have it at the month and product level. As a result, there will be plenty of large tables.
But the good news!
A relational database management system stores the data in a more connected way as compared to a flat file system. A flat file can easily be converted into a data model. It just takes 4 simple steps to break down your flat file into a data model. Yes, you got it right JUST FOUR STEPS.
Let’s walk through each of these steps!
Flat files consist of a single table containing many columns. For this blog, we will be using the Sales Analysis flat file shown in the image below.
This flat file can be downloaded from: https://powerbitraining.com.au/downloads-flat-file-to-data-model/
A flat file consists of a single table containing all the columns needed for analysis.
Data models on the other hand solve the issues prevalent in the flat files by normalizing them. Data models consist of three entities:
- This table consists of metrics or facts related to a business process. Basic Facts or Measures are summarized.
- Dimension tables are used to define the dimensions of a business process. Each dimension table lists the attributes related to a dimension. Dimensions filters and groups your data.
- Fact and Dimension tables are then linked by using keys (primary and foreign).
Let’s classify each column as either a fact measure or dimension attribute.
The city where the office site is located
Email addresses of the Employees
Employee identifier ID
Address of the office site
Latitude of the office site
Longitude of office site.
Name of Employee
Office identifier ID
Processing Date of transaction
Postal Code of Office site
Sales Amount of transaction.
Street Address of office site
Suburb related to office
As mentioned above the complete process of transformation can be broken down into 4 following phases:
Let’s start the Process
Download the Sales_Data_Flat.xlsx dataset and follow the below steps to convert the flat file into a data model.
- Duplicating Queries
Duplicating a query is important for a flat file to be converted into a data model because initially, a single query contains the entire data set.
To Duplicate Queries:
- Right Click on Sales Analysis.
- Click on Duplicate
- Right click on Sales Analysis again and click on Duplicate again.
- Rename Sales Analysis to Fact.
- Rename Sales Analysis (2) to Employees.
- Rename Sales Analysis (3) to Office.
This will create multiple copies of the same data. The duplicated data will later be transformed into a data model using the techniques defined in the next sections.
- Removing Columns
For a flat file to take the shape of the data model, each query must contain only the required data. Therefore, extra columns must be removed.
To remove columns:
- Click on the Fact query in the Queries section
- From the Ribbon Menu, click on Home Tab
- From the Column section, click on Choose columns.
A pop-up appears
- Uncheck select all columns.
- Now only select the checkboxes of OrderDate, SalesAmount, Employee ID, and Office ID.
- Click OK.
Fact table must only contain facts and keys that can be linked to other dimensions.
- Click on the Employees Query in the Queries Section Pane.
- From the ribbon, click on Home > Choose Columns.
- Uncheck all the selected columns except Employee ID, Email Ids, Names, Office Id and Reports to.
- Click OK.
A dimension table only contains attributes that define each row in a demission independently.
- Click on the Office Query in the Queries Section Pane.
- From the Ribbon Menu, click on Home > Choose Columns.
- Uncheck all the selected columns except Office ID, City, Suburb, Post Code, Street, Full Address, Longitude and
- Click OK.
This step cleans each of the tables by keeping only the required data columns. All the extra columns from each of the tables are removed.
- Removing Duplicates
The dimension tables in the data model must only contain unique values. Duplicates must be removed to reduce redundancy in the data.
To Remove Duplicates:
- Click on Employees Query in the Queries section pane.
- Right click on Employee ID
- Click on Remove Duplicates.
- Click on Office Query in the Queries section
- Right click on Office ID, click on Remove duplicates.
- Click on Close & Apply, in the close section.
This step completes the data transformation required in the Power Query Editor. The next step will be performed in the modeling tab of the tool.
- Build Relationships
In this section, you will learn how to finally link the queries so that they assume the shape of a data model.
To build relationships:
- Open the Modeling Tab from the left pane.
- Hold the cursor and drag the Employee ID column from Employees to Fact.
A white line appears indicating a valid relationship. Power BI automatically detects the cardinality and primary and foreign keys present in the tables.
- Hold the cursor and drag it from Office to Fact.
A Simple data model with one fact and two dimension tables has been created.
Bad Data Modelling Scenarios
When individuals first start using Power BI as a business intelligence tool, they often suggest, “Let’s just import everything in one large flat document, like an Excel worksheet.” This method of working is unorganized and does not provide a user-friendly analytics structure. In Power BI, narrow tables are considerably easier to deal with. As the number of data set grows, it will have an impact on performance thus making it inefficient. Furthermore, adding measures will further complicate matters in the existing long and wide table. Power BI was created for people who will never have to think about data warehouse design. Originally, this self-service tool was built to enable individuals with little or no understanding of standard practices to integrate data from different sources, such as excel spreadsheets, databases, and other databases, without knowing how they were set up.
What are the benefits of data modeling?
- By modelling the data, the business requirements can be defined more systematically.
- It establishes a structure for communication between the IT and business departments by reducing the data update issues.
- Comprehensive reporting requirements can reveal the potential for optimizing business processes using data modelling at its core.
- Data modelling also improves data integrity thus reducing errors.
So, the data model isn’t only about what you get as the final result, but also about the process of how you get it. The procedure itself has a lot of advantages.
A flat file can be converted into a simple data model by just following 4 simple steps. The first step is to duplicate your query in the query editor. The next step is to remove the extra columns from each query. For this step, you must know which of the columns belong to the fact table and which belong to the dimension table. After this it is necessary to remove the duplicates in the dimension tables. This fulfills the condition of distinct keys in the dimension table. The last and final step is to build a relationship between your tables and you’re good to go!
To download the complete Power BI Report, click on this link: https://powerbitraining.com.au/downloads-flat-file-to-data-model/If you liked our blog, drop your email in the comment section and get our blog updates!
Are you a data analyst and want to learn more about Power BI? Why not sign up for Power BI training in Australia. We provide the following Power BI trainings: