“Without big data, you are blind and deaf and in the middle of a freeway.” – Geoffrey Moore
But what to do if your data lies on a website with constant updates…????
Web scrapping is a method of extracting data from a website. Web scrapping is an important technique in the data gathering process. It is quick and efficient and a live data source is created within minutes. This data can be processed to gain insights and fuel decision making.
“Above all else, show the data.” – Edward R. Tufte
Well, How to bring in data to your Power BI file…??
Power BI has developed a web connector to facilitate web scrapping for Power BI users.
This blog outlines the steps required to scarp data from a public website.
In this exercise we will practice web scrapping using a web query. For this exercise, we will use a public webpage that lists the public holidays in New Zeeland.
Importing Web Query Using Parameter
Web queries can be made dynamic with the help parameters. We’ill see how this can be done but first lets get familiar with a few terminologies here:
Custom functions in Power BI are used to perform specific steps in a defined order. They are used to reduce redundancy, increase consistency and re-use the code
In Microsoft Power BI Query Editor, Parameters are data structures used to pass values to other queries. They act as indicators to the filter the data at query level before it is loaded into the data model. Parameters render the functions by passing them a user-specific value. Whereas Custom functions are basically a query running upon various other queries.
Back to business…!
A new web page is created annually for each year, for example for the year 2016, we have:
Now, we will use parameters and functions to render URL’s year to get holidays for other years.
To Get Started:
1. Click on Home Tab in the Ribbon Menu.
2. In the External Data section, click on Get Data.
3. Click on Web.
4. Paste the URL in the text box.
5. Click OK.
A pop-up box appears.
6.Select the Checkbox at the left of Table0.
7. Click Edit.
8.Click on the column containing null values.
9. Click on Remove column.
Hurraay ! The data from the public page has been scrapped using the power bi web connector.
Importing the data
To Import a Web Query Using Parameter:
1.Click on Table 0 in the Query Editor.
2.In the Applied Steps section, click on the gear icon at the left of Source
The source box pops up.
3.Click on Advanced.
4.In the URL parts cursor box, erase html at the end of URL.
5.Now click on ABC drop-down.
6.Click on Parameter.
Note that the Parameter Year is intelligently detected.
7. Click on Add Part.
8.Type .html in the text box.
The URL Preview gets updated with the Parameter name instead of year.
9. Click OK.
We have configured URL consisting of three parts.
- The first part refers the main page containing the tab for different years.
- The second part consists of the parameter used to input year
- The last part is used to complete URL.
This blog serves as a complete guide to scarp from a website. As an example, data from a public website has been scrapped. The main steps for data scrapping are outlined in the blog. Following these steps, you not only pool data but also create a link to live data query. This means that similar data can be accessed in future by just changing some value.
You may also like:
- Common Mistakes When You’re Working with Spatial Data in Power BI
- Finding the trend line in your data using Advanced analytics in Power BI
- Creating Visuals with Q&A
- Four Simple Steps to Create Bubble Map in Power BI
Are you a data analyst and want to learn more about Power BI?
Why not sign up for Power BI training in Australia. We provide our services in the following regions: