A brilliant new feature is now available in Power BI – Split column into rows. To understand why we need it, you must go and read the article – Analyzing badly captured Survey data or feedback forms. This method used Power Query concepts of Split and Unpivot. Now these have been combined into a single, intelligent command called Split columns into rows. It sounds confusing at first. But soon you will realize that it is an amazing tool. Learn it just 4 minutes.
Raw data looks like this
And you get a report like this. No need to use formulas or do any manual work.
You must have the May 2017 update for Power BI Desktop installed.
Continue reading Split column into rows
Here is a quick post to show the practical scenario when the Remove Top and Bottom Rows options are useful while cleaning up data using Power Query.
Remove Top Rows
Often we get CSV files which have tabular data. Sometimes, there is a title, blank rows and some other related but unwanted data just above the useful tabular data. In such cases, we use the Remove Top Rows option.
Remove Bottom Rows
If you get data from web pages (HTLM tables), the header row is repeated at the bottom. This option helps you remove the unwanted row.
Another common issue is that many reports dump data with a grand total at the bottom. Remove Bottom Rows option is also usefu in these cases.
It is quite possible that we focus on the top few rows while cleaning up data. It is equally important to import the data and look at the bottom rows. The bottom part of the data can have many issues which often go unnoticed and may even generate inaccurate results.
Here is another short but power packed video which explains how analytics can drive growth. Read the related articles here.
Often we get data which is distributed in multiple column sets. Here is how you can convert it to good data (tabular column format) using Power Query. Watch the video. Sample File download.
This is a simple check sheet. Usually done on paper. But with ubiquitous tablets being available, it can be done on Excel as well.
Continue reading Quality Management 2. Check Sheet
This is the worst type of input data to get. Each row has been split into two (or sometimes more) rows. We want those to be combined into a single row. In an earlier article, I have shown one method of doing this.
Now here is a simpler, faster and more powerful method using Power Query.
(Estimated reading time 12 min)
Continue reading Data Split into Two rows – Cleanup using Power Query
Presentation download with reference links
Download the Presentation (3 MB). I have added lots of links to related articles and external references including books written by stalwarts in Excel analytics field.
Presentation was used for delivering the webinar. You can download the webinar video from here. Or view it on YouTube using this link.
Thank you for the overwhelming response for Data Analytics webinar video. More than 800 attendees, 70+ questions.
Watch the video on YouTube (use 720P) or Download (MP4, 188 mb, 60 min) and view on your PC. A big thank you to the Microsoft and Economic Times team for making this possible.
Power Query is an amazing tool for importing and cleaning up data from various sources. The problem was that it required you to have professional version of Office. Now that problem is solved…
Continue reading Power Query now available for standard version of Excel 2013
There are two simple approaches in reversing row order. One is to add an extra column with serial numbers and then sort on that column in descending order. This works for small amount of data. For large data it is best to import it in Power Query and choose Transform tab – Reverse Rows.
Benefit of Power Query? It works on a smaller sample of data and then applies the transformation when you choose Save and Load option. This is much faster than getting all the data and then trying to sort it (which is the first method).
Why is this required? Usually required with logs where the first transactions or rows are at the bottom. So the data is received in reverse chronological order. Twitter feeds, Timeline Updates, Live blogs – all follow this pattern.
This method works independent of the time-stamp column. What is wrong with timestamp? It may be in different time formats, some rows may have same timestamp and some rows may have no timestamp at all.