This is a continuation of a series on how to convert badly formatted data into an easy to analyze – tabular format. In this article, we are talking about data which has gaps and how to fill gaps efficiently. This is a new approach available using Power Query – it is fast and effective. Have a look and save hours of your precious time.
Contents
The bad data
We are talking about this type of data – tabular but with gaps.
The green gaps we want to fill. The red ones are NOT to be filled because those have a business meaning. Red ones show that the amount was either zero or unknown. Green ones are just convenience based gaps.
How to fill them up? Three possible solutions
If these came from a Pivot Table copy paste – use Repeat All Item Labels option.
Read this article for details – Bad Data to Good Data: Pivot Copy Paste
If you don’t have the original Pivot table with you or if this was not originating from a Pivot table, then we have to do this manually. Read this article for details. Data clean-up: Are you helping Excel or Excel is helping you 😉
Now there is another method available – which does exactly what the second method does, but even faster and more elegantly. This method uses Power Query. To use this method, you must have Excel 2010 Professional Plus or later version OR Office 365 based Office subscription (which is always professional edition).
If you do not have Power Query installed, search for “Excel Power Query Download” and install the latest version. The reason I am not providing a download link is because I want you to download the latest one as on the day you are reading this article.
How to fill gaps using Power Query
Let us look at the data above. It is tabular but has few gaps. First we will create a table. We know that is the bare minimum requirement for good data.
If the data is large, it is better to save the Excel file, close it and import the data into a blank Excel file using Power Query.
You can download the sample file from here
In this case, I will import the table in the same file. Click inside the table and choose – Power Query – From Table.
If the Excel file was saved separately you will need to choose From File – Excel and then choose the table containing the data.
In either case, the Power Query editor will open and show the data. Empty cells are shown as null in Power Query.
We want to fill the gaps in the first two columns. Select the first two columns and right click. Choose Fill Down. Same option is available in the Ribbon as well…
Miraculously – the gaps are filled. That’s it.
What was done? Excel traversed the data from top to bottom for each column. Whenever an empty cell was found, it was replaced with the value above it. The logic was known to us anyway. However, this is happening without formulas and it is extremely fast with large data.
Fill up option
Power Query also supports a Fill Up option. This performs the fill function, but in reverse order. This is how it would work with the same data…
In this case, it does not make any sense. But if the data requires filling up from bottom side, this feature is very useful.
What kind of data requires the Fill Up feature? When the data is captured in reverse chronological order – the first items are at the bottom. This does not happen usually, but can be the case in places where data is captured from sensors or process monitoring equipment. I suspect there is a more specific reason why the Fill Up feature was provided. But I am not able to visualize it yet. If you find the scenario / use case where Fill Up is a perfect solution, do let me know. Post the comment here.
— —
2 Responses
Reblogged this on SutoCom Solutions.
Thanks again for re-blogging.