This is a common activity. Go to a browser page, find some tabular data or report and copy-paste it into Excel. Now you waste a lot of time cleaning up the unwanted things which also got pasted.
If you have Excel 2010 or above (Professional Plus or Office 365 edition),
you have a miraculously simple method available now : Power Query…
Download Power Query and install it. You will need it for this exercise. I am not providing the link for Power Query download. It gets refreshed quite often. Just search for Excel Power Query Download and install the latest version.
As an example I am going to use a web search. However, the same method problem occurs when you run reports on internal web applications as well.
Let us try to get a list of cities in the world by population. Sounds like an easy thing to do. But wait and see… I am using Bing because it integrates with Power Query.
(Could not resist showing the sea lion!)
We get lot of results… Now what is the problem? You still don’t know which page will be useful to you. It is trial and error. After trying out few pages, you will eventually find the one which has useful data in tabular format. I found it in the first Wikipedia reference itself.
The web page has a table which shows what I want…
There are lots of cities and you need to perform some selection acrobatics to copy and paste the data. Although I am going to show you a method of eliminating this struggle, let me also show you a good workaround for selecting large areas from web pages…
How to remove all pictures in one go?
You are thinking click on each picture and remove one by one? NO WAY.
First of all, in these cases, pictures may have hyperlinks. So you may not be able to click on it at all. Don’t worry. Right click on the picture. The menu will appear but get rid of it (press and release Alt key once. Escape also works, but Alt is a more effective way.).
Now ONE picture is selected. While it is selected press the CTRL A key. Surprisingly all pictures are now selected. Press the Delete key once.
Does that sound efficient? Compare it with what you have been doing till now
How to select large amount of data in a web page?
Select some small area in the beginning. DO NOT attempt to scroll down while selecting the rest of the data. Just scroll to the absolute bottom where the required data ends. Now press SHIFT key on the keyboard and click beyond the end point using left mouse button.
Now area from original selection to the end point will be selected. Mind you, this is just a work-around in this case but it may be useful in some other cases. (Large text selection, for example). This works in any context of large selection – not just on browsers.
Pasted in Excel – and what do you get?
All sorts of unwanted things have been pasted – hyperlinks, pictures, wrong formats, script icons and so on. Now you spend half an hour just cleaning up the data…
Fortunately for us, there is a better way available now.
Power Query to our rescue
Our objective is still the same. We want data about cities by population in Excel. So here is what we do differently – START EXCEL. No need to go to the browser to search.
The search comes to you WITHIN Excel now.
Click on the Power Query tab and choose Online Search
A small browser window opens on the right side (technically it is called a pane). Type the same query there list of cities by population and click the lens.
The search results are shown in the pane. Notice that we still have the same problem as before. As of now, we do not know which of the search result has useful and tabular data.
Here is the time for a miracle now. You would never have expected this to happen – it is like EXCEEDING EXPECTATIONS!
Just move the mouse cursor over each of the search results and pause there for a second. DO NOT CLICK yet. Miraculously, a temporary window will pop up showing the tabular data within that page (a page which you haven’t even opened yet). The search text is highlighted.
This is the meaning of the word “POWER” in Power Query. It is not just a brand name. It is a revolution!
Preview the results and find the one we are looking for. Now you will be tempted to click on it to get the data into Excel – but wait. Not yet.
Bad habits as well as constraints are called “Comfort Zone”!
Why are you so desperate to get the data in Excel? Because then you can clean it up and get on with your work. This is what we have been doing for decades. Get the data, import it and then repair it. What happens next month? More data comes in, we import it and clean it up again…
Remember our base benchmark of inefficiency: Repetition?
Read this article I wrote using Calligraphy to understand what I mean.
Clean up ONCE and let Excel do it in future
How about doing the cleanup ONLY ONCE so that Excel can do it next time you need to refresh the data? Yes… another example of Exceeding Expectations!
Yes, you could have done this by recording a macro in any version of Excel. But you know how painful it is and how macros don’t always run as expected!
So back to our example… when you find the data you want by hovering mouse cursor and looking at the preview, click on the Edit button.
Now a separate window opens showing you the Power Query capabilities. This is a very powerful place – mind you. Lot of powerful and useful stuff can happen here with just few clicks.
In case of our data, we want only few columns. Select the columns you need using CTRL CLICK.
Right click in one of the selected column headers and choose Remove Other Columns
Now look at the remaining data and check if the data types and the formats are correct. If not, we can repair them here itself.
In this case I am just going to trim the column City to remove unwanted spaces which can creep in. Right click on the column name and choose Transform – Trim. Now we are ready to import data into Excel.
The steps are already remembered by Excel
While you were performing the clean-up operations, Power Query keeps track of those actions and records them like a special macro –Applied Steps.
This eliminates the need for you to perform repetitive data clean up and also makes macro recording redundant.
If you are the techie types, each step is internally stored in a special purpose language called M. Read more about Power Query Language. Another great resource for practical usage scenarios is Chris Webb’s blog.
Where to import the data
At the bottom right corner you get two options – whether to import the data into Excel worksheet or into Data Model (Power Pivot).
It is a good idea to import into Excel if:
- The data is not very large
- If you want to use the regular Excel functions with it along with other data which you already have
- If you want to use the regular pivot table with itImport it into Data Model if:
- If you want to use it with Power Pivot, Power Map, Power View
- If you need to create relationships with other data items
- If the data is very large and regular Excel import is slowing things down (or it cannot handle the number of rows at all)There are many more issues which will help you with this decision, however in this article, I am keeping it simple. In this case I am importing the data in Excel Worksheet. And finally, you have it – cleaned and ready to use in an instant – without any laborious clean up activity from your side!
What if you want to refresh the data?
In this case the data is fairly static. It is not often that we add new cities. However, if this web page was a system report or some other data which keeps changing – Trending data of any kind – then you don’t have to repeat this process at all.
Next time you want to refresh the data, click inside the data and choose the Query tab from the Table tools tab. Click Refresh.
Now Excel will go to the original data source, fetch the data, compare it, refresh it and also perform the actions you had specified earlier. That’s it.
This was just one data source and one query. You may have multiple such queries in the same Excel file. Open the Power Query tab – Queries – Workbook to see the list of all queries and edit / refresh them.
This is what we did – in brief.
- Use Power Query to perform a web search
- Preview and identify suitable data
- Edit the data – select only the required columns, clean up data types, and import it
- Refresh when required
- Use with Pivot Tables, Power Pivot, Power View, Power Map and any other method which works with Excel
Try this out and let me know your feedback.