This is a log of what I explore and learn.
The query keeps cropping up very often while I work with customers. There are many ways to handle it. Here are three ways to do it. Simple Copy Paste, using VBA and using Power Query.
Power Query is undergoing rapid enhancements based upon feedback received from users. On 29th May another new update was announced.
If you are using Power Query, please update it to get new features. Download it from here. The version number of this update is 2.12.3660.142
This is a common activity. Go to a browser page, find some tabular data or report and copy-paste it into Excel. Now you waste a lot of time cleaning up the unwanted things which also got pasted.
If you have Excel 2010 or above (Professional Plus or Office 365 edition),
you have a miraculously simple method available now : Power Query…
This is a continuation of a series on how to convert badly formatted data into an easy to analyze – tabular format. In this article, we are talking about data which has gaps and how to fill gaps efficiently. This is a new approach available using Power Query – it is fast and effective. Have a look and save hours of your precious time.
In an earlier article, we discussed how to analyze FaceBook posts using Excel. Now let us see how Twitter data can be analyzed. This article explains how to analyze your own twitter handle data. Usage Scenarios: Product specific handles, Corporate handles, brand launches, events, live seminars, customer feedback and so on.
Thank you for your support and encouragement. This vote is now closed.
I have submitted a video which demonstrates the benefits of Power BI.
This is a global competition held by Microsoft BI team.
I have put lot of effort in making the video interesting and useful.
I request you to have a look at the video by clicking on this link (it is a 7 min video).
If you like the video, please vote for me.
The competition is tough – so I need your support.
Of course, if you like the video and if you have Excel 2013, you can immediately start using the concepts shown in your day to day analytical work.
Here are some screenshots from the video.
If you use Pivot Table, then you need PowerPivot! Read on to find out why.
If you have never used Pivot Table, you should not read this article. You must learn Pivot Table first.
PowerPivot is an add-in which works on Excel 2010 / 2013 (Professional Editions only). For 2010 you have to download and install it from PowerPivot.com
For 2013, it is a part of the base installation. However you still have to activate it. File – Options – Add-Ins – Open Dropdown – COM Add-ins – GO. Enable the checkboxes next to PowerPivot, PowerView and Inquire. We will discuss the other two items later. This time we focus on why PowerPivot.
Finally you should see a PowerPivot tab in Excel.
What is PowerPivot
It is a new way of creating Pivot Tables for analyzing data. It eliminates all the shortcomings or limitations of traditional Pivot Tables… and adds more features.
Limitations of the good old Pivot Table
Large data = slow performance
Pivot Tables cannot handle very large amount of data. Even with few hundred thousand rows it starts to become slower and slower. Often you have to wait for a long time for a drag and drop of a column to finish all the calculations. Opening such large files is also extremely slow.
Row limit of Excel sheet
Maximum of 1,048,576 rows… just over 1 million. That is the limit. If your data is bigger than this, you must use some external database and then connect Excel to it. Of course, even if it works the usage will be extremely slow.
In addition, we have dependence on the external database – which makes sharing the files more difficult.
Pivot Table works on one tabular block of data
If your data is originating in multiple blocks of information, Pivot Table cannot handle it.
For example, you have transactional data of 100 rows containing the columns
Product ID, Qty, Price, Total Cost
You also have a master table containing Product Id and Product Name.
Pivot Table cannot be told to use data from the Transaction table along with the Product Master.
The workaround is to use VLOOKUP…
The VLOOKUP problem
That brings us to the next limitation of Pivot Table. Continuing with the example above, we will have to add a new column in the transaction table which uses VLOOKUP to lookup the product ID from the Product Master table and get the value of Product Name.
VLOOKUP function is now 10,000 times making the workbook even more slow to respond. Furthermore, there are only a few products. However the name of each product repeats thousands of times in the transaction table – leading to increased file size reducing the performance further.
It is not uncommon to have multiple VLOOKUP based columns in a single Pivot Table.
Yes, there is a workaround – Copy paste the VLOOKUP column as values immediately after the formula is pasted. This way we retain the desired output but eliminate the overhead of VLOOKUP. Although this works, when data changes (more data is added) you have to add more VLOOKUPs manually and then convert them again to values.
This is repetitive and cumbersome as well as error prone.
Either flat data or cube data
Many people have not explored Pivot Table as a front end to an SQL SERVER cube. Pivot Table integrates beautifully with SQL server cubes and exposes all features of the data warehouse elegantly. However, there is a limitation.
You cannot combine some data from the cube along with some data which is outside the cube. In practice, this type of requirement is common.
The solution: PowerPivot
PowerPivot is designed to solve all these problems and more – much more.
Here is the concept.
Stores data inside PowerPivot. Not Excel worksheets
Whether data originates in Excel or other places, PowerPivot stores it. Data is NOT stored in regular Excel sheets at all. This way, the storage mechanism can be made more elegant, fast and scalable.
Diverse data sources
It can create a Pivot Table using data originating from different sources. For example, transactions data can originate in a database whereas the Product Master could be a simple Excel table.
Handles millions of rows of data without inflating the file size
PowerPivot can store millions of rows of data. Of course we are worried about file size bloating up. But don’t worry. It stores it in a highly compressed manner. You can actually try this out. Take a large Excel file with data contained in a worksheet. Note the file size. Now import the same data into PowerPivot, save the file and check out the difference. You will be very surprised. The file size can reduce anywhere between 4 to 100 times depending upon the pattern of the data (compressibility).
Eliminates VLOOKUPs by creating relationships
Excel is not a database. We were using VLOOKUP as a workaround to create relationships. PowerPivot – like a database – allows you to specify which field in the transaction file relates to (is same as) the field in the master file. This type of MANY-to-ONE relationship eliminates the need for VLOOKUP and associated performance penalty.
As though this was not enough, PowerPivot adds more functionality which is completely missing in Excel.
- Ability to create hierarchies from flat data (Category, Group, Product, SKU)
- Ability to use Picture URLS (for use in PowerView)
- Creating Flat Pivot Tables
- 70+ new functions which extend the capability of Excel to something similar to sophisticated cube based formulas (without the need to learn complex MDX syntax)
- Time intelligence functions are very powerful and extremely useful
- Ability to merge cube and flat data
- Ability to refresh all data sources
- Calculated columns for additional customization
- Default field sets, and many more…
In the next article we will see how VLOOKUP can be eliminated using PowerPivot.
Office is a funny product. Everyone thinks they know it. What is the definition of “Knowing” Office? = “I know how to get my work done”
Unfortunately, even if you get the work done, it may not be in the most efficient way. Data handling in Excel is probably the single most important drain on human effort everyday.
Billions of people who have the false sense of knowledge use the same old, bad, slow, contorted methods of working with text files, exported reports and spend their precious life cleaning up data, removing unwanted headers, unpivoting data manually, copying and pasting… it is a global pity.
The primary problem? Once you found ONE method which works, you never attempted to find out if there is another, better method available.
Over time Microsoft has been adding many new ways to simplify things and eliminate manual work.
But now, there is a revolution happening. Power BI tools from Microsoft.
All that I am talking about here requires Excel 2013 Professional edition. But don’t worry. You can actually sign up for a trial version for free. See below.
Why do we need it?
The way we work with data has not changed at the core.
Ideally we should be spending maximum time in the analyze phase – because that is where we learn from what happened. That knowledge will help us in improving the future.
Unfortunately, we spend too much time in getting data, cleaning it up and summarizing it.
Now Microsoft has added a set of tools to Excel to simplify and automate many of these useless tasks.
Getting data from various sources is now done with Power Query. It also helps you in cleaning up the data, merging / splitting, adding calculated columns, creating relationships and so on. Data can be retrieved (and periodically refreshed) from a variety of sources.
Don’t worry. The good old ODBC / OLEDB, Text and Excel also works!
Excel 2013 also adds a great feature called Flash Fill to help us clean data instantly – without formulas or manual work. Although this is not a part of PowerQuery, it is a very powerful feature. I would have named it Power Data Cleanser
Once data is available and clean, it is NOT stored in Excel sheets – because there is a limit on number of rows. Furthermore, large data in Excel inflates the file size and makes it excruciatingly slow to work on it.
Therefore, the clean data is stored in a new powerful friend – PowerPivot. It can store millions of rows of data and still have very small file size.
Now what we have is called a Data Model. It is time to summarize data. This can be done in four ways.
- Traditional Pivot Table and Pivot Chart
- PowerPivot Pivot Table and Pivot Chart
- Power View
- Power Map
Traditional pivot table can still be created with this data and used to summarize the data. We are already familiar with it so no re-learning required. You still benefit from the ability to created relationships (eliminating the VLOOKUP trap) and millions of rows of data handling.
PowerPivot also creates Pivot Tables in Excel. These behave like the good old Pivot Tables but these provide new functions and features which are simply not available in traditional Excel. Many complex calculations which would otherwise require you to approach IT to create a data warehouse (Cube using a BI Tool) can now be done with these powerful Excel formulas. Technical name for these new formulas is DAX.
Power View is an entirely new way of summarizing and visualizing data. This gives you s blank page to start with. You just drag drop the fields you want and choose how they are represented (tabular, chart, map, etc). In the process you create a dashboard quickly. Every visual element acts like a filter giving you unimaginable level of analytical capability.
Power View also supports maps. Using Bing Maps, your data can be shown on a map for better understanding. This works right within Excel – no special software required.
Power Map is a more evolved version of mapping functionality – a free add-in for Excel 2013. This allows multiple layers, different visualizations and even creation of an animated step by step video to show your plan of action and to present data in a serial manner.
Ask queries in English language
As though all this was not enough, you have a miracle feature. You can actually type a query in English, like “Show sales qty for each region as column chart” and it actually does the job. You have to see it to believe it.
If you have been in IT for long enough, you will remember that there was English Query in earlier versions of SQL Server. Later Microsoft dropped the feature. Now it is back with a bang. It works even with simple Excel data with no vocabulary or synonym configuration!
Reporting and Dashboards
Creating reports is one thing. Next step is to share it with others. This is best done using SharePoint. It offers a special app for PowerBI reports. Reports can be seen and interacted with on any browser (HTML 5 version coming soon. Currently requires Silverlight) and any device or tablet (Microsoft, Android, iPAD, iPhone).
Try it yourself
GO to the Power BI Site and sign up for a trial subscription. It gives you everything you need… Full version of Office 2013, SharePoint and PowerBI app for SharePoint (full Office 365).
Worried that your existing Office installation of older version will be affected by the new version? No worries. Even that has changed now. The installation of Office 2013 can work in parallel with older versions of Office. Both work independently without affecting each other. This is called appV (application virtualization). The deployment method is called Click To Run.
We will cover more of Power BI in upcoming articles.
Update (Aug 2016): I’ve just released my first Power BI MOOC on Udemy. See details here. Use discount code AUTUMN40 to get the course at 20% discount.