Category Archives: Analytics

Big Data – Small Problem! Is the data good?

For those of us who have been in the industry long enough – we know that IT is a hype driven field. New fads and fashions come and go… some residual effect and benefit does remain but it is soon overridden by the next hype. Nothing wrong with this per se.

However, there is a disturbing and predictable pattern. Every hype is presented as the next new thing… everyone is supposed to jump on the bandwagon immediately, get enamored with the cutting edge and reap immediate benefits and gain the “first mover” advantage. Sounds nice. But does not deliver the promised value… in most cases!

The problem is that just because something new, genuinely useful and has been marketed well, does not mean we lose track of what is currently happening. Everyone would want you to be at the cutting edge… but there is gap between where you are and where your hype driven aspirational destination is.

In the process what happens is really sad – for the technology vendors and the customers. It is called a lose – lose situation. Most assume that data is already available in a usable and easy to analyze format – which is absolutely wrong. Usually it is available but in the worst possible format. People spend most of their time cleaning it. What is the point?

This article the beginning of a new series about data. It will cover – data concepts, data gathering, data clean up, effective analytics and finally the psychology of analytics.

Continue reading Big Data – Small Problem! Is the data good?

Vote for my Video in global demo competition by Microsoft

Thank you for your support and encouragement. This vote is now closed.

I have submitted a video which demonstrates the benefits of Power BI.

This is a global competition held by Microsoft BI team.

I have put lot of effort in making the video interesting and useful.

I request you to have a look at the video by clicking on this link (it is a 7 min video).

image

If you like the video, please vote for me.

The competition is tough – so I need your support.

Of course, if you like the video and if you have Excel 2013, you can immediately start using the concepts shown in your day to day analytical work.

Here are some screenshots from the video.

image

image

global demo competition

image

Thank you.

Use PowerPivot instead of VLOOKUP

This time I wanted to try a new style of writing.

I created a presentation instead of the usual document like style.

PowerPivot vs Vlookup

Download presentation and view

Download the presentation (300kb) from here

Download sample file with data and Pivot from here

View on SlideShare

Tell me if you like this style…

Introducing PowerPivot – do you need it?

If you use Pivot Table, then you need PowerPivot! Read on to find out why.

If you have never used Pivot Table, you should not read this article. You must learn Pivot Table first.

PowerPivot is an add-in which works on Excel 2010 / 2013 (Professional Editions only). For 2010 you have to download and install it from PowerPivot.com

For 2013, it is a part of the base installation. However you still have to activate it. File – Options – Add-Ins – Open Dropdown – COM Add-ins – GO. Enable the checkboxes next to PowerPivot, PowerView and Inquire. We will discuss the other two items later. This time we focus on why PowerPivot.

Finally you should see a PowerPivot tab in Excel.

PowerPivot tab

What is PowerPivot

It is a new way of creating Pivot Tables for analyzing data. It eliminates all the shortcomings or limitations of traditional Pivot Tables… and adds more features.

Limitations of the good old Pivot Table

Large data = slow performance

Pivot Tables cannot handle very large amount of data. Even with few hundred thousand rows it starts to become slower and slower. Often you have to wait for a long time for a drag and drop of a column to finish all the calculations. Opening such large files is also extremely slow.

Row limit of Excel sheet

Maximum of  1,048,576 rows… just over 1 million. That is the limit. If your data is bigger than this, you must use some external database and then connect Excel to it. Of course, even if it works the usage will be extremely slow.

In addition, we have dependence on the external database – which makes sharing the files more difficult.

Pivot Table works on one tabular block of data

If your data is originating in multiple blocks of information, Pivot Table cannot handle it.

For example, you have transactional data of 100 rows containing the columns
Product ID, Qty, Price, Total Cost

You also have a master table containing Product Id and Product Name.

Pivot Table cannot be told to use data from the Transaction table along with the Product Master.

The workaround is to use VLOOKUP…

The VLOOKUP problem

That brings us to the next limitation of Pivot Table. Continuing with the example above, we will have to add a new column in the transaction table which uses VLOOKUP to lookup the product ID from the Product Master table and get the value of Product Name.

VLOOKUP function is now 10,000 times making the workbook even more slow to respond. Furthermore, there are only a few products. However the name of each product repeats thousands of times in the transaction table – leading to increased file size reducing the performance further.

It is not uncommon to have multiple VLOOKUP based columns in a single Pivot Table.

Yes, there is a workaround – Copy paste the VLOOKUP column as values immediately after the formula is pasted. This way we retain the desired output but eliminate the overhead of VLOOKUP. Although this works, when data changes (more data is added) you have to add more VLOOKUPs manually and then convert them again to values.

This is repetitive and cumbersome as well as error prone.

Either flat data or cube data

Many people have not explored Pivot Table as a front end to an SQL SERVER cube. Pivot Table integrates beautifully with SQL server cubes and exposes all features of the data warehouse elegantly. However, there is a limitation.

You cannot combine some data from the cube along with some data which is outside the cube. In practice, this type of requirement is common.

The solution: PowerPivot

PowerPivot is designed to solve all these problems and more – much more.

Here is the concept.

Stores data inside PowerPivot. Not Excel worksheets

Whether data originates in Excel or other places, PowerPivot stores it. Data is NOT stored in regular Excel sheets at all. This way, the storage mechanism can be made more elegant, fast and scalable.

Diverse data sources

Powerpivot Table Import Wizard

It can create a Pivot Table using data originating from different sources. For example, transactions data can originate in a database whereas the Product Master could be a simple Excel table.

Handles millions of rows of data without inflating the file size

PowerPivot can store millions of rows of data. Of course we are worried about file size bloating up. But don’t worry. It stores it in a highly compressed manner. You can actually try this out. Take a large Excel file with data contained in a worksheet. Note the file size. Now import the same data into PowerPivot, save the file and check out the difference. You will be very surprised. The file size can reduce anywhere between 4 to 100 times depending upon the pattern of the data (compressibility).

Eliminates VLOOKUPs by creating relationships

Excel is not a database. We were using VLOOKUP as a workaround to create relationships. PowerPivot – like a database – allows you to specify which field in the transaction file relates to (is same as) the field in the master file. This type of MANY-to-ONE relationship eliminates the need for VLOOKUP and associated performance penalty.

Powerpivot Create Relationship

More functions

As though this was not enough, PowerPivot adds more functionality which is completely missing in Excel.

    1. Ability to create hierarchies from flat data (Category, Group, Product, SKU)
    2. Ability to use Picture URLS (for use in PowerView)
    3. Creating Flat Pivot Tables
    4. 70+ new functions which extend the capability of Excel to something similar to sophisticated cube based formulas (without the need to learn complex MDX syntax)
    5. Time intelligence functions are very powerful and extremely useful
    6. Ability to merge cube and flat data
    7. Ability to refresh all data sources
    8. Calculated columns for additional customization
    9. Default field sets, and many more…

In the next article we will see how VLOOKUP can be eliminated using PowerPivot.

Working with Excel data? Learn Power BI

The need

Office is a funny product. Everyone thinks they know it. What is the definition of “Knowing” Office? = “I know how to get my work done”

Unfortunately, even if you get the work done, it may not be in the most efficient way. Data handling in Excel is probably the single most important drain on human effort everyday.

Billions of people who have the false sense of knowledge use the same old, bad, slow, contorted methods of working with text files, exported reports and spend their precious life cleaning up data, removing unwanted headers, unpivoting data manually, copying and pasting… it is a global pity.

The primary problem? Once you found ONE method which works, you never attempted to find out if there is another, better method available.

Over time Microsoft has been adding many new ways to simplify things and eliminate manual work.

But now, there is a revolution happening. Power BI tools from Microsoft.

All that I am talking about here requires Excel 2013 Professional edition. But don’t worry. You can actually sign up for a trial version for free. See below.

Why do we need it?

The way we work with data has not changed at the core.

image

Ideally we should be spending maximum time in the analyze phase – because that is where we learn from what happened. That knowledge will help us in improving the future.

Unfortunately, we spend too much time in getting data, cleaning it up and summarizing it.

Now Microsoft has added a set of tools to Excel to simplify and automate many of these useless tasks.

Getting data from various sources is now done with Power Query. It also helps you in cleaning up the data, merging / splitting, adding calculated columns, creating relationships and so on. Data can be retrieved (and periodically refreshed) from a variety of sources.

Power BI - Power Query

Don’t worry. The good old ODBC / OLEDB, Text and Excel also works!

Excel 2013 also adds a great feature called Flash Fill to help us clean data instantly – without formulas or manual work. Although this is not a part of PowerQuery, it is a very powerful feature. I would have named it Power Data Cleanser Smile

Once data is available and clean, it is NOT stored in Excel sheets – because there is a limit on number of rows. Furthermore, large data in Excel inflates the file size and makes it excruciatingly slow to work on it.

Therefore, the clean data is stored in a new powerful friend – PowerPivot. It can store millions of rows of data and still have very small file size.

SNAGHTMLc9e9125

Now what we have is called a Data Model. It is time to summarize data. This can be done in four ways.

  1. Traditional Pivot Table and Pivot Chart
  2. PowerPivot Pivot Table and Pivot Chart
  3. Power View
  4. Power Map

Traditional pivot table can still be created with this data and used to summarize the data. We are already familiar with it so no re-learning required. You still benefit from the ability to created relationships (eliminating the VLOOKUP trap) and millions of rows of data handling.

PowerPivot also creates Pivot Tables in Excel. These behave like the good old Pivot Tables but these provide new functions and features which are simply not available in traditional Excel. Many complex calculations which would otherwise require you to approach IT to create a data warehouse (Cube using a BI Tool) can now be done with these powerful Excel formulas. Technical name for these new formulas is DAX.

image

Power View is an entirely new way of summarizing and visualizing data. This gives you s blank page to start with. You just drag drop the fields you want and choose how they are represented (tabular, chart, map, etc). In the process you create a dashboard quickly. Every visual element acts like a filter giving you unimaginable level of analytical capability.

image

Power View also supports maps. Using Bing Maps, your data can be shown on a map for better understanding. This works right within Excel – no special software required.

Power Map is a more evolved version of mapping functionality – a free add-in for Excel 2013. This allows multiple layers, different visualizations and even creation of an animated step by step video to show your plan of action and to present data in a serial manner.

image

Ask queries in English language

As though all this was not enough, you have a miracle feature. You can actually type a query in English, like “Show sales qty for each region as column chart” and it actually does the job. You have to see it to believe it.

If you have been in IT for long enough, you will remember that there was English Query in earlier versions of SQL Server. Later Microsoft dropped the feature. Now it is back with a bang. It works even with simple Excel data with no vocabulary or synonym configuration!

Reporting and Dashboards

Creating reports is one thing. Next step is to share it with others. This is best done using SharePoint. It offers a special app for PowerBI reports. Reports can be seen and interacted with on any browser (HTML 5 version coming soon. Currently requires Silverlight) and any device or tablet (Microsoft, Android, iPAD, iPhone).

image

Try it yourself

GO to the Power BI Site and sign up for a trial subscription. It gives you everything you need… Full version of Office 2013, SharePoint and PowerBI app for SharePoint (full Office 365).

Worried that your existing Office installation of older version will be affected by the new version? No worries. Even that has changed now. The installation of Office 2013 can work in parallel with older versions of Office. Both work independently without affecting each other. This is called appV (application virtualization). The deployment method is called Click To Run.

We will cover more of Power BI in upcoming articles.

Update (Aug 2016): I’ve just released my first Power BI MOOC on Udemy. See details here.  Use discount code AUTUMN40 to get the course at 20% discount. 

Facebook analytics using Excel 2013 Power Query: Without programming!

In few minutes you can analyze any public Facebook page data.
Here we see how to analyze the likes and comments – instantly.

You need Excel 2010 or Excel 2013 (Professional Plus edition) or Office which is installed from Office 365.

  1. Close Excel if it is open
  2. Download a new, extremely powerful tool Power Query for Excel
  3. Install the tool
  4. Open Excel
  5. You should now see a tab called Power Query
  6. Open Power Query tab and open the From Other Sources dropdown
  7. Choose FacebookFacebook analytics using Excel 2013 Power Query
  8. Choose me and Statuses (you can choose other types as well)

    image

    Remember that you can type any username or object id which has information accessible to you

  9. Click Ok
  10. Sign in to your Facebook account
  11. Save the credentials
  12. Now it will open a query result window and extract status update information from FB
  13. At this stage it shows few results. Later it will fetch all possible results.image
  14. Keep the columns Message, Updated Time, Likes and Comments.
  15. Remove other columns by right clicking on each column and deleting it
  16. Now we want the count of likes and comments
  17. As of now, both show Table.
  18. Notice the small icon next to the Likes and Comments column headingsSNAGHTML17052850
  19. Click on that icon and do the following for Likes and Comments
    1. Choose Aggregate and Count of ID
    2. Now the Column shows count
  20. The date time column contains locale information. We want a simple Date Time column.
  21. Right click on the Updated Time column and choose Date Timeimage
  22. On the right side pane, expand the Steps area
  23. Notice that all the steps you performed are remembered.image
  24. Click the Done button
  25. Wait for data to be fetchedimage
  26. Now you can use this data in a Pivot Table or PowerView to analyze the FB data easilyimage

    This is a PowerView chart of one FB site

  27. Here is another one as a PivotChart based upon another public siteimage

 

Enjoy…
Now you can do sophisticated FB analysis without knowing any programming, FQL or Graph API syntax.

Explore Power Query.

It has a very powerful syntax. Yes it is complex but it is extremely powerful.
We will cover more of Power Query, Power View and Power Map in future posts.