fbpx

WhatsApp Group Chat Analytics using Excel and Power Query

Well, while we are on the topic of WhatsApp Group Chat Analytics , data cleaning and analytics, here is another interesting article. In fact, this article also falls under the category of “What did I Learn Today?”.

Like many of you, I am also a part of many WhatsApp groups. I wanted to check how the chat transcript can be analyzed. Here is the process I followed using Power Query and Excel.

WhatsApp Group Chat Analytics

Get the WhatsApp Transcript by mail

This is easy. Just mail to yourself and then use the attached text file as the input for further analysis. DO NOT export it with media. The file size can get very large if it is a busy group.

Why not use Text Import Wizard of Excel

I am going to use Power Query instead of the regular text import wizard. This is because, Power View Gives me many more options which the import wizard does not provider. Read on and you will find out the difference.

You can download and install Power Query from here.

Typical WhatsApp transcript looks like this.

whatsapp transcript

Data Clean Up

There are lots of steps involved. So I am going to keep the description brief – explaining only things which are not obvious. Ten steps required to get clean data for further analysis.

Open Power Query Tab and use From File – Text file. The data is imported.

powerquery tab

This just splits the Time column. Using the delimiter only ONCE is a very good option which is not available with traditional Text Import Wizard of Excel.

comma delimiter

SNAGHTMLafc96d3

Similarly, split the date portion using – as the delimiter.

data splitting

SNAGHTMLafe5de7

Now split the name of the person. It is always followed by a :

SNAGHTMLb020db1

image

This is the data with columns renamed.

SNAGHTMLb064596

Remove unwanted dates

The data gets mixed up if there are CRLF (Enter key) within the messages. To clear those rows, I removed anything with Null in the Date column. You will need to look at your data and you may need to use a different approach.

remove unwanted data

image

Finally these are the steps which were required. Next time you get another transcript, you can keep the file name same and overwrite the old file with the new file and refresh this query. It is now automated!

image

Now the analytics

This is very easy using a Pivot Table and Pivot Chart. Here are some examples.

analytics

anlaytics by time

image

analytics

Dynamic Text Analysis

Suppose you want to search for the instances of a particular word, you will need a calculated column. The search word is kept in a separate cell outside the data and Find is used to check for the occurrence.

image

This column should be used in value area for counting various the occurrence.

image

This is a workable solution, but is cumbersome because you have to refresh the pivot table every time you change the search condition. Power View provides a better solution for this.

Using Power View

Here I have a simple Power View dashboard of the data with three charts. Who posted when by date, Posts by person Pie chart and Posts by time. All these charts are based upon all the posts.

Here I am assuming that you already know how to use Power View. I have covered it briefly in this post – Working with Excel data? There is a revolution happening. Learn Power BI.

I will cover more about Power View in upcoming posts.

graphs

Now we want to do textual analysis. To do so, you add the Message to Filter Area and choose the other option Advanced Filter Mode

image

Now you can type the text you are looking for and press Enter. This filter is immediately applied to all the charts. No refresh is required.

image

This is the same dashboard showing only those posts where the word Test was contained.

image

Practical Usage Scenarios

Now that we know the technique behind it, let us think where we can use this in practice. Of course there are two type of groups – informal and formal.

Informal groups are formed between friends, co-workers, alumni, people who have come together due to some common interest / cause. They can perform such analysis to find out participation, contribution, etc. as shown above.

For formal groups, analytics can be even more useful.

  • For example, it can be used for quick auditing to find a particular context.
  • Another useful thing is to get all the messages by a particular person in a specific time period.
  • Often URLs or references are posted. Searching by “http” can be very useful here.

Summary

  • Using Power Query simplified the text processing. It was eminently better than using Text Import Wizard.
  • The effort put in for cleaning up the data is only a One Time Effort. Next time, all the work will be done by Power Query.
  • Now we can focus on the analysis rather than wasting time on the clean up.

Key Learning: Next time you are using a traditional clean-up method, think if there is a better alternative available.

Clock

11 Responses

  1. Interesting BUT I could not do it for my chat conversation.

    My chat format is different.
    9/12/13, 6:47:47 AM: XYZ: Text
    9/12/13, 12:41:11 AM: ABC: Text1

    I could split that however if you can suggest better way. but when somebody sends some joke,quote then it spills to multiple lines

    9/12/13, 2:04:56 AM: XYZ: A young unmarried girl Joke.
    Scared??..She Joke.

    How to handle that.

  2. Hi,

    We use Whatsapp for 200 customers. Each has his own chat group with our customer service agents.
    I need to analyze individual customer’s chat statistics,like time lag between chat start to agent response,total chat duration,time of day and day of week traffic etc
    Can you please advise on how to do this?
    Thanks
    Naseer mahmood

    1. Hi
      Parsing can be done using one of the approaches shown in the blog.
      After that, detecting whether the statistics you want will require some lexical analysis.
      Unfortunately, each transcript will have to be parsed one by one.
      Potentially, we could copy paste all the transcripts in one large file and have an additional column to prefix the customer name so that once parsed, it shows up as another column. But I have not tried it myself.
      To my knowledge, there is no way to get all the transcripts in a single file in WhatsApp.

    1. Thanks for posting. Looks good. I have not tried it. But worth trying (at your own risk, of course).

Queries | Comments | Suggestions | Wish list