Home » Analysis » Excel Tables 5 – Good Data Checklist

Excel Tables 5 – Good Data Checklist

Now it is time to spell out exactly what to look for when creating good raw data (or repairing bad input data). Use this as a checklist to quickly classify data into GOOD or BAD (requires clean-up).

Photo credit: Flyinace2000 / Foter / CC BY-SA

Contents

Good input data must satisfy these criteria

Each column must have a heading (not data)
No blank headings
No duplicate headings
No formulas in headings
No merged cells
Only one meaning in each column
Same data type in a column
No Horizontal Calculations

Creating good data

When you are about to capture new data, we decide the columns we need and put headings for them. Often, we start with few headings and then type / gather data. During the process, we may realize that we need some more columns. We then add more columns.

This process is fairly ad-hoc. At this stage lot of mistakes can happen which will lead to poor quality raw data. This will haunt you throughout the data analytics phase.

Here are important precautions you must take while deciding how to capture data.

The key thing to remember is that one column must have one meaning.

One Column = One Meaning

We often forget this requirement. Therefore, we have columns like this:

Obviously, there are two things being described, the material and the size. So we need two columns.

Sometimes, it is so subtle that we don’t realize it.

That * is additional meaning. Give it another column called Items in Stock.

Hierarchy in one column

Another common mistake – especially with financial data.

Income and Expenses are categories. They deserve separate columns. March 12 and Mar 11 are DATA – NOT HEADINGS. So GOOD data should look like this.

Yes, you will see lot of repetition in the Category column.

How to avoid repetition?

Repetition in the Category column above indicates that we need two tables.

One table should be actual data from Subcategory onwards. And another table which contains just the Sub-category and Category Mapping.

Like this:

Now you will say, what is the point? In the Pivot Table, we have to show Category anyway. Pivot Table cannot pick up data from two separate tables like this. Therefore, you are forced to put another VLOOKUP column in the green table to get the Category from the orange table – which finally gives you the original table we started with.

Agreed. But that was with older version of Pivot Table. With Power Pivot Data model, such VLOOKUP is no longer required. Why? Because Power Pivot CAN use multiple tables.

With Power Pivot Data Model, we can create relationship between two tables and use the Category column from a separate table. This is how we eliminate duplication.
Read this article for details : Using Power Pivot instead of VLOOKUP

Ideally, to make the data smaller, each subcategory can be given a smaller numeric ID column. And the orange table should have ID, Subcategory and Category columns.

When there is repetition,
think whether you need to split data into multiple tables

This is an important point to keep in mind.

What next

In the next article, we will explore various advantages of converting GOOD RAW DATA into Excel Tables.

This article is a part of a series: Knowledge Pack – Excel Tables

***

Data, Data Model, Excel, Power Pivot, vlookup

3 Responses

Venkataraman says:

June 3, 2015 at 9:52 am

Thanks Sir. Very useful information. You can also mention about normalization when designing tables for excel, as you have touched the topic of normalization of tables.

Loading...

Reply
1. Dr Nitin Paranjape says:
  
  June 3, 2015 at 10:03 am
  
  Yes. i don’t like to use unnecessary tech words.
  
  Loading...
SutoCom says:

June 8, 2015 at 7:05 pm

Reblogged this on SutoCom Solutions.

Loading...

Reply

Queries | Comments | Suggestions | Wish listCancel reply

Subscribe to Blog

Join 1,747 other subscribers

Pivot Table Pro Course

Yes. You use Pivot Tables everyday. Now it is time to find out the real power and nuances. 5.5 hours video, exercises, samples, Q&A.

Excel to Power BI Course

Learn Power BI using the concepts you already know in Excel. Fast transition, in-depth coverage and immediately usable.

Excel Tables 5 – Good Data Checklist

Photo credit: Flyinace2000 / Foter / CC BY-SA

Good input data must satisfy these criteria

Creating good data

One Column = One Meaning

Hierarchy in one column

How to avoid repetition?

When there is repetition,
think whether you need to split data into multiple tables

What next

Related Articles

3 Responses

Queries | Comments | Suggestions | Wish listCancel reply

Subscribe to Blog

Popular articles

Pivot Table Pro Course

Excel to Power BI Course

Excel Tables 5 – Good Data Checklist

Photo credit: Flyinace2000 / Foter / CC BY-SA

Good input data must satisfy these criteria

Creating good data

One Column = One Meaning

Hierarchy in one column

How to avoid repetition?

When there is repetition, think whether you need to split data into multiple tables

What next

Related Articles

3 Responses

Queries | Comments | Suggestions | Wish listCancel reply

Subscribe to Blog

Popular articles

1100+ in-depth blog articles

Pivot Table Pro Course

Excel to Power BI Course

When there is repetition,
think whether you need to split data into multiple tables