fbpx

Excel Data Clean-up: System Reports

Data Clean-up: One of the worst formats to get raw data is the system reports. These are reports you generate from business automation software. Export to Excel / CSV option dumps the report into a file instead of a printer. But the report is originally designed for printing. Now you waste too much time removing unwanted headers and footers.

This article shows a quick method, and more importantly, a concept which can be used in various clean-up scenarios as well.

Data Clean-up

Typical report format

Consider this output from a business report – exported to Excel. The areas in pink are page headers and footers. Usually we need the tabular data. We don’t need the page headers and footers.

report format

The wrong way

Most of us either do this manually, deleting few rows at a time – repeatedly. This is inefficient.

Recording a macro is also inefficient because in effect you are automating inefficiency.

Don’t do it.

Slightly better way

Sort or auto-filter also works. Sort is bad because in many cases the original order of the transactions or data is important. Besides, while sorting if the correct range is not included, it can lead to confusion.

Filter may work in some cases. But filter only removes unwanted rows. In some cases, you may have unwanted columns.

In case of very large data, filter as well as sort can take up a very long time.

Both methods do not remove the repeated tabular header.

The right way: Using GoTo Special

The objective is to keep the needed rows (which contain tabular data) and remove unwanted rows (those which are part of header / footer / subtotals / etc.).

In this case, the text file was imported into Excel, however this concept will work with data which was directly copy pasted from system reports (web pages) as well.

Here are the steps. First of all, you must scroll the data and eyeball it (view the pattern).

excel data

Now try to find a column which is useful to us. Useful means what? If that column has an empty cell – it should identify unwanted row. If there is a value in that column cell, it should identify a row we want to keep. In short, the column is our keep / remove row decision flag.

In this case, Column D seems to be suitable. Scroll and check if your assumption is right. There may be many columns which are useful. But as long we find one such column, there is no need to search for additional columns. This is the column you would have used in AutoFilter.

Select the entire Column D. Now we want to select only the blank cells in it so that we can delete those rows. This is done by using GoTo -Special – Blanks.

 

Press Ctrl G (or F5) and choose Special – Blanks – Ok.

Go To special

Now all blanks are selected in column D.

exce sheet

Right click carefully within any of the selected cells and choose Delete… Delete Entire Row. Now all unwanted rows are gone in one stroke. For large data, this operation may take significant amount of time. So make sure all other applications are closed and this file gets maximum memory resources.

delete row

Deleting unwanted columns

This can be done using the same method. Find a ROW which contains blank cells in unwanted columns. GoTo Special – Blanks – Delete Entire Column.

image

Removing unwanted headers

The header for columns will repeat on each page. This can be removed using Find feature.

Choose any column. In this case I am choosing the the first column where the header contains the title “Vendor”.

Press Ctrl F to open Find dialog and choose Find All.

find replace

Now you see the find results below. Click inside the results and press Ctrl A to select all of them. We want to preserve the first row header. So search for the first row and press CTRL Click (Press Ctrl key on keyboard, click on that search result). This will Unselect only that row from the find results.

find and replace

Now the Find dialog has done its job – of identifying (selecting) the unwanted header rows.

Carefully right click in one of the selected cells and choose Delete – Delete Entire Row.

Check the data again to ensure that there are no mistakes.

Now our job of cleaning data is done!

Summary

This approach can be used as a best practice for most report cleaning tasks. Of course, you will need to refine the approach depending upon patterns specific to your data. But this concept will help you in finding the most optimal method.

  1. Understand the pattern of data
  2. Find a column which contains empty cells for unwanted rows.
  3. Remove those rows using GoTo Special Blanks
  4. Find a row where empty cells indicate unwanted columns
  5. Remove those columns using GoTo Special Banks
  6. Select any column and search on header caption
  7. Select all captions except the first one and remove those rows

Island with a palm tree

One Response

Queries | Comments | Suggestions | Wish list