Yes. You read it correctly. Everyone does it. But still.It is very dangerous.
DO NOT do it.
Read on to find out why.
Estimated reading time 7 min
Contents
The Problem
CSV files are commonly imported in Excel.
Never use CSV files. Importing CSVs is dangerous and error prone.
The Solution
The correct approach is to rename the CSV as a TXT file and then import into Excel using the import dialog box.
Why?
What is wrong with CSV files? The reason is simple.
CSV is imported DIRECTLY by Excel.
You do not get any chance to control the import process. This can lead to many serious mistakes. The commonest mistakes are related to dates.
TXT files are better because Excel shows the IMPORT dialog by default.
Now you can customize the import and eliminate many common mistakes.
What’s more, you can automate a complex import by recording a macro.
What’s more, for repetitive processing you can easily record a macro of all the custom import actions.
Quick Summary
SOP (Standard Operating Procedure) for importing comma delimited text data in Excel.
- Rename CSV as TXT
- Open TXT file in Notepad (switch off Word Wrap)
- Remove page header, if any
- Open the TXT in Excel using File – Open
- The import dialog appears
- Handle each column
- Mark unwanted columns as Do Not Import
- Mark padded zero columns as Text
- Make sure that numbers do not have comma separators
- Mark date columns as date and choose correct format
Finally, two important best practices:
- Columns containing dates should never be fitted to width. Make them wider. This way, you can detect mistakes of unrecognized dates (left aligned)
- Date format should always show month in MMM format so that nobody misinterprets the date and month. EVER. Make that a standard operating procedure for your company.
Problems!
Excel automatically imports CSV format files into a new workbook. The import is done using default settings. You do not get any control over the import process. Due to this, many common mistakes occur:
- Most often, we end up importing more columns than we need. This leads to large files and reduces performance
- The imported dates may be completely wrong (details below)
- Leading zeros may be lost
- Blank and unwanted columns need to be cleaned up manually
Unfortunately, CSV is the most common format which is used by IT to deliver DUMPs for users.
The solution
The solution is simple. Just rename the file to TXT extension.
Excel does not understand the TXT extension by default. Therefore, it shows you the Import dialog.
This offers full control over the import process and prevents common mistakes. Starting with accurate, high quality data is the basis of effective analytics.
TXT import best practices
What to do while importing:
- Open the text file in Notepad (wrap off) and view the overall structure
- If you see date columns, look for the pattern and find something which has some portion above 12. That helps us find out the base date format in the file.
- In the Import dialog – Delimited type scroll down so that you can see real data (avoid headers)
- If Comma is the delimiter, remove the TAB option (it is ON by default). If you don’t do this, some random tab appearing in descriptive fields will add unwanted column.
- Click on each unwanted column and choose Do Not Import This Column (contiguous columns can be multi-selected with Shift – Click
- Choose columns with padded zeros and mark them as Text
- Choose Date columns and choose Date option and the correct format of date.
- Make sure commas in numeric columns are NOT considered as delimiter.
- Understand the meaning of the option Treat Consecutive delimiters as one
In the next article, we will see some how dates can be misinterpreted due to this CSV problem. In fact the real problem is not even noticed in this case. Very scary.
***