In the previous article we saw the problem with importing CSV files.
Here is how the new Power Query can be used to clean the data while importing.
Estimated reading time 10 min
Contents
Recap
Excel opens TXT files with an import wizard. Unfortunately, it opens CSV files without any import wizard. Therefore, CSV imports are very error prone. We saw how to handle this problem in the earlier article.
Now let us see how Power Query solves the problem and provides more features for data cleanup and import.
Power Query requires Excel 2010 or above. Download it from here and install it.
Using Power Query
Power Query does not differentiate between flat file types. It supports TXT, CSV, XML and Excel formats. It can also import filenames and related properties from a file folder.
Import the data
We will use the same files from the previous article. One is CSV file and other one is TXT. Content in both the files is exactly the same.
TEXT file first
Power Query tab – From File – Text- open our TXT file.
Now it opens in the Power Query window.
No change has been made. Each row contains both the columns.
We must split the column. Right click on the column and choose Split- By delimiter. Choose Comma, remove Tab and click OK. Now it is split.
The columns are split but the data type in the first column is still text.
Where is the data type shown?
First clue is that the data is left aligned – which means it is text.
Power Query also has a direct method of showing the currently interpreted data type. It is shown in the Home Tab – Transform group.
Converting to date
Right click on the column heading and choose Change Type – Date – it does not work.
It creates exactly the same problem we have seen in Excel. Some dates are misinterpreted and some dates remain text.
Now we must undo this step. On the right side pane in Power Query, all steps are shown. Click the cross sign next to the last step Changed Type to undo this action.
What went wrong? It used the current Control Panel Regional Setting which is English (India) – which is DD MM YYY. So the same problem occurred.
Now choose the Change type option but go to the last menu item – Using Locale…
This opens a dialog where you choose the data type and then choose the location / region which matches the date type. In this case we will choose English (USA) and click OK.
(Ideally, you should be able to see the date format for the current locale. I have suggested it to the team. May be they will incorporate it in future versions.)
This will do the expected conversion correctly. Now choose the Close and Load option to get the data into Excel.
Importing from CSV
Now close the file and try importing the CSV file. As you know the data is exactly the same. Only the extension is different. What happened?
It split the data by detecting the comma delimiter. Therefore, we get two columns. But the first column is still interpreted as Text.
Now you will need to apply the Change Type operation as described above to get Date data. That is the only difference.
Of course, in either case, you will need to rename the columns because the column name was not mentioned in the data itself.
Repetitive Import = Refresh
Many of us get a CSV / TXT file periodically – typically on a monthly basis. Most of us have to repeat the process of importing and perform all the transformations all over again.
If you expect a file to come every month, this is what you should do. Create a new folder and copy the current file – say the JAN file there. Rename the file to CURRENT.TXT.
Now import and transform it using Power Query as usual. Next time the FEB file comes up, save it as CURRENT.TXT. Overwrite the earlier current.txt file.
Now open the workbook containing the query and just choose Refresh from the Query tab. That’s it. Power Query will reimport all data. You don’t have to do any manual work now.
***