fbpx

How to desensitize data in Excel

Sometimes we need to share a workbook but not real data.

Here is how you desensitize the workbook quickly.

how to desensitize data


Estimated reading time 7 min

The Need

Often we need to send Excel files for troubleshooting. The files may be sent internally to a helpdesk team or you may engage some external agency.

In either case, the data in the Excel file may be confidential and  you do not want to reveal it to anybody outside your team.

Therefore, you need a way to remove confidential information and replace it with random information. This is called desensitization of data.

When I conduct Efficiency  Optimization work for customers they need to show me the way they work. Once I analyze their work pattern, I optimize it and deliver significant improvement.

It is necessary for me to look at live customer documents. In this case, desensitization of data has to be done by each user.

In this article I will describe a simple but effective method for data desensitization.

Desensitize Data – The Steps

  1. The process has to be applied to all sheets
  2. First of all, make a copy of the file and rename it (put the word “Sample” before the filename)
  3. Open the Copy
  4. Unhide hidden sheets, if any
  5. Look at all available sheets and remove unwanted sheets
  6. Repeat following steps for each sheet
    1. Choose Go To (Ctrl G)

image

  • Click the Special button
  • Choose Constants and unselect Text, Errors and Logicalsimage
  • Click Ok
  • Now all cells will be selected
  • Do NOT disturb the selection. Do NOT click anywhere. Just type.
  • Type the formula
    =randbetween (10,10000)
    and press CTRL and ENTER key together.
    You can use any lower and upper limit values.
  • Now a random number will be added to all the selected cells
    Cells containing dates will get formatted as dates. But we are using a maximum value of 10000. Therefore the years will be in 1900s.
    If the date range matters this approach will not work.
  • Do not try to remove the randbetween formulas. Let them be there.
  • Go to data columns where confidential textual information is seen.
    These could be columns like Customer name, Narration, etc.
    Delete those columns.
    Product and Location names are usually not considered as confidential and therefore can be left alone.
  • The last step is to remove unwanted metadata.
    Go to File – Check for Issues – Inspect Document. Run the Document Inspector and remove unwanted things.

I will write a tool for this purpose at a later date. For now this method works just fine.

 

***

Queries | Comments | Suggestions | Wish list