During the recent High Impact Excel: VLOOKUP Edition webinar a senior financial analyst named Lisa asked a question related to Social Security numbers. She periodically receives employee lists where the Social Security numbers aren’t necessarily in a uniform format. For instance, some are all numeric, others have dashes, and some might be text-based. Any data analysis involving look-up functions in Excel requires that our data be clean. Here are some techniques for cleaning up the numbers.
The first step in cleaning up data involves eliminating extraneous characters, such as dashes. An easy way to do so is to select the column of data, and then press Ctrl-H to display the Replace dialog box in Excel. As shown in Figure 1, enter a dash in the Find What field, and then leave the Replace With field blank. Make sure you’ve selected a single column of data, and then click Replace All.
Figure 1: The Replace dialog box makes it easy to eliminate dashes from text-based Social Security numbers.
Next, we could encounter a mix of value and text-based Social Security numbers. Text-based numbers often frustrate Excel users, but there are several ways to identify numeric data stored as text:
- Numbers will be left-aligned, although it is possible to left-align numeric values as well.
- As noted previously, text-based numbers may contain dashes, although at the end of this article I’ll describe how you can format numeric Social Security numbers with dashes.
- Cells that contain text-based numbers may also contain spaces.
- Social Security numbers sometimes have the Text number format applied. If so, the word "Text" will appear in the Number section of Home tab in Excel 2007 and later.
- Cells that contain numbers stored as text often have an apostrophe preceding the number itself.
The simplest way to convert text-based numbers to values is to use the Text to Columns wizard, as shown in Figure 2. In any version of Excel, select a range of text-based numbers, choose Data, select Text to Columns, and then click Finish. In this context there’s no need to complete the wizard – when you click Finish the default action is to convert text-based numbers to values. Be sure to eliminate dashes from the Social Security numbers first, otherwise Text to Columns won’t provide the results that you want.
Figure 2: The Text to Columns wizard makes it simple to convert text-based numbers to values.
Text to Columns makes all of the Social Security numbers become numeric, but at the expense of removing our leading zeroes. There are two techniques that I often use to add the dashes and leading zeroes back:
- Technique No. 1: Social Security number format: As shown in Figure 3, press Ctrl-1 to display the Format Cells dialog box. Choose Special, and then double-click on Social Security number. The values in the worksheet cells are numbers, but Excel displays them as if they were text with dashes in between the three sets of numbers.
Figure 3: You can choose Social Security number from the Format Cells dialog box.
- Technique No. 2: TEXT Function: Although the Social Security number format is easy to apply, the downside is that the conversion from text to numbers was at the expense of any leading zeroes. This means that although the formatting makes the Social Security numbers appear that they have leading zeroes, in reality the zeroes have been dropped. Further, this means that we can’t use VLOOKUP, MATCH, SUMIF, and other similar functions in Excel to compare a list of numeric Social Security numbers to a text-based list. Fortunately we can easily have the best of both worlds. As shown in Figure 4 we can use Excel’s TEXT function to convert the numeric Social Security numbers to text, which will restore the missing leading zeros.
The TEXT function has two arguments:
- Value – This is a numeric value, or a reference to a cell that contains a numeric value
- Format_Text – The number format code that you wish to apply
As shown in Figure 4, cell A1 contains the Social Security number 001-02-0003. If we were to click on the cell then Excel’s Formula Bar would show 1020003. We can use the TEXT function to restore the dashes and leading zeros:
Figure 4: The TEXT function converts numeric values to text by way of formatting that you assign.
In case you’re not sure how to construct a format code, Excel offers an easy cheat-sheet. For instance, in Figure 3 above we applied the Social Security number format. You can now ascertain the underlying format code:
- Press Ctrl-1 to display the Format Cells dialog box.
- Choose Custom on the Number tab.
- As shown in Figure 5 the Type field will reveal the format code, which you can copy and paste when you’re constructing the TEXT function.
Figure 5: The Format Cells dialog box can reveal format codes for use with the TEXT function.
About the author:
David H. Ringstrom, CPA, heads up Accounting Advisors, Inc., an Atlanta-based software and database consulting firm providing training and consulting services nationwide. Contact David at email@example.com or follow him on Twitter. David speaks at conferences about Microsoft Excel, teaches webcasts for CPE Link, and writes freelance articles on Excel for AccountingWEB, Going Concern, et al.