Textual Variants

Share this content

One of the more frequent requests I get is how to convert a text file into viable Excel data that can be pivoted? There are a wide variety of text files. I call them textual variants, and there are different methods used to refine the file depending on the variant. The first example is a common one. This is an accounts receivable aging file. The basic problem here is that a header line appears first which identifies the customer name. All of the other pertinent information (date, invoice number, amount, etc.) is contained in subsequent lines. Then there is a break and the cycle repeats itself. This file has to be manipulated so the customer name is on the same line as the rest of the data in order to make a data base which is suitable to pivot. A manual insertion of a column and manual entry of the customer name followed by copying the name down through all of the data rows is one option, but this is not a good option if the file is thousands of rows deep. Usually these files can be easily adapted using a combination of text functions, IF functions and/or the special paste feature in the “Go To” dialog box.

Here is how the file appears when it is imported into Excel (click to enlarge):

Figure 2 (click to enlarge):

The key is to find an identifier. The identifier is some unique feature be it a format, symbol, blank, etc. that allows the user to differentiate something in one row from another row in the same column. Now find a unique feature about row 4 (data row) that is not contained in row 3 (header row). Examine column A first. This column contains the customer number for the header row and an invoice number for the data row. Notice that these rows are not unique. The customer name can be either a number such as cells A3 and A22 or a label such as cells A7 or A11. It could be that all invoices begin with an “8” whereas no customer numbers begin with an “8”. Then the first digit of the number in column A might be a viable option to use as the “identifier”, but we don’t have enough information from this screen shot to know if this is true. Now look at columns B, C & D. Notice that the header row always contains a label and the data row always contains a number or a date, and a date is a number in Excel. Any one of these columns would make a perfect “identifier”.

Now insert a column before column A and write an IF function. The syntax of an IF function is to first perform a logical test i.e. a function that is answered by true or false. If true, the IF function will perform the next function (part 2 of the IF) and ignore the last function (part 3 of the IF). If false the IF function will ignore the next function (part 2 of the IF) and perform the last function (part 3 of the IF).

Click to enlarge:

Now copy the IF formula down to the bottom of the data, and see what happens.

When copied using relative references the formula in cell A4 became =IF(ISNUMBER(D4),A3,C4&D4)
It found a number in cell D4 and returned the value in cell A3, namely ALAMO GROUP.
Then the formula in cell A6 did not find a number and returned the values in cells C6 & D6, blanks.
Likewise the formula in cell A7 found a label and returned ALEXANDRIA TUBES, and so on.

The effect is to record the customer name on each row that contains the rest of the data.

Now it’s time to set up the column headers. There is some more work to do as the row headers and blank lines have to be deleted.

It is a good idea to copy this sheet to preserve the procedures and formulas that have been mastered for this particular text file. Now the formula can be easily accessed from Nov(1). AR Aging (1) and copied into next month’s file.

First, highlight the IF functions in column A and paste special values so that the formulas are erased and the text remains.

Next, sort the data. Here I used invoice, and I was able to isolate the row headers from the row data.

Delete the extraneous row headers and blanks. Do not delete the totals until the remaining data columns have been summed and compared to the totals. This is a good check as sometimes there are nuances and blips in the data that can cause some information to be lost or accidentally deleted.

Viola, The result is a perfect data base just waiting to be pivoted.

Well the temperature hit zero degrees here yesterday. With the wind chill it seemed like about 150 below, and as my brother in Maine said “It’s wicked cold”, so I’m headed to sunny Florida tomorrow for a brief respite from the bitter weather. I want to wish all my readers a save and happy holiday season. Rest assured, there’s more Excel excitement to come in 2009.


Please login or register to join the discussion.

There are currently no replies, be the first to post a reply.