
How to Resolve Duplicate Data within Excel Pivot Tables

Feb 13th 2014

An attendee from my recent pivot table webinar posed a question that I hadn’t encountered before.

Pamela had an issue where some, but not all, items within her pivot table were being duplicated, with two different totals. If you’re new to pivot tables, you can catch up by watching a free recording of the webinar.

In this article I’ll explain how I helped Pamela track down and resolve a nuance within her data. Although Pamela’s data set was much larger, I only need the two columns of data shown in Figure 1 to illustrate what she was experiencing. There’s a hidden aspect to this data that I’ll reveal in a moment, but let’s first create a pivot table from this data:

Figure 1: I’ll use this data set to explain why duplicate data may appear within a pivot table.

  • Excel 2007 and later: As shown in Figure 2, click on cell A1, choose Insert, Table, and then click OK. Click Summarize with Pivot Table from the Design tab, and then click OK.
  • Excel 2003 and earlier: Choose Data, List, Create, and then click OK. Next, choose Data, Pivot Table Wizard, and then click Finish.

Figure 2: Carry out the steps shown to create a pivot table.

The Table (List in Excel 2003) feature greatly improves the integrity of pivot tables in Excel. If you add rows or columns to your data set, the pivot table will reflect the additional information the next time you refresh it. This means you won’t inadvertently exclude data from your analysis, and you’ll never have to manually resize a pivot table’s source range.

As illustrated in Figure 3, add data to your pivot table:

  • Excel 2007 and later: Click the checkboxes for Account and Amount to add these items to the pivot table.
  • Excel 2003 and earlier: Drag these field names into the Row Labels and Data sections, respectively.

You’ll see in my case that account 4000 appears twice on the pivot table, with two different amounts. The goal of a pivot table is to represent one of each item, so why would account 4000 appear twice? The answer lies within the individual cells of my data.

Figure 3: When you add the fields to the pivot table, account 4000 appears twice.

The first step in solving my problem is to determine if my account numbers are stored consistently. To determine this, I’ll use the ISNUMBER function. As shown in Figure 4, in cell C2 add this formula, and then copy it down the column:

=ISNUMBER(A2)

Figure 4: The ISNUMBER function indicates whether numbers are stored as values (TRUE) or as text (FALSE).

Notice that most rows show TRUE, but two rows show FALSE. This signifies that the account numbers on rows 3 and 7 are stored as text. A pivot table treats the number 4000 and the text “4000” as two different items, which explains why Pamela’s pivot table was reflecting duplicate data.

If you wish to recreate this scenario, add an apostrophe before one or more account numbers to convert the number to text. Make sure that you leave at least one instance of the account number as a value. Right-click on your pivot table and choose Refresh to make the duplicate values appear.

Should you encounter this situation in the future, an easy fix is shown in Figure 5:

In any version of Excel: Select column A, choose Data, Text to Columns, and then click Finish.

Figure 5: The Text to Columns wizard offers the easiest way to convert text-based numbers to values.

Next return to your pivot table, right-click any cell within it, and choose Refresh. The duplicate values should vanish from your pivot table, as shown in Figure 6.

Figure 6: Duplicate values vanish from the pivot table when all account numbers are stored as values instead of a mix of text and numbers.
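The same repair can be sketched in Python, as a loose analog of what the conversion accomplishes rather than a description of how Excel performs it. The sample rows and the `to_number` helper are hypothetical. Every account number stored as text is converted to a numeric value before the amounts are totaled, so each account ends up under a single key.

```python
from collections import defaultdict

# Hypothetical sample rows: (account, amount), with a mix of numeric
# and text-based account numbers.
rows = [
    (4000, 100.0),
    ("4000", 250.0),   # stored as text
    (5000, 75.0),
    ("4000", 30.0),    # stored as text
]


def to_number(value):
    """Convert text that holds a whole number to an int; leave anything else as-is."""
    if isinstance(value, str):
        try:
            return int(value)
        except ValueError:
            return value  # genuine text labels are kept unchanged
    return value


def summarize(data):
    """Total the amounts per account after normalizing the account values."""
    totals = defaultdict(float)
    for account, amount in data:
        totals[to_number(account)] += amount
    return dict(totals)


cleaned_totals = summarize(rows)
print(cleaned_totals)
```

After the conversion, all four rows for this sample fall under two accounts, which corresponds to the single line per account shown in Figure 6.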


Replies (5)


By William
Jun 26th 2015 01:11

Wow great info about Resolving issue of Duplicate Data within Excel Pivot Tables, thanks :)

http://www.education-institute...

By Ralph Pawne
Jun 26th 2015 01:11

I've come across a great tool called DataMatch by Data Ladder (http://DataLadder.com), which is an excellent fuzzy matching and deduplication tool used across business and would work really well for this situation. They offer a complimentary trial for new users.

In fact, an independent verified evaluation was done of the software comparing it to major software tools by IBM and SAS. There was a study done at Curtin University Centre for Data Linkage in Australia that simulated the matching of 4.4 Million records. It identified what providers had in terms of accuracy (Number of matches found vs available. Number of false matches)

1. DataMatch Enterprise, Highest Accuracy (>95%), Very Fast, Low Cost

2. IBM Quality Stage, High Accuracy (>90%), Very Fast, High Cost (>$100K)

3. SAS Data Flux, Medium Accuracy (>85%), Fast, High Cost (>100K)

By Anselm
Apr 10th 2017 17:44

My pivot table apparently arbitrarily splits the same data into two columns. In the column labelled "Faculty" in the data, for example, the value "All" appears 22 times, but the pivot table randomly splits these into two columns, with 20 appearances in one and two in the other. I've used the =ISNUMBER function to check every cell with that value in it. All return "FALSE", meaning, if I understand correctly, that the value in each cell is stored as text - which it should be.

Any ideas what's going wrong?

Replying to Anselm:
By Tenordata
Jan 18th 2018 23:07

I had a similar issue. Two instances of "Record Update" showed up with different totals. The way I fixed it is, I sorted the entire table by the column containing "Record Update"; then I copied one of the instances of "Record Update" and pasted it down over the other instances--making them all the same. I'm not sure what the discrepancy was, but that fixed it.

Replying to Anselm:
By Tenordata
Jan 18th 2018 23:07

Just fixed another instance: "Action Update" occurred twice with different totals. It turns out one of them had a single space before the word "Action." Taking out the space fixed it. Similarly, we should probably check for spaces AFTER the ends of words/phrases.
Edit: Yep, just fixed another and the issue was: "Spaces AFTER the word."
