How to Resolve Duplicate Data within Excel Pivot Tables


An attendee from my recent pivot table webinar posed a question that I hadn’t encountered before.
Pamela had an issue where some, but not all, items within her pivot table were being duplicated, with two different totals. If you’re new to pivot tables, you can catch up by watching a free recording of the webinar.
In this article I’ll explain how I helped Pamela track down and resolve a nuance within her data. Although Pamela’s data set was much larger, I only need the two columns of data shown in Figure 1 to illustrate what she was experiencing. There’s a hidden aspect to this data that I’ll reveal in a moment, but let’s first create a pivot table from this data:
Figure 1: I’ll use this data set to explain why duplicate data may appear within a pivot table.
  • Excel 2007 and later: As shown in Figure 2, click cell A1, choose Insert, Table, and then click OK. Click Summarize with PivotTable on the Design tab, and then click OK.
  • Excel 2003 and earlier: Choose Data, List, Create, and then click OK. Next, choose Data, Pivot Table Wizard, and then click Finish.
Figure 2: Carry out the steps shown to create a pivot table.
The Table (List in Excel 2003) feature greatly improves the integrity of pivot tables in Excel. If you add rows or columns to your data set, the pivot table will reflect the additional information the next time you refresh it. This means you won’t inadvertently exclude data from your analysis, and you’ll never have to manually resize a pivot table’s source range.
As illustrated in Figure 3, add data to your pivot table:
  • Excel 2007 and later: Click the checkboxes for Account and Amount to add these items to the pivot table.
  • Excel 2003 and earlier: Drag these field names into the Row Labels and Data sections, respectively.
You’ll see in my case that account 4000 appears twice on the pivot table, with two different amounts. The goal of a pivot table is to represent one of each item, so why would account 4000 appear twice? The answer lies within the individual cells of my data.

Figure 3: When you add the fields to the pivot table, account 4000 appears twice.

The first step in solving my problem is to determine whether my account numbers are stored consistently. To find out, I’ll use the ISNUMBER function. As shown in Figure 4, enter this formula in cell C2, and then copy it down the column:
=ISNUMBER(A2)
Figure 4: The ISNUMBER function indicates whether numbers are stored as values or as text.
Notice that most rows show TRUE, but two rows show FALSE. This signifies that the account numbers on rows 3 and 7 are stored as text. A pivot table treats the text 4000 and the number 4000 as two separate items, and this nuance explains why Pamela’s pivot table was reflecting duplicate data.
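The same behavior can be sketched outside Excel. The Python snippet below is only an illustration, not how Excel works internally: the sample rows are hypothetical (they are not the data from Figure 1), and the helper names is_number and summarize are made up for this example. It flags which account values are stored as text, and then sums the amounts per account the way a pivot table would.

```python
from collections import defaultdict

# Hypothetical sample rows (account, amount); not the data from Figure 1.
# Sheet rows 3 and 7 hold the account as text, as a leading apostrophe would.
rows = [
    (4000, 100.0),    # sheet row 2
    ("4000", 250.0),  # sheet row 3, stored as text
    (5000, 75.0),     # sheet row 4
    (6000, 300.0),    # sheet row 5
    (4000, 50.0),     # sheet row 6
    ("4000", 125.0),  # sheet row 7, stored as text
]


def is_number(value):
    """Rough analogue of ISNUMBER: True for numeric values, False for text."""
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def summarize(rows):
    """Sum amounts per account; text and numeric keys stay separate."""
    totals = defaultdict(float)
    for account, amount in rows:
        totals[account] += amount
    return dict(totals)


flags = [is_number(account) for account, _ in rows]
totals = summarize(rows)

# Print one TRUE/FALSE flag per sheet row, then the grouped totals.
for sheet_row, flag in enumerate(flags, start=2):
    print(sheet_row, flag)
print(totals)
```

With these sample rows, the flags for sheet rows 3 and 7 are False, and the totals contain two entries for account 4000: one keyed by the number 4000 and one keyed by the text "4000", each with its own sum.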
If you wish to recreate this scenario, add an apostrophe before one or more account numbers to convert the number to text. Make sure that you leave at least one instance of the account number as a value. Right-click on your pivot table and choose Refresh to make the duplicate values appear.
Should you encounter this situation in the future, an easy fix is shown in Figure 5:
In any version of Excel: Select column A, choose Data, Text to Columns, and then click Finish.

Figure 5: The Text to Columns wizard offers the easiest way to convert text-based numbers to values.

Next, return to your pivot table, right-click any cell within it, and choose Refresh. The duplicate values should vanish from your pivot table, as shown in Figure 6.

Figure 6: Duplicate values vanish from the pivot table when all account numbers are stored as values instead of a mix of text and numbers.
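For readers who prepare data in a script before it reaches Excel, the effect of the fix can be sketched in Python as well. This is a minimal illustration under stated assumptions: the sample rows are hypothetical, and to_number is a made-up helper that mimics the result of Text to Columns, not its implementation.

```python
from collections import defaultdict

# Hypothetical sample rows (account, amount) with two text-based accounts.
rows = [
    (4000, 100.0),
    ("4000", 250.0),  # stored as text
    (5000, 75.0),
    (6000, 300.0),
    (4000, 50.0),
    ("4000", 125.0),  # stored as text
]


def to_number(value):
    """Convert numeric-looking text to a number; leave other values alone."""
    if isinstance(value, str):
        text = value.strip()
        try:
            return int(text)
        except ValueError:
            try:
                return float(text)
            except ValueError:
                return value  # genuinely non-numeric text stays as text
    return value


# Normalize every account value, then sum amounts per account.
cleaned = [(to_number(account), amount) for account, amount in rows]

totals = defaultdict(float)
for account, amount in cleaned:
    totals[account] += amount
totals = dict(totals)

print(totals)
```

Once every account is stored as a number, the four rows for account 4000 collapse into a single entry with a combined total of 525.0, which mirrors the duplicate disappearing from the pivot table after a refresh.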



Wow great info about Resolving issue of Duplicate Data within Excel Pivot Tables, thanks :)

I've come across a great tool called DataMatch by Data Ladder, which is an excellent fuzzy matching and deduplication tool used across business and would work really well for this situation. They offer a complimentary trial for new users.

In fact, an independent, verified evaluation compared the software with major tools from IBM and SAS. A study done at the Curtin University Centre for Data Linkage in Australia simulated the matching of 4.4 million records. It rated each provider's accuracy (number of matches found versus available, and number of false matches):

1. DataMatch Enterprise, Highest Accuracy (>95%), Very Fast, Low Cost

2. IBM Quality Stage, High Accuracy (>90%), Very Fast, High Cost (>$100K)

3. SAS Data Flux, Medium Accuracy (>85%), Fast, High Cost (>$100K)