Remove Duplicate Records from Table in SQL Server

Views: 1876
Comments: 0
Like/Unlike: 0
Posted On: 16-Dec-2015 14:44 

Share:   fb twitter linkedin
Rahul M...
Teacher
4822 Points
23 Posts

Introduction

SQL Server tables should never have duplicate rows, nor non-unique primary keys.Duplicate rows are a violation of entity integrity, and should be disallowed in a relational system. SQL Server has various techniques for enforcing entity integrity, including indexes, UNIQUE constraints, PRIMARY KEY constraints, and triggers.
Despite this, under unusual circumstances duplicate rows may occur, and if so they must be eliminated. One way they can occur is if duplicate row exist in non-relational data outside SQL Server, and the data is imported while row uniqueness is not being enforced. Another way they can occur is through a wrong database design, such as not enforcing entity integrity on each table.
This article shows how to locate and remove duplicate rows from a table.

Problem

Suppose we have a tblCustomer table have some duplicate rows as:

 Customer_id   Customer_name   Customer_gender   Customer_age
 c0023  abc   Male   30
 c0023  abc   Male   30
 c0023  abc   Male   30
 c0050  xyz  Male  45
 c0050  xyz  Male  45
 c0050  xyz  Male  45
 c0022  sdf  FeMale  20
 c0022  sdf  FeMale  20
       

In the above table we see that there are many duplicate rows.

Solution (I)

You can use ROW_NUMBER() with PARTITION BY to delete duplicate.

Use following CTE Query to delete duplicate record from the table tblCustomer:

WITH tmp AS(SELECT Customer_id,Customer_name,Customer_gender,Customer_age,ROW_NUMBER() OVER(PARTITION BY Customer_id,Customer_name,Customer_gender,Customer_age ORDER BY name) AS duplicateCount FROM tblCustomer)
DELETE FROM tmp WHERE tmp.duplicateCount>1

Solution (II)

The first step is to identify which rows have duplicate values:

SELECT Customer_id,Customer_name,Customer_gender,Customer_age, count(*) as duplicatecount
FROM tblCustomer
GROUP BY Customer_id,Customer_name,Customer_gender,Customer_age
HAVING count(*) > 1

 

This will return three row for each set of duplicate rows in the table. The last column in this result is the number of duplicates for the particular row. 

 Customer_id   Customer_name   Customer_gender   Customer_age duplicatecount 
 c0023  abc   Male   30 3
 c0050  xyz  Male  45 3
 c0022  sdf  FeMale  20 2

If there are only a few sets of duplicate rows, the best procedure is to delete these manually on an individual basis. For example:

set rowcount 2
delete from tblCustomer
where Customer_id='c0023' AND Customer_name='abc' AND Customer_gender='Male' AND Customer_age=30

The rowcount value should be n-1 the number of duplicates for a given row. In this example, there are 3 duplicates so rowcount is set to 3-1=2

If there are many distinct sets of duplicate rows in the table, it may be too time-consuming to remove them individually. In this case the following procedure can be used:
First, run the above GROUP BY query to determine how many sets of duplicate rows exist, and the count of duplicates for each set.
Select the distinct rows into a tmpCustomer table. For example:

SELECT DISTINCT * INTO tmpCustomer FROM tblCustomer

At this point, the tmpCustomer table should have unique rows. Verify that each row in tmpCustomer is unique, and that you do not have duplicate rows. For example, the query: 

SELECT Customer_id,Customer_name,Customer_gender,Customer_age, count(*) as duplicatecount
FROM tmpCustomer
GROUP BY Customer_id,Customer_name,Customer_gender,Customer_age
HAVING count(*) > 1

should return a count of 1 for each row.
Delete the duplicate rows from the original table (tblCustomer). For example:

DELETE FROM tblCustomer

Put the unique rows back in the original table. For example:

INSERT tblCustomer SELECT * FROM tmpCustomer
 
Result

Now the tblCustomer table looks as:

 Customer_id   Customer_name   Customer_gender   Customer_age
 c0023  abc   Male   30
 c0050  xyz  Male  45
 c0022  sdf  FeMale  20

Conclusion

In this article we uses simple T-SQL to delete duplicate rows from a table by Solution (I) and Solution (II). Hopefully, this article will help you do that.

0 Comments
 Log In to Chat