Duplicate rows are a common problem in databases, and removing them safely takes some care. This article covers four options for deleting duplicate rows in SQL, each with a code example you can adapt to your own database. The examples use a table named table_name in which two rows count as duplicates when they share the same col1 and col2 values. Some of the examples also rely on a unique id column to decide which copy to keep.
Option 1: Using the GROUP BY clause and the HAVING COUNT(*) > 1 condition
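This is the most basic method for removing duplicate rows in SQL. The GROUP BY clause groups the rows by one or more columns, and the HAVING COUNT(*) > 1 condition keeps only the groups that contain more than one row, that is, the duplicated combinations. Deleting every row that matches one of those combinations would remove all copies, including the one you want to keep, so the example below also spares the row with the lowest id in each group (this assumes a unique id column). Note that MySQL rejects a subquery that reads from the table being deleted from (error 1093); there, wrap the subquery in a derived table or use Option 4. Here's an example of how this method can be used:
DELETE FROM table_name
WHERE (col1, col2) IN (SELECT col1, col2
                       FROM table_name
                       GROUP BY col1, col2
                       HAVING COUNT(*) > 1)
  AND id NOT IN (SELECT MIN(id)
                 FROM table_name
                 GROUP BY col1, col2);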
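To try the GROUP BY approach end to end, here is a minimal sketch that runs against an in-memory SQLite database through Python's sqlite3 module. The contacts table, its columns, and the sample rows are assumptions made up for illustration. The sketch first lists the duplicated groups with GROUP BY and HAVING COUNT(*) > 1, then deletes every row except the one with the lowest id in each group, so one copy always survives.

```python
import sqlite3

# Hypothetical table and sample data, used only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE contacts (id INTEGER PRIMARY KEY, col1 TEXT, col2 TEXT)")
conn.executemany(
    "INSERT INTO contacts (col1, col2) VALUES (?, ?)",
    [("ann", "x"), ("ann", "x"), ("bob", "y"), ("ann", "x"), ("cat", "z")],
)

# Step 1: list the (col1, col2) combinations that occur more than once.
duplicate_groups = conn.execute(
    """
    SELECT col1, col2, COUNT(*)
    FROM contacts
    GROUP BY col1, col2
    HAVING COUNT(*) > 1
    """
).fetchall()

# Step 2: delete every row except the lowest id in each group.
conn.execute(
    """
    DELETE FROM contacts
    WHERE id NOT IN (SELECT MIN(id) FROM contacts GROUP BY col1, col2)
    """
)

remaining = conn.execute(
    "SELECT id, col1, col2 FROM contacts ORDER BY id"
).fetchall()
print(duplicate_groups)
print(remaining)
```

Running the SELECT from step 1 on its own first is a cheap way to check how many groups are affected before anything is deleted.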
Option 2: Using a Common Table Expression (CTE)
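A Common Table Expression (CTE) is a named temporary result set that exists only for the duration of a single SELECT, INSERT, UPDATE, or DELETE statement. Here the CTE uses ROW_NUMBER() to number the rows within each group of identical col1 and col2 values, so every row numbered above 1 is a duplicate and exactly one row per group is kept. To see the duplicate rows before deleting them, replace the DELETE with SELECT * FROM cte WHERE row_num > 1. Deleting directly through a CTE as shown is SQL Server syntax; in other databases the DELETE generally has to target the base table. Here's an example of how this method can be used: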
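WITH cte AS (
    SELECT col1, col2,
           -- Number the rows in each (col1, col2) group; ordering by id
           -- (assumed unique) means the lowest id gets row_num = 1 and is kept
           ROW_NUMBER() OVER (PARTITION BY col1, col2 ORDER BY id) AS row_num
    FROM table_name
)
DELETE FROM cte
WHERE row_num > 1;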
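The preview-then-delete workflow can be sketched with Python's sqlite3 module and an in-memory database. The contacts table and its rows are hypothetical, and the sketch assumes a SQLite build of version 3.25 or later, which is when window functions such as ROW_NUMBER() became available. Rather than deleting through the CTE itself, this variant deletes the base-table rows whose id the CTE flags, which is a more portable way to express the same idea.

```python
import sqlite3

# Hypothetical table and sample data, used only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE contacts (id INTEGER PRIMARY KEY, col1 TEXT, col2 TEXT)")
conn.executemany(
    "INSERT INTO contacts (col1, col2) VALUES (?, ?)",
    [("ann", "x"), ("ann", "x"), ("bob", "y"), ("ann", "x"), ("cat", "z")],
)

# Shared CTE body: number the rows inside each (col1, col2) group by id.
NUMBERED = """
    SELECT id, col1, col2,
           ROW_NUMBER() OVER (PARTITION BY col1, col2 ORDER BY id) AS row_num
    FROM contacts
"""

# Preview: the rows that the delete below would remove.
to_delete = conn.execute(
    f"WITH cte AS ({NUMBERED}) "
    "SELECT id, col1, col2, row_num FROM cte WHERE row_num > 1 ORDER BY id"
).fetchall()

# Delete: target the base table and match on the ids flagged by the CTE.
conn.execute(
    f"WITH cte AS ({NUMBERED}) "
    "DELETE FROM contacts WHERE id IN (SELECT id FROM cte WHERE row_num > 1)"
)

remaining_ids = [row[0] for row in conn.execute("SELECT id FROM contacts ORDER BY id")]
print(to_delete)
print(remaining_ids)
```

Because the preview and the delete share the same CTE text, what you inspect is exactly what gets removed.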
Option 3: Using the EXISTS clause
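The EXISTS operator tests whether a subquery returns at least one row. Here a row is deleted when another row with the same col1 and col2 but a smaller id exists, so the row with the lowest id in each group survives. This requires a unique column such as id to tell the copies apart. Depending on the database and the available indexes, a correlated EXISTS can be more efficient than IN on large tables, so compare the execution plans before relying on it. Here's an example of how this method can be used: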
DELETE FROM table_name t1
WHERE EXISTS (SELECT 1
FROM table_name t2
WHERE t1.col1 = t2.col1
AND t1.col2 = t2.col2
AND t1.id > t2.id);
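As a rough illustration of the EXISTS approach, the sketch below runs it against an in-memory SQLite database using Python's sqlite3 module. The contacts table and its rows are hypothetical. The delete target is referenced by its table name rather than an alias, purely to keep the statement portable across SQLite versions. The sample data also includes two rows with a NULL in col2 to show a caveat: because NULL = NULL is not true in SQL, an equality-based comparison like this one does not treat those rows as duplicates.

```python
import sqlite3

# Hypothetical table and sample data, used only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE contacts (id INTEGER PRIMARY KEY, col1 TEXT, col2 TEXT)")
conn.executemany(
    "INSERT INTO contacts (col1, col2) VALUES (?, ?)",
    [("ann", "x"), ("ann", "x"), ("bob", "y"), ("dan", None), ("dan", None)],
)

# Delete a row when an earlier copy (same col1 and col2, smaller id) exists.
conn.execute(
    """
    DELETE FROM contacts
    WHERE EXISTS (
        SELECT 1
        FROM contacts AS earlier
        WHERE earlier.col1 = contacts.col1
          AND earlier.col2 = contacts.col2
          AND earlier.id < contacts.id
    )
    """
)

remaining = conn.execute(
    "SELECT id, col1, col2 FROM contacts ORDER BY id"
).fetchall()
print(remaining)
```

If rows with NULLs should count as duplicates of each other, the comparison needs a NULL-safe form, and which operator provides that differs between databases.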
Option 4: Using the DELETE JOIN
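This method applies the same rule as Option 3 (delete any row for which a copy with a smaller id exists) but expresses it as a self-join instead of a correlated subquery. The DELETE ... JOIN form shown here is MySQL syntax, and SQL Server accepts a similar form; PostgreSQL uses DELETE ... USING instead. Here's an example of how this method can be used: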
DELETE t1
FROM table_name t1
JOIN table_name t2
  ON t1.col1 = t2.col1
 AND t1.col2 = t2.col2
 AND t1.id > t2.id;