Cleaning Up Bad SQL Data: Find and Delete Duplicate Records The Right Way
Few things can derail a developer's day quite like finding bad data in a live production database, especially when it isn't immediately obvious what the problem is. You're POSITIVE your code is correct, but you keep getting unexpected results. So you dig deeper through the layers until you finally find the culprit: duplicate records.

I had one of those moments recently. A feature in my side project McAllen Garage Sales was behaving erratically, and after sinking way too much time into head-scratching and debugging, I finally uncovered the problem: my Listings table had multiple entries with the same PermitNumber, a value that should have been absolutely, positively unique.
My first reaction was to blame myself (which was fair, I should have had a unique constraint on that column from day one!). My second reaction was, "Okay, how do I fix this without making things even worse?" Just randomly deleting rows, while probably satisfying, wouldn't guarantee I got them all or didn't remove the wrong ones.

I needed a safe, repeatable way to identify only the duplicates and then surgically remove them.
After a bit of trial and error, I finally landed on a simple, safe script using a Common Table Expression (CTE), which, if you haven't run into one before, is essentially a named, temporary result set you can reference in the query that follows. Think of it as an inline "temp table" that exists only for the duration of a single statement.
Let's take a look!
Step 1: Safely Identifying the Duplicates
Before I even thought about running a DELETE statement, my first priority was to build a SELECT query that would show me exactly what I was dealing with. Running a DELETE on a production table without a WHERE clause, while certainly a developer’s rite of passage, isn’t something anybody has time for.

My goal was to list all rows that had a duplicate PermitNumber. This is where a CTE comes in handy.
A CTE, defined using the WITH keyword, lets you create a temporary, named result set that you can reference in your main query. It’s perfect for breaking down a complex problem into logical steps.
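Before we get to the real script, here's a tiny, throwaway CTE just to show the shape of the syntax (the name and data are made up for illustration):

SQL
WITH SmallNumbers AS (
    SELECT 1 AS n
    UNION ALL
    SELECT 2
)
SELECT n
FROM SmallNumbers;

The WITH block defines the named result set; the query underneath treats it like any other table.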
Here’s the script I used to find the problem rows:
SQL
WITH DuplicateRows AS (
    SELECT PermitNumber,
           COUNT(*) AS DuplicateCount
    FROM Listings
    GROUP BY PermitNumber
    HAVING COUNT(*) > 1
)
SELECT yt.*
FROM Listings yt
INNER JOIN DuplicateRows dr
    ON yt.PermitNumber = dr.PermitNumber;
Let's break down what's happening here, because it's a super useful pattern:
WITH DuplicateRows AS (...): This defines our CTE and gives it the name DuplicateRows.

SELECT PermitNumber, COUNT(*) AS DuplicateCount: Inside the CTE, we select the column that has duplicates, PermitNumber. We also use COUNT(*) to count how many times each PermitNumber appears.

FROM Listings GROUP BY PermitNumber: This is the key. We group all the rows by their PermitNumber. So, all rows with PermitNumber 'ABC-123' are now in one group.

HAVING COUNT(*) > 1: After grouping, HAVING filters those groups, keeping only the ones where the count is greater than one. We now have a list of every PermitNumber that is a duplicate.

SELECT yt.* FROM Listings yt INNER JOIN ...: Finally, the main query joins the original Listings table back to our DuplicateRows CTE. This gives us the full data for every single row that is part of a duplicate group.
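To make that concrete, imagine Listings held these hypothetical rows (the addresses are made up for illustration):

PermitNumber  Address
ABC-123       101 Main St
ABC-123       101 Main St
XYZ-789       42 Elm Ave

The CTE would return a single row, ABC-123 with a DuplicateCount of 2, and the final SELECT would return both ABC-123 rows from Listings. XYZ-789, having no duplicate, never appears in the results at all.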
Running this gave me the confidence that I was correctly targeting only the rows that needed to be cleaned up.
Step 2: Deleting the Duplicates
With my SELECT statement perfected, modifying it to perform the DELETE was surprisingly straightforward. I essentially used the exact same CTE and join logic.
Here is the final DELETE script:
SQL
WITH DuplicateRows AS (
    SELECT PermitNumber,
           COUNT(*) AS DuplicateCount
    FROM Listings
    GROUP BY PermitNumber
    HAVING COUNT(*) > 1
)
DELETE yt
FROM Listings yt
INNER JOIN DuplicateRows dr
    ON yt.PermitNumber = dr.PermitNumber;
The only real change is swapping SELECT yt.* for DELETE yt!
By joining the Listings table (aliased as yt) to our DuplicateRows CTE, the DELETE statement knows to remove only those rows from Listings whose PermitNumber exists in the CTE.
I held my breath, ran the query, and just like that, the duplicates were gone!
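If you'd rather not hold your breath, an extra safety net (sketched here in T-SQL-style syntax; adjust for your database) is to wrap the DELETE in an explicit transaction, check the affected row count, and only then commit:

SQL
BEGIN TRANSACTION;

WITH DuplicateRows AS (
    SELECT PermitNumber
    FROM Listings
    GROUP BY PermitNumber
    HAVING COUNT(*) > 1
)
DELETE yt
FROM Listings yt
INNER JOIN DuplicateRows dr
    ON yt.PermitNumber = dr.PermitNumber;

-- Inspect @@ROWCOUNT (or re-run the Step 1 SELECT) before deciding:
-- COMMIT TRANSACTION;   -- if the number of deleted rows looks right
-- ROLLBACK TRANSACTION; -- if it doesn't

If the row count doesn't match what your Step 1 SELECT predicted, you can roll back and nothing is lost.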

A quick re-run of the SELECT query from Step 1 returned zero rows, confirming the fix was successful.
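One caveat worth calling out: the script above deletes every row in each duplicate group, including what might be the "original." If you instead want to keep exactly one row per PermitNumber, a common variation uses ROW_NUMBER() to rank the rows within each group and delete everything ranked after the first. This is a sketch in SQL Server-style T-SQL (where you can delete through a CTE), and Id is a placeholder for whatever column decides which row is the "keeper":

SQL
WITH Ranked AS (
    SELECT PermitNumber,
           ROW_NUMBER() OVER (
               PARTITION BY PermitNumber
               ORDER BY Id   -- keeps the row with the lowest Id
           ) AS rn
    FROM Listings
)
DELETE FROM Ranked
WHERE rn > 1;

Each group's first row gets rn = 1 and survives; everything else in the group is removed.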
Me, running a DELETE statement on a production table, 2025 (colorized)
Wrapping Up and Next Steps
So, we fixed the immediate problem. The data is clean, and the site is back in business! The pattern of using a CTE with GROUP BY and HAVING is a simple, helpful, and, more importantly, safe way to first identify and then act on duplicate data. You build the SELECT first to verify your logic, then swap in the DELETE.
But we're not done. The real lesson here is about prevention. How do we stop this from ever happening again?
Enforce Uniqueness at the Database Level: This is the most crucial step. The database should be the ultimate source of truth and integrity. I immediately ran a script to add a unique constraint to the PermitNumber column. This makes it impossible for the database to accept another row with an existing PermitNumber.

SQL
ALTER TABLE Listings
ADD CONSTRAINT UQ_Listings_PermitNumber UNIQUE (PermitNumber);

Enforce Uniqueness at the Application Level: The database constraint is your last line of defense, but the first line should be your application code. Before attempting to insert a new Listing, the code should first check if a record with that PermitNumber already exists. This provides a better user experience by catching errors early instead of waiting for the database to reject it.
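As a sketch of that application-level check (T-SQL-style; @PermitNumber and the column list are placeholders for your actual insert logic):

SQL
IF NOT EXISTS (SELECT 1 FROM Listings WHERE PermitNumber = @PermitNumber)
BEGIN
    INSERT INTO Listings (PermitNumber /* , other columns */)
    VALUES (@PermitNumber /* , other values */);
END

Keep in mind this check alone isn't airtight: two requests can pass the EXISTS test at the same instant, which is exactly why the unique constraint remains the last line of defense.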
Fixing mistakes is part of the job, but learning from them is the key to mastering our craft! I definitely won't be forgetting to add a UNIQUE constraint next time!

Until then, as always, thanks for reading and I hope this was helpful!