Cleaning Up Bad SQL Data: Find and Delete Duplicate Records The Right Way
Few things can derail a developer's day quite like finding bad data in a live production database, especially when it isn't immediately obvious what the problem is. You're POSITIVE your code is correct, but you keep getting unexpected results. So you dig deeper through the layers until you finally find the culprit: duplicate records.

I had one of those moments recently. A feature in my side project McAllen Garage Sales was behaving erratically, and after sinking way too much time into head-scratching and debugging, I finally uncovered the problem: my Listings table had multiple entries with the same PermitNumber, a value that should have been absolutely, positively unique.
My first reaction was to blame myself (which was fair, I should have had a unique constraint on that column from day one!). My second reaction was, "Okay, how do I fix this without making things even worse?" Just randomly deleting rows, while probably satisfying, wouldn't guarantee I got them all or didn't remove the wrong ones.

I needed a safe, repeatable way to identify only the duplicates and then surgically remove them.
After a bit of trial and error, I finally landed on a simple, safe script using a Common Table Expression (CTE), which, if you haven't run into one before, is essentially a named, temporary result set you can reference in the query that follows. Think of it as an inline "temp table" that exists only for the duration of a single statement.
Let's take a look!
Step 1: Safely Identifying the Duplicates
Before I even thought about running a DELETE statement, my first priority was to build a SELECT query that would show me exactly what I was dealing with. Running a DELETE on a production table without a WHERE clause, while certainly a developer’s rite of passage, isn’t something anybody has time for.

My goal was to list all rows that had a duplicate PermitNumber. This is where a CTE comes in handy.
A CTE, defined using the WITH keyword, lets you create a temporary, named result set that you can reference in your main query. It’s perfect for breaking down a complex problem into logical steps.
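Before we get to the real script, here's a tiny, throwaway CTE just to show the shape of the syntax (the name and data are made up for illustration):

SQL
WITH SmallNumbers AS (
    SELECT 1 AS n
    UNION ALL
    SELECT 2
)
SELECT n
FROM SmallNumbers;

The WITH block defines the named result set; the query underneath treats it like any other table.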
Here’s the script I used to find the problem rows:
SQL
WITH DuplicateRows AS (
    SELECT PermitNumber,
           COUNT(*) AS DuplicateCount
    FROM Listings
    GROUP BY PermitNumber
    HAVING COUNT(*) > 1
)
SELECT yt.*
FROM Listings yt
INNER JOIN DuplicateRows dr
    ON yt.PermitNumber = dr.PermitNumber;
Let's break down what's happening here, because it's a super useful pattern:
WITH DuplicateRows AS (...): This defines our CTE and gives it the name DuplicateRows.

SELECT PermitNumber, COUNT(*) AS DuplicateCount: Inside the CTE, we select the column that has duplicates, PermitNumber. We also use COUNT(*) to count how many times each PermitNumber appears.

FROM Listings GROUP BY PermitNumber: This is the key. We group all the rows by their PermitNumber. So, all rows with PermitNumber 'ABC-123' are now in one group.

HAVING COUNT(*) > 1: After grouping, HAVING filters those groups, keeping only the ones where the count is greater than one. We now have a list of every PermitNumber that is a duplicate.

SELECT yt.* FROM Listings yt INNER JOIN ...: Finally, the main query joins the original Listings table back to our DuplicateRows CTE. This gives us the full data for every single row that is part of a duplicate group.
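To make that concrete, imagine Listings held these hypothetical rows (the addresses are made up for illustration):

PermitNumber  Address
ABC-123       101 Main St
ABC-123       101 Main St
XYZ-789       42 Elm Ave

The CTE would return a single row, ABC-123 with a DuplicateCount of 2, and the final SELECT would return both ABC-123 rows from Listings. XYZ-789, having no duplicate, never appears in the results at all.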
Running this gave me the confidence that I was correctly targeting only the rows that needed to be cleaned up.
Step 2: Deleting the Duplicates
With my SELECT statement perfected, modifying it to perform the DELETE was surprisingly straightforward. I essentially used the exact same CTE and join logic.
Here is the final DELETE script:
SQL
WITH DuplicateRows AS (
    SELECT PermitNumber,
           COUNT(*) AS DuplicateCount
    FROM Listings
    GROUP BY PermitNumber
    HAVING COUNT(*) > 1
)
DELETE yt
FROM Listings yt
INNER JOIN DuplicateRows dr
    ON yt.PermitNumber = dr.PermitNumber;
The only real change is swapping SELECT yt.* for DELETE yt!
By joining the Listings table (aliased as yt) to our DuplicateRows CTE, the DELETE statement knows to remove only those rows from Listings whose PermitNumber exists in the CTE.
I held my breath, ran the query, and just like that, the duplicates were gone!
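If you'd rather not hold your breath, an extra safety net (sketched here in T-SQL-style syntax; adjust for your database) is to wrap the DELETE in an explicit transaction, check the affected row count, and only then commit:

SQL
BEGIN TRANSACTION;

WITH DuplicateRows AS (
    SELECT PermitNumber
    FROM Listings
    GROUP BY PermitNumber
    HAVING COUNT(*) > 1
)
DELETE yt
FROM Listings yt
INNER JOIN DuplicateRows dr
    ON yt.PermitNumber = dr.PermitNumber;

-- Inspect @@ROWCOUNT (or re-run the Step 1 SELECT) before deciding:
-- COMMIT TRANSACTION;   -- if the number of deleted rows looks right
-- ROLLBACK TRANSACTION; -- if it doesn't

If the row count doesn't match what your Step 1 SELECT predicted, you can roll back and nothing is lost.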

A quick re-run of the SELECT query from Step 1 returned zero rows, confirming the fix was successful.
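One caveat worth calling out: the script above deletes every row in each duplicate group, including what might be the "original." If you instead want to keep exactly one row per PermitNumber, a common variation uses ROW_NUMBER() to rank the rows within each group and delete everything ranked after the first. This is a sketch in SQL Server-style T-SQL (where you can delete through a CTE), and Id is a placeholder for whatever column decides which row is the "keeper":

SQL
WITH Ranked AS (
    SELECT PermitNumber,
           ROW_NUMBER() OVER (
               PARTITION BY PermitNumber
               ORDER BY Id   -- keeps the row with the lowest Id
           ) AS rn
    FROM Listings
)
DELETE FROM Ranked
WHERE rn > 1;

Each group's first row gets rn = 1 and survives; everything else in the group is removed.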
Me, running a DELETE statement on a production table, 2025 (colorized)
Wrapping Up and Next Steps
So, we fixed the immediate problem. The data is clean, and the site is back in business! The pattern of using a CTE with GROUP BY and HAVING is a simple, helpful, and, more importantly, safe way to first identify and then act on duplicate data. You build the SELECT first to verify your logic, then swap in the DELETE.
But we're not done. The real lesson here is about prevention. How do we stop this from ever happening again?
Enforce Uniqueness at the Database Level: This is the most crucial step. The database should be the ultimate source of truth and integrity. I immediately ran a script to add a unique constraint to the PermitNumber column. This makes it impossible for the database to accept another row with an existing PermitNumber.

SQL
ALTER TABLE Listings
ADD CONSTRAINT UQ_Listings_PermitNumber UNIQUE (PermitNumber);

Enforce Uniqueness at the Application Level: The database constraint is your last line of defense, but the first line should be your application code. Before attempting to insert a new Listing, the code should first check if a record with that PermitNumber already exists. This provides a better user experience by catching errors early instead of waiting for the database to reject it.
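As a sketch of that application-level check (T-SQL-style; @PermitNumber and the column list are placeholders for your actual insert logic):

SQL
IF NOT EXISTS (SELECT 1 FROM Listings WHERE PermitNumber = @PermitNumber)
BEGIN
    INSERT INTO Listings (PermitNumber /* , other columns */)
    VALUES (@PermitNumber /* , other values */);
END

Keep in mind this check alone isn't airtight: two requests can pass the EXISTS test at the same instant, which is exactly why the unique constraint remains the last line of defense.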
Fixing mistakes is part of the job, but learning from them is the key to mastering our craft! I definitely won't be forgetting to add a UNIQUE constraint next time!

Until then, as always, thanks for reading and I hope this was helpful!