DELETE in batches
Microsoft SQL Server supports a DELETE TOP (x) FROM table construct, which deletes at most x rows from a table. This is useful for deleting rows in batches, to keep lock times low and the transaction log small.
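For example, a cleanup that removes old rows batch by batch could look like this in T-SQL (a sketch; the events table and date filter are made up for illustration):

while 1 = 1
begin
    delete top (1000) from events
    where created_at < '2020-01-01';

    -- @@rowcount is the number of rows affected by the last statement
    if @@rowcount = 0 break;
end

Postgres does not support this construct, but we can work around it: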
delete from table
where id in (
select id
from table
where condition
order by id
limit 1000
)
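To clear out all matching rows, you run this query repeatedly until it deletes nothing, either from application code or, for example, in a PL/pgSQL block. A sketch with made-up table and column names (the commit between batches requires Postgres 11+ and a do block that is not inside an explicit transaction):

do $$
declare
    deleted integer;
begin
    loop
        delete from events
        where id in (
            select id
            from events
            where created_at < '2020-01-01'
            order by id
            limit 1000
        );

        -- row_count is the number of rows affected by the delete
        get diagnostics deleted = row_count;
        exit when deleted = 0;

        -- give each batch its own transaction to keep lock times short
        commit;
    end loop;
end $$;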
The nice part about this approach is that you can use a similar query for SQL Server: replace limit 1000 with top 1000, and put it after the select:
delete from table
where id in (
select top 1000 id
from table
where condition
order by id
)
Executing this query manually in SSMS gave the following results:
- (1000 rows affected)
- (1000 rows affected)
- (1000 rows affected)
- this goes on for a while...
- (0 rows affected)
- (1000 rows affected)
- (1000 rows affected)
There were no inserts happening at that time. So how did that happen?
Most examples of transaction phenomena use multiple statements per transaction. There is of course no reason to assume that transaction phenomena require multiple statements to show up. The DBMS can choose to execute the select and delete parts of the query sequentially: first select all matching rows, then delete them.
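Conceptually, the single statement can then behave like this two-step sequence (an illustration of the interleaving, not an actual execution plan):

-- step 1: collect the candidate ids
select id
from table
where condition
order by id
limit 1000;

-- step 2: delete the collected ids; rows that another session deleted
-- in the meantime no longer match, so fewer (or zero) rows are affected
delete from table
where id in (/* ids from step 1 */);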
What actually happened: A background job was executing the exact same query. The background job and my SSMS session selected the same 1000 ids, but only the background job succeeded in deleting them. My session couldn't (after all, they're already deleted!).
In this particular case no fix was necessary: the deletion was a pure cleanup job, and no harm comes from failing to delete all rows. The background job runs on a regular basis, and it would finish the deletion the next time it runs.
If we had to fix it, we would put the deletion in a transaction with the serializable isolation level.
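A sketch of what that could look like in Postgres. Under serializable isolation, when two sessions run the same delete concurrently, one of them aborts with a serialization failure (SQLSTATE 40001) and has to retry, instead of silently deleting zero rows:

begin isolation level serializable;

delete from table
where id in (
    select id
    from table
    where condition
    order by id
    limit 1000
);

commit;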