How to delete a large number of records from MSSQL table

How to delete records from the MSSQL table page by page?

In this short post, I am gonna share a quick SQL snippet allowing us to remove a lot of records from the MSQL table.

Contrary to appearances, deleting a large amount of data from a SQL table may not be a simple task.

It is good practice to avoid long-running transactions while deleting because the SQL Transaction Log file can grow rapidly.

To avoid problems like running out of disk space, it's good to use batches and remove the records page by page.

However, I am not gonna lie here - this script is not perfect but it's a good starting point.

The following approach has some advantages over the truncate function because it's possible to specify which records should be deleted.

Let's jump in.

The basic concept of the script:

  • Define a batch size (pageSize)
  • Calculate page count
  • Store the information in the variable table @tempLogIdWithPageIndexTable (record id and page number)
  • Iterate through pages in a while loop
  • Delete records for a specific page

When do I use this script?

I use the above SQL script from time to time during database maintenance.

It is good practice to monitor the data volume from time to time and delete unnecessary records.

We often deal with a situation where our table grows like crazy and we need to free up some space.  

Sounds familiar? 🙂

How to delete records from the MSSQL table using batches

For the purposes of the tutorial, let's create a simple table named "Logs".

Creating the table  

CREATE TABLE [Logs] (
    [LogId] INTEGER NOT NULL IDENTITY(1, 1), 
    [Text] VARCHAR(MAX) NULL,
    [Severity] VARCHAR(7) NULL,
    [Created] DATETIME,
    PRIMARY KEY ([LogId])
);

Populating table

Once the table is ready, we need to insert some data - let's add 10 dummy records.

INSERT INTO [Logs]([Text],[Severity],[Created]) VALUES('Lorem ipsum 1','Error',GETDATE())
INSERT INTO [Logs]([Text],[Severity],[Created]) VALUES('Lorem ipsum 2','Warning',GETDATE())
INSERT INTO [Logs]([Text],[Severity],[Created]) VALUES('Lorem ipsum 3','Info',GETDATE())
INSERT INTO [Logs]([Text],[Severity],[Created]) VALUES('Lorem ipsum 4','Warning',GETDATE())
INSERT INTO [Logs]([Text],[Severity],[Created]) VALUES('Lorem ipsum 5','Warning',GETDATE())
INSERT INTO [Logs]([Text],[Severity],[Created]) VALUES('Lorem ipsum 6','Error',GETDATE())
INSERT INTO [Logs]([Text],[Severity],[Created]) VALUES('Lorem ipsum 7','Warning',GETDATE())
INSERT INTO [Logs]([Text],[Severity],[Created]) VALUES('Lorem ipsum 8','Warning',GETDATE())
INSERT INTO [Logs]([Text],[Severity],[Created]) VALUES('Lorem ipsum 9','Info',GETDATE())
INSERT INTO [Logs]([Text],[Severity],[Created]) VALUES('Lorem ipsum 10','Info',GETDATE())

Declaring core variables

Now, we can move on and declare the core variables:

  • @tempLogIdWithPageIndexTable 
  • pageSize with the value of two (just for the purposes of this article).
SELECT count(1) FROM dbo.Logs;

DECLARE @tempLogIdWithPageIndexTable TABLE
(
	LogId INT,
	PageIndex INT
)

DECLARE @pageSize INTEGER = 2;
DECLARE @totalCount INTEGER

@tempLogIdWithPageIndexTable variable table is used to store the record id and page index.

This way, we'll be able to find records to delete in a while loop for a specific page.

​INSERT INTO @tempLogIdWithPageIndexTable (LogId, PageIndex) (
	SELECT  LogId, 
	CEILING(CAST(ROW_NUMBER() OVER(ORDER BY LogId) AS DECIMAL)/CAST(@pageSize AS DECIMAL)) AS PageIndex
	FROM [Logs] -- You can add WHERE conditions here
)

Calculating page count

With Logs table containing 10 records and pageSize = 2 - we are expecting five pages of data. 

SELECT @totalCount = count(1) FROM @tempLogIdWithPageIndexTable
​
DECLARE @pageCount INTEGER
SET @pageCount = CEILING(CAST(@totalCount as DECIMAL) / CAST(@pageSize as DECIMAL))
​
DECLARE @pageIndex INTEGER = 1

Printing the initial report with metrics 

Next, let's display some handful pieces of information to the output - to know what is happening while the program is running:

PRINT 'Total number of records to remove: ' + CAST(@totalCount as VARCHAR(MAX))
PRINT 'Page size : ' + CAST(@pageSize as VARCHAR(MAX))
PRINT 'Total pages : ' + CAST(@pageCount as VARCHAR(MAX))
PRINT 'Script started at : ' + CONVERT(varchar(25), getdate(), 120)

Deleting records in a loop page by page

Now we can go to the core of the script - the while loop: 

WHILE(@pageIndex <= @pageCount)
BEGIN	
	-- you can delete related entries first using JOIN here
	DELETE [dbo].Logs where LogId in 
	(SELECT LogId FROM @tempLogIdWithPageIndexTable WHERE PageIndex = @pageIndex)
​
	PRINT 'Processed page ' + CAST(@pageIndex as VARCHAR(MAX)) + '/' + CAST(@pageCount as VARCHAR(MAX))
	SET @pageIndex = @pageIndex + 1
END

Printing the end report 

Finally, we can display information about the program results - to see if records were deleted.

PRINT 'Script finished at : ' + CONVERT(varchar(25), getdate(), 120) 

SET @totalCount = (SELECT count(1) FROM dbo.Logs)

-- print the report when the job is done
PRINT 'Number of records after deletion : ' + CAST(@totalCount as VARCHAR(MAX))

Running script & output

Total number of records to remove: 10
Page size: 2
Total pages: 5
Script started at : 2020-09-05 15:36:21

(2 rows affected)
Processed page 1/5

(2 rows affected)
Processed page 2/5

(2 rows affected)
Processed page 3/5

(2 rows affected)
Processed page 4/5

(2 rows affected)
Processed page 5/5
Script finished at : 2020-09-05 15:36:21
Number of records after deletion : 0

Full source code

use [dbName]

/* Removing data from large table in MSSQL using batch */

CREATE TABLE [Logs] (
    [LogId] INTEGER NOT NULL IDENTITY(1, 1), 
    [Text] VARCHAR(MAX) NULL,
    [Severity] VARCHAR(7) NULL,
    [Created] DATETIME,
    PRIMARY KEY ([LogId])
);
GO

INSERT INTO [Logs]([Text],[Severity],[Created]) VALUES('Lorem ipsum 1','Error',GETDATE())
INSERT INTO [Logs]([Text],[Severity],[Created]) VALUES('Lorem ipsum 2','Warning',GETDATE())
INSERT INTO [Logs]([Text],[Severity],[Created]) VALUES('Lorem ipsum 3','Info',GETDATE())
INSERT INTO [Logs]([Text],[Severity],[Created]) VALUES('Lorem ipsum 4','Warning',GETDATE())
INSERT INTO [Logs]([Text],[Severity],[Created]) VALUES('Lorem ipsum 5','Warning',GETDATE())
INSERT INTO [Logs]([Text],[Severity],[Created]) VALUES('Lorem ipsum 6','Error',GETDATE())
INSERT INTO [Logs]([Text],[Severity],[Created]) VALUES('Lorem ipsum 7','Warning',GETDATE())
INSERT INTO [Logs]([Text],[Severity],[Created]) VALUES('Lorem ipsum 8','Warning',GETDATE())
INSERT INTO [Logs]([Text],[Severity],[Created]) VALUES('Lorem ipsum 9','Info',GETDATE())
INSERT INTO [Logs]([Text],[Severity],[Created]) VALUES('Lorem ipsum 10','Info',GETDATE())

SELECT count(1) FROM dbo.Logs;

DECLARE @tempLogIdWithPageIndexTable TABLE
(
	LogId INT,
	PageIndex INT
)

DECLARE @pageSize INTEGER = 2;
DECLARE @totalCount INTEGER
​
INSERT INTO @tempLogIdWithPageIndexTable (LogId, PageIndex) (
	SELECT  LogId, 
	CEILING(CAST(ROW_NUMBER() OVER(ORDER BY LogId) AS DECIMAL)/CAST(@pageSize AS DECIMAL)) AS PageIndex
	FROM [Logs] -- You can add WHERE conditions here
)

-- Get total counts of records to remove

SELECT @totalCount = count(1) FROM @tempLogIdWithPageIndexTable
​
DECLARE @pageCount INTEGER
SET @pageCount = CEILING(CAST(@totalCount as DECIMAL) / CAST(@pageSize as DECIMAL))
​
DECLARE @pageIndex INTEGER = 1
​
-- print the report with metrics before start

PRINT 'Total number of records to remove : ' + CAST(@totalCount as VARCHAR(MAX))
PRINT 'Page size : ' + CAST(@pageSize as VARCHAR(MAX))
PRINT 'Total pages : ' + CAST(@pageCount as VARCHAR(MAX))
PRINT 'Script started at : ' + CONVERT(varchar(25), getdate(), 120) 

-- delete records in a loop by paging

WHILE(@pageIndex <= @pageCount)
BEGIN	
	-- you can delete related entries first using JOIN here
	DELETE [dbo].Logs where LogId in 
	(SELECT LogId FROM @tempLogIdWithPageIndexTable WHERE PageIndex = @pageIndex)
​
	PRINT 'Processed page ' + CAST(@pageIndex as VARCHAR(MAX)) + '/' + CAST(@pageCount as VARCHAR(MAX))
	SET @pageIndex = @pageIndex + 1
END

PRINT 'Script finished at : ' + CONVERT(varchar(25), getdate(), 120) 

SET @totalCount = (SELECT count(1) FROM dbo.Logs)

-- print the report when the job is done
PRINT 'Number of records after deletion : ' + CAST(@totalCount as VARCHAR(MAX))

SELECT count(1) FROM dbo.Logs;

Bonus: How to delete all records from the MSSQL table?

How to remove all data from a table efficiently?

If you want to delete all rows from the table - the simplest way to do so is to use the TRUNCATE command.

TRUNCATE TABLE  table_name;

Remember that the table cannot have foreign keys constraints!

How to hack this?

You can delete foreign keys temporarily and recreate them afterward.

In other words:

  • Drop the FK constraints
  • Execute TRUNCATE command 
  • Recreate the FK constraints 

Once you removed all rows - consider resetting the identity value as follows:

DBCC CHECKIDENT ('table_name', RESEED, 0)

Final words

The topic of deleting records has certainly not been exhausted but the above script is a good starting point.

I hope you find this short post useful and you can adapt the script to your needs. 

To download the source code for this post, please visit GitHub
Comments
Leave a Comment