Stalking tempdb growth

Recently, our tempdb log went completely bonkers, growing from its normal 30 GB to fill its 68 GB drive (with all the subsequent freakings-out by SQL Server and required service bouncings which you would anticipate). While glaring at job durations and recently-released builds, I threw a couple of small jobs into SQL Server to monitor tempdb data and log usage so it could page me if it decided to grow the tempdb log excessively and so that I could see when growth started and what it might correlate to.

I made two small tables and a sproc to fill each and scheduled the sprocs to run every 10 minutes. I found two interesting things.

  1. Our tempdb data files could be much smaller. They’re currently 20 GB, and we have 8 of them, as we’re using the T1118 tempdb concurrency enhancement at the direction of Microsoft (it’s ostensibly for 2000 only, but we’re running 2005 and they have us using it), and they top out at 1.8 GB of data each.
  2. The tempdb checkpoint interval, which as far as I can tell I have no control over, as it has nothing to do with the server-wide recovery interval, appears to be 2-3 hours during the week and 6-7 hours on the weekend. When I graphed the log usage in Excel, the amount of data in the log very clearly repeatedly grew steadily to a point (13 GB during the week, 19 GB on the weekend), then plummeted to 0, then grew steadily back to that same point.

My tables for recording sizes:

CREATE TABLE dbo.tempdbDataSize (
 time SMALLDATETIME NOT NULL DEFAULT GETDATE(),
 filename CHAR(8) NOT NULL,
 usedMb SMALLINT
)

CREATE TABLE dbo.tempdbLogSize (
time SMALLDATETIME NOT NULL DEFAULT GETDATE(),
gb TINYINT,
usedGb TINYINT
)

If you haven’t changed your tempdb file names, you can use a CHAR(7) in tempdbDataSize. Since we have 8, I renamed the first one from “tempdev” to “tempdev1” for consistency and numbered them (tempdev1 – tempdev8). We have a busy system, so I’m recording my tempdb log sizes in gigabytes. You may need to use megabytes instead, in which case you’ll want to kick those TINYINTs up to SMALLINTs and remove the division by 1024 from the statement (below) which audits the log usage.

I found the code to get the file usage by turning on SQL Profiler and then opening the Shrink Files pane in SSMS (remember not to click OK in this pane if you try this yourself — shrinking tempdb while people are using it is Supremely Bad Idea).

The tempdb data size audit sproc:

CREATE PROCEDURE dbo.tempdbDataSize_audit AS

SET NOCOUNT ON

CREATE TABLE #tmpspc (
Fileid int,
FileGroup int,
TotalExtents int,
UsedExtents int,
Name sysname,
FileName nchar(520)
)

EXEC(‘USE tempdb INSERT INTO #tmpspc EXEC (”dbcc showfilestats”)’)
INSERT INTO tempdbDataSize (filename, usedMB)
SELECT name, UsedExtents * 64 / 1024
FROM #tmpspc
DROP TABLE #tmpspc

The tempdb log size audit sproc:

CREATE PROCEDURE dbo.tempdbLogSize_audit AS

SET NOCOUNT ON

DECLARE @sizeGb INT, @usedGb INT

CREATE TABLE #tmplogspc (
databaseName SYSNAME,
logSizeMB FLOAT,
SpaceUsedPercent FLOAT,
status BIT
)

INSERT INTO #tmplogspc
EXEC (‘dbcc sqlperf(logspace)’)
SELECT @sizeGb = logSizeMb / 1024, @usedGb = logSizeMB * spaceUsedPercent / 100 / 1024
FROM #tmplogspc
WHERE databaseName = ‘tempdb’
DROP TABLE #tmplogspc

DECLARE @subject VARCHAR(50)
IF @sizeGb >= 40
BEGIN
SET @subject = ‘Tempdb log size is ‘ + CAST(@sizeGb AS VARCHAR) + ‘ GB’
EXEC sendEmail @toEmails = ‘me@nullmydomain.com’, @subject = @subject
END
ELSE IF @sizeGB > (SELECT TOP 1 gb FROM tempdbLogSize ORDER BY time DESC)
BEGIN
SET @subject = ‘Tempdb grew to ‘ + CAST(@sizeGb AS VARCHAR) + ‘ GB’
EXEC sendEmail @toEmails = ‘me@nullmydomain.com’, @subject = @subject
END

INSERT INTO tempdbLogSize (gb, usedGb) VALUES (@sizeGb, @usedGb)

The sendEmail sproc is my custom wrapper for sp_send_dbmail — you can just drop in a sp_send_dbmail call.

Diagnosis

My findings haven’t been particularly useful in diagnosing the mysterious tempdb log growth. As a bandaid to prevent further system unavailability, I changed my sproc that rebuilds indexes overnight to do its sorts in the user database rather than in tempdb. This causes them to take a little longer, as the happy parallelism of having the index and the rebuild spread across multiple physical disks is lost, but the tempdb log has behaved since. One of the first things I checked for was recent large growth in table size (you record rowcounts daily so that you can see growth over time, right?), figuring this would correspond to large growth in index size and thus tempdb log usage during rebuilds, and there wasn’t any, so this isn’t particularly helpful.

Leave a Reply

Your email address will not be published. Required fields are marked *


*