Mar 17

SqlQueryStress SQL Stress Testing Tool for Fun and Destruction

The March issue of SQL Server Magazine has a blurb about a new SQL Server stress-testing tool called SqlQueryStress. It lets you plug in a query and (optionally) parameterize it from the results of another query. For example, you give it a query that takes a user ID as a parameter, and it will fill that parameter with values from a second query that selects some subset of your users. Very cute.
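
A rough sketch of the idea (the table and column names here are made up for illustration): you give it the query to stress, with a parameter placeholder, plus a substitution query whose column gets mapped to that parameter.

-- The query to stress, with a parameter placeholder.
SELECT COUNT(*) FROM dbo.Orders WHERE UserId = @UserId
-- The parameter substitution query; its UserId column feeds @UserId above.
SELECT UserId FROM dbo.Users WHERE IsActive = 1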

Anyway, the fun part of this is that the network admin and I have been trying to devise ways to send lots of traffic to a recalcitrant SQL Server that seems to be suffering from a TCP Chimney bug with HP’s NICs, for which the fix is to turn off the TCP Chimney Offload setting. The server randomly falls offline, so we want to reproduce the error, turn off the setting, and then verify that we can no longer break it. We’d been trying combinations of copying several large files and running queries, without success, so today I installed SqlQueryStress on fifteen servers, set it to spawn 200 threads that each ran 24,000 queries (enough to keep it busy for about an hour), and let it rip. Within three minutes we had to shut them all down, because we’d flooded the network with so much traffic that our secondary office building down the street could no longer use the internet.

Most fun I’d had during a day in the office in a long time!

Mar 16

Enabling Service Broker Hangs


Summary:
Stop SQL Server Agent before you enable Service Broker.

One of our production database clusters has developed a nasty habit of losing network connectivity with no error messages or indicators of why, necessitating 1 a.m. drives to the NOC by a network admin to growl at it and scrambles to put the system it backs into emergency maintenance mode. The second time this happened, the operations team quite sensibly requested that I move all the databases to another server so that they could diagnose this one.

We had just transitioned our other production cluster from a set of servers attached to an MSA 1000 to a new set of servers attached to an EVA 8100, and the MSA and the old servers were still hanging around. It was a matter of a few hours to coordinate offline time (thankfully, this database cluster housed non-24-hour-available systems) and to agitate developers until they changed connection strings and deployed code.

All went smoothly except that my job monitor — a job that runs every two hours and emails me about jobs that have failed or taken longer than usual in the meantime — failed. When it went to send email, it complained that Service Broker needed to be enabled for the database (no mention of which database, but I assumed msdb). I nipped over to BOL, found the appropriate command, and ran it. Twenty minutes later, it was still running. I killed it and started it again. Same hang. I hopefully ran the command in the database that houses the job monitor sproc; it finished immediately, but it didn’t make mail work.

Eventually, it occurred to me that SQL Server Agent, with its dirty little fingers constantly in msdb, could be causing a database configuration change to wait indefinitely for some sort of exclusive access. I stopped SQL Server Agent, ran the command to enable Service Broker, and it returned immediately. Yay!
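
For the record, the command in question is just the database option from BOL; roughly what I ran, with the Agent stopped:

-- Enable Service Broker on msdb; with SQL Server Agent stopped, this returns
-- immediately instead of waiting forever for exclusive access to the database.
ALTER DATABASE msdb SET ENABLE_BROKER
-- Sanity check before starting SQL Server Agent back up.
SELECT is_broker_enabled FROM sys.databases WHERE name = 'msdb'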

Feb 22

SQL Agent Doesn’t Like Foreign Job History

Or, “Lessons Learned from Today’s Migration.”

We migrated our production database cluster from an MSA 1000 to an EVA 8100 with new servers this morning. (This is much easier than an in-place upgrade, where you take down the system, wipe the cluster nodes, and rebuild them on the spot; do it this way if you can swing it.) It went fairly smoothly, though when I went to compare the time the EVA took to make full backups of all the databases (8 minutes) to the time the MSA had taken (21 minutes), I found that even though I’d restored msdb from the previous server, there was no job history. Apparently SQL Agent doesn’t take kindly to foreign job history and zorched it as soon as the new server ran its first job. I restored a copy of msdb, shut down SQL Agent, and sucked the contents of sysjobhistory back in from the copy, changing the server column to the new server’s name (and bumping up the value of instance_id, since the new server’s SQL Agent had already made a few records). The transplant took.
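
In case I ever need to do this again, here’s roughly the shape of the transplant. This assumes the old msdb was restored alongside as msdb_old and that SQL Agent is stopped; the restored-copy name and NEWSERVERNAME below are placeholders.

-- Copy the old history into the live msdb, rewriting the server column.
-- instance_id is an identity column, so the copied rows land above the few
-- entries the new server's Agent has already written.
INSERT INTO msdb.dbo.sysjobhistory
    (job_id, step_id, step_name, sql_message_id, sql_severity, message,
     run_status, run_date, run_time, run_duration, operator_id_emailed,
     operator_id_netsent, operator_id_paged, retries_attempted, server)
SELECT job_id, step_id, step_name, sql_message_id, sql_severity, message,
    run_status, run_date, run_time, run_duration, operator_id_emailed,
    operator_id_netsent, operator_id_paged, retries_attempted, 'NEWSERVERNAME'
FROM msdb_old.dbo.sysjobhistory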

Feb 22

The SSIS subsystem failed to load

Or, “Further Lessons Learned from Today’s Migration.”

When you change servers and restore msdb, you may find your maintenance plans failing and putting the following messages into various logs:

  • The SSIS subsystem failed to load
  • Subsystem could not be loaded
  • The job has been suspended
  • The specified module could not be found

The problem is most likely that the location of your SQL Server installation directory differs from that of the old server. Ours happened to be on a different disk drive. SQL Server keeps a table of pointers to the SSIS DLLs in msdb, and you need to flush this table and repopulate it with the new locations (KB article 914171). This is very simple.

To verify that this is in fact the problem, run:

SELECT * FROM msdb.dbo.syssubsystems

If the directory pointed to by the rows doesn’t exist, you’ve found the problem! Two quick queries will fix you right up:

-- Delete the existing rows.
DELETE FROM msdb.dbo.syssubsystems
-- Fill the table with new rows pointing to the proper location of the DLLs.
EXEC msdb.dbo.sp_verify_subsystems 1

And finally, you need to restart SQL Server Agent for it to pick up the new table entries and unsuspend the SSIS jobs.

One More Thing

If you’ve changed servers and have SSIS packages (like maintenance plans), there’s another thing you most likely need to do. Each package/plan has at least one database connection, and they’re probably pointing to the old server. You need to modify each package/plan individually, changing the existing connection if you can (I couldn’t — all of the input boxes were disabled) or creating a new connection and changing all of the steps to use it. I’ve never been a user of DTS packages — too rickety and GUI-driven for me — but I can see why no one likes the direction Microsoft went in turning maintenance plans into SSIS packages in SQL Server 2005.