If you are running monitoring scripts against your SQL Server database you need to know what those scripts do. As even a pretty basic monitoring script can cause some pretty big problems. Recently I was called to look at a large SQL Server who’s CPU had run up to almost 100% and users were getting huge numbers of timeouts.
Looking at the server using sp_whoisactive I could see massive waits in tempdb all on PAGELATCH wait types. Looking at sys.dm_exec_requests I could see that everything that was waiting was waiting on page 2:1:128. This was a pretty strange page ID for everything to be waiting on. Do I busted out DBCC PAGE and looked at the headers for the page. This page belongs to the system object sysobjvalues which stores meta-data about tables. The only reason that one of these system objects like this should be locked is if there’s an open transaction which created an object and never rolled back.
So I switched to the tempdb database and ran DBCC OPENTRAN. This showed a user transaction had been hanging around for about 90 minutes (it took a while for me to get the call and get VPNed in) and this session belonged to a user not the application. They checked with the user and he didn’t have any sessions running in SSMS.
So I ran DBCC INPUTBUFFER for the session with the open transaction and saw a query which was querying against various DMVs. So at some point this query window was used to begin a transaction, then create a temp table then do stuff (either with the temp table or without it). The transaction was never rolled back or committed so SQL hung onto it’s locks on those system objects, effectively preventing the application from creating temp tables or other temporary objects.
As soon as I killed this session, which forced it to roll back the application processes were able to begin flowing through the system again and everything got back to normal within about a minute or so.
So it just goes to show that you don’t have to be running some complex query against the SQL Server for all hell to break loose. One open transaction with a few small locks taken in just the right place can cause some serious problems pretty fast.