Recently I was working with a client's SQL Server cluster. The managed service provider had installed some Windows patches, which caused the SQL cluster to fail over to the other node. No big deal; everything appeared to be working as normal.
After a couple of days we noticed something a little strange. One very unusual wait type was showing a LOT of wait time. This wait type was PREEMPTIVE_OS_GETPROCADDRESS, which means that SQL Server is waiting on something outside of the database engine to respond. When I looked into the spid that was doing the waiting, I saw that it was running the extended stored procedure xp_delete_file. What this procedure does, in case you aren't aware, is remove old SQL Server backups from the hard drive of the server based on the parameters that you specify.
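To make the "find the spid doing the waiting" step concrete, here is a minimal sketch in Python. It assumes you have already pulled a few columns (session_id, wait_type, wait_time in milliseconds, command) from sys.dm_exec_requests into a list of dictionaries. The sample rows, the function name preemptive_waiters and the one-minute threshold are all made up for illustration; they are not values from the client's system.

```python
# Hypothetical sample rows, shaped like a few columns of sys.dm_exec_requests.
# wait_time is in milliseconds, as it is in that DMV.
SAMPLE_REQUESTS = [
    {"session_id": 52, "wait_type": "PREEMPTIVE_OS_GETPROCADDRESS",
     "wait_time": 172_800_000, "command": "EXECUTE"},
    {"session_id": 61, "wait_type": "PAGEIOLATCH_SH",
     "wait_time": 12, "command": "SELECT"},
    {"session_id": 70, "wait_type": None,
     "wait_time": 0, "command": "SELECT"},
]


def preemptive_waiters(rows, min_wait_ms=60_000):
    """Return requests sitting in a PREEMPTIVE_* wait for at least
    min_wait_ms milliseconds, longest wait first."""
    hits = [
        row for row in rows
        # wait_type is NULL (None here) when the request is not waiting.
        if (row["wait_type"] or "").startswith("PREEMPTIVE_")
        and row["wait_time"] >= min_wait_ms
    ]
    return sorted(hits, key=lambda row: row["wait_time"], reverse=True)


if __name__ == "__main__":
    for row in preemptive_waiters(SAMPLE_REQUESTS):
        print(row["session_id"], row["wait_type"], row["wait_time"])
```

With a session id in hand, you can go and see what that session is actually running, which is how xp_delete_file turned up here.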
The first thing that I did was look at the permissions on the files, and they appeared to be set up correctly: the local admin group had full control, users had no rights, and the owner had full control. I knew that the SQL account should be a member of the administrators group on these servers (I didn't set the machines up, so don't get me started on minimum permissions). However, when I looked in the admin group on this node of the cluster, the SQL account wasn't a member. I jumped on to the other node, and there it was in that machine's admin group.
The reason that this was a problem is the way that NTFS handles permissions on new files when the user creating them is an owner of the folder and has full control rights. Because the folder was owned by the local admin group, and the SQL account was a member of the local admin group when the files were created, the files inherited their rights from the folder: admins had full control, users had no rights, and the owner had full control. In this case, though, the owner of the folder was BUILTIN\Administrators, and that ownership carried down to the files as well. So when the SQL account came through on the second machine looking to delete files, it didn't have the rights, because on that node it wasn't in the BUILTIN\Administrators group.
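The reasoning above can be sketched as a toy access check in Python. This is a deliberately simplified model and not how NTFS really evaluates an ACL: it ignores deny entries, entry ordering, privileges and any rights that come from the parent folder. The account name CONTOSO\sqlsvc, the class FileSecurity and the function can are hypothetical names used only for illustration.

```python
from dataclasses import dataclass

FULL_CONTROL = frozenset({"read", "write", "delete"})
NO_RIGHTS = frozenset()


@dataclass(frozen=True)
class FileSecurity:
    owner: str    # who owns the file
    grants: dict  # trustee name -> set of rights granted to that trustee


def can(identities, security, right):
    """Toy check: allowed if any identity held by the caller is granted the
    right. The special trustee "OWNER" stands for whoever owns the file."""
    for trustee, rights in security.grants.items():
        resolved = security.owner if trustee == "OWNER" else trustee
        if resolved in identities and right in rights:
            return True
    return False


# The backup files as described above: owned by BUILTIN\Administrators,
# admins and owner have full control, users have nothing.
backup_file = FileSecurity(
    owner="BUILTIN\\Administrators",
    grants={
        "BUILTIN\\Administrators": FULL_CONTROL,
        "BUILTIN\\Users": NO_RIGHTS,
        "OWNER": FULL_CONTROL,
    },
)

SQL_ACCOUNT = "CONTOSO\\sqlsvc"  # hypothetical service account name

# Identities the SQL account carries on each node (assumed for illustration).
node_with_admin = {SQL_ACCOUNT, "BUILTIN\\Administrators"}
node_without_admin = {SQL_ACCOUNT}

if __name__ == "__main__":
    print(can(node_with_admin, backup_file, "delete"))
    print(can(node_without_admin, backup_file, "delete"))
```

The first call returns True and the second returns False. Every path to the delete right in this model runs through BUILTIN\Administrators, including the owner entry, so an account that is missing from that group on one node has no way in.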
Fortunately, fixing this problem was pretty easy. I simply put the SQL account in the local admin group on the misconfigured node and scheduled a short outage to restart SQL on that node so that it could pick up the new permissions. Then the long waits went away and the older backups were deleted as they should be.
If you'd like to read more about why you don't normally want to have SQL Server running with admin rights, and what the minimum needed rights are, might I recommend you check out my security book Securing SQL Server (paperback | kindle | website), available on Amazon.com and other online retailers.