Stop Following Every Best Practice Document That You Find

There are a lot of best practice documents out there. Some are new, some are older. Some of worth while, others are crap. Some of the older ones were good at the time, but now are meaningless.

Recently I was told that this best practices document is the gold standard that people should use.  Lets take a look at it for a minute and decide if that’s the case or not.  First other than the URL we’ve got no idea who actually wrote this (It is from someone named John Hicks) because the author is listed as “MSDNArchive”. archive In John’s tenure of writing blog posts on his msdn.com blog, we wrote 3.  Now this doesn’t by itself mean that the information that John is giving us is useless, but it’s a little odd that he’d start a blog in March 2008, post this article then not post again until April 2009, then not post again until June 2009 then never post again.

Lets go through the best practices section by section and see how they hold up to time.  The first one is “Use Different storage for data files and log files”.  99% of the time I agree, so let’s move on.

In “Configure Storage System to RAID 10” John talks about the fact that RAID 5 is slower to write to than RAID 10. Yes this is true.  We also need to remember that some systems will be just fine on RAID 5, in fact a lot of system will be just fine on RAID 5.  This is because you aren’t actually writing to the disk.  Instead you are writing to cache which then destages to the disk.  If SANyou are using a SAN, and you are writing directly to disk, something is wrong and you need to fix it.  So I’m going to call this bullet point crap.

Use DISKPART to partition new LUNs.  If you are using Windows 2003 to host your databases then yes you need to do this.  If you are using Windows 2008 or newer then this isn’t needed as your partitions will automatically be aligned correctly.

Format the partitions with 64k allocation unit.  Yes this is still valid.

Balance Disks between controllers.  Yes this is still valid, if you are on that kind of array.

So for the Storage Section of this checklist 3/5 are useful and the other two are junk.

The next section is the setup section.  This has 8 bullet points in it.  Lets see how many are useful.

The first is to “Size and split tempdb and disable auto-grow”.  So this says to size your database to the largest table plus 10%.  I’ve got a system that I manage which has a 2TB table.  It does not have a 2TB tempdb. TempDB should be sized at whatevesetupr size it needs to be for that system.  The best way to figure that out is to run the system for a few days and see what it grows to.  Create 1 tempdb file per CPU or core.  No, don’t do that.  Use 4 or 8 files and 99.999% of the time that’s all you need.  John also recommends that you disable auto-grow.  That’s insane, unless you like getting called in the middle of the night because the tempdb is full.

“Turn off features you don’t need” is John’s next recommendation.  This is an excellent idea and should be done on every production server, SQL Server or not.

The next recommendation is to run each server as a domain account.  These days I would recommend a managed service account for every machine.  This way you get the benefits of a domain account, without needing to change the passwords manually every time.  As this recommendation is out of date, we’ll call it junk.

The next recommendation is to get a CA signed cert for SQL for when you are using SQL Authentication.  I suppose that this probably isn’t a bad idea, except that the SQL Client doesn’t do any checking to see if the certificate that is used to encrypt the authentication handshake is valid or not. So even if you did this, it wouldn’t actually stop a man in the middle attack.  So this is junk (be great if the server actually worked like this).

Changing default ports from 1433 is just annoying. If you’ve changed the default ports I can find the new port in a few seconds with a port scanner.  This just makes life harder and provides no actual level of protection.  So this is junk.

“Limit data on Primary FileGroup”.  Yes, this is a good idea, do this.

“Keep first and last partitions empty”.  I see where he’s going with this, but so few systems use partitioning, there’s no point it in being a “standard”, but I’ll give him credit here.

“Isolate and static data”.  True, this would allow you to keep the static data in a read only filegroup, but really that’s only a few megs or Gigs of data.  Compared to how big the rest of the system is this probably has no real benefits to you, and is just going to make management of the system harder.  I’ll give him credit here, because while I’d never actually do this this recommendation isn’t really wrong and haven’t become wrong over the years.

In this section of the 8 bullet points, 4 are valid and the other 4 are either old and useless or were never valid to begin with.

The final section is the biggest, with 13 bullet points.  The first is to disable HyperThreading, which according to this post is available on “some” Intel Processors (Hint, it’s on all server class processors these days).  This is wrong, sort of.  For machines which have very high CPU usage havingpost_install HT enabled will hurt performance because of the extra task switching.  For machines which have low CPU usage having HT enabled will help because the tasks which are running are usually waiting on something other than CPU.  So this one is junk.

Ensure that HBAs are teamed, not failover.  This depends on your SAN vendor and your MPIO driver.  The default Microsoft MPIO driver is active/passive per LUN, so if you’ve got multiple LUNs you’ve got access to all the bandwidth.  Teamed isn’t the write terminology here, it’s simply active or passive.  This is controlled through the MPIO drive.  I want to call this junk, but the basic idea is write even if the terminology is crap.

Enable SAN Write Caching.  Yes do this.  How much of your cache should be for write cache, that’ll depend on your environment.  But I’ll let this one slide.

His NUMA recommendations are very out of date as he’s talking about 2 and 4 core boxes, so junk.

NIC teaming should be done for high availably of the NIC, not for bandwidth.  I’ve only seen a couple of bandwidth constrained SQL Servers and that was in the 10/100 network days.  I’m calling this junk even though the idea is right, his reasons suck.

Setting the HBA queue depth.  If you are having SAN problems then start playing with these settings.  If you aren’t then don’t mess with these.  So Junk.

Tuning on Checksums, yes do this.

Turning on RCSI for the databases.  I don’t recommend doing this unless you need to.

Manage Volume Name permissions aka. Instant File Initialization, yes do this.  It makes your database auto-grows happen in a lot less time.

Turning on TF1224, if it’s a large database, then yes.  Otherwise no (I don’t have it enabled on any of the large databases that I manage).  It’s better to fix the queries which are taking large locks to begin with (if possible).

Set the NICs, yes do this.

Turning on Jumbo Frames between the users and the server.  No, don’t do this. Never do this.  There’s no reason to do this.  This is useful if you’ve got users running huge reports and downloading massive data sets from the SQL Server that aren’t getting transmitted fast enough.  Jumbo frames between the SQL Server and the iSCSI array makes sense.  Jumbo frames between the SQL Server and the users desktop doesn’t.  There’s a few reason not to do this.

  1. If your users are connected to the SQL Server via the WAN this won’t matter as the packets will be shredded back to 1500 bytes over the WAN anyway.
  2. This will require that all network traffic coming from the client machines is Jumbo, so every server, switch, router, etc. needs to be adjusted to account for this.
  3. Your border routers CPU will go way up as it now has to shred every packet that comes from the corporate network to the Internet.
  4. Your users Internet access will get slower as their packets have to be shred and resized.

In other words, this is a crap recommendation.

So of the 13 bullet points here, 5 are crap.  In total of the 26 recommendations, 11 are junk and shouldn’t be done.  Some of these are junk because they are old and aren’t valid any more for one reason or another.  Others (like the Jumbo Frames one) look great on paper, but in practice are useless, or even harmful to other parts of the environment.

When writing (or finding) a best practices document, even if it’s been pushed on MSDN, be sure to understand what it’s talking about and make sure that the recommendations make sense, and that they are for your version of the software.

Denny

Share

One Response

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Trust DCAC with your data

Your data systems may be treading water today, but are they prepared for the next phase of your business growth?