Today was day 2 of EMC World 2009. There were some great sessions today. I’m focused on two tracks this year, VMware and the CLARiiON product, as we have just deployed both of these in our data center migration project.
This morning I caught the second half of the Advanced Best Practices for VMware Performance. I got there late, as breakfast ran late.
Some of the tips I picked up: when ESX assigns a guest’s vCPUs to physical CPUs, the cores have to be on the same physical chip. The example given was a dual chip quad core server. On that server a quad core guest has only two options for placement: all four cores on one chip or all four on the other. A dual core VM has 12 different placement options, while a single core guest has 8. This makes proper vCPU sizing extremely important.
I also caught the CLARiiON CX4 New Features overview, which went over the upcoming features in the next release of the CX4 FLARE code (FLARE is the software which runs on the CLARiiON hardware).
Along with the new FLARE code they will also be releasing 8 Gig fibre channel modules for the units. These are drop in modules which will work in conjunction with the existing 4 Gig fibre card in the unit. With FLARE 29 the CX4 line will also support 10 Gig iSCSI, and the 10 Gig iSCSI cards will be released as well. You can replace the included 1 Gig iSCSI card, or simply add the 10 Gig iSCSI card into an empty slot. You cannot replace the 4 Gig FC card with an 8 Gig FC card, as the 4 Gig card has the back end loops to the storage on it as well.
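The placement numbers from the session can be checked with a little counting. This is only a sketch of the arithmetic, assuming the one rule described above (all of a guest's vCPUs must sit on cores of the same chip); the function name and parameters are my own and have nothing to do with how the ESX scheduler is actually implemented.

```python
from math import comb

def placement_options(vcpus, chips=2, cores_per_chip=4):
    """Count the ways to put a guest's vCPUs on cores of one chip.

    Assumes every vCPU of the guest must land on the same physical chip,
    as described in the session. Illustrative only, not ESX code.
    """
    if vcpus > cores_per_chip:
        return 0  # the guest cannot fit on a single chip
    # Pick which chip, then which cores on that chip.
    return chips * comb(cores_per_chip, vcpus)

for vcpus in (1, 2, 4):
    print(vcpus, "vCPU(s):", placement_options(vcpus), "placement options")
```

For the dual chip quad core example this gives 8, 12 and 2 options for one, two and four vCPUs, matching the figures quoted in the session.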
Another important addition to this version of FLARE will be the ability for the CX4 to reboot one Storage Processor (SP) without turning off the write cache on the second SP. Technically this leaves the remaining SP in a state where a catastrophic failure could lose the writes in the cache, but the odds of that happening while the other SP is down are minimal. There are still times when the unit will automatically disable the write cache, but they are fewer. Some of the cases where the SPs won’t disable the write cache any more are:
- Single power supply failure
- Fan failure
- Single SPS failure
- Single Vault Disk failure (the Vault is where the SPs store their OS and where the cache is written to in the event of a power failure)
- SP restart or SP failure
- When a Non-Disruptive Upgrade (NDU) to the FLARE code is being performed
- SP restart during maintenance
- SP replacement during maintenance
- Replacing an IO Module
Some other information from this session: 400 Gig Enterprise Flash Drives (EFDs, aka SSDs) will be available soon. Lab tests show a performance bump of up to 30x when using EFDs over FC drives. Real world numbers are typically in the 8x to 12x range, with current costs about 8x that of the fibre channel drives. EFDs do, however, require only about 50% of the power of their fibre channel counterparts. For a single shelf of disks (15) this isn’t much of a power savings, but when swapping out a full CX4 or several CX4s it could lead to significant power savings.
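To put the speed and cost figures side by side, here is a rough price/performance sketch. It assumes the 8x cost and the speedup numbers both apply per drive, which the session didn't spell out; the function name is made up for illustration.

```python
def relative_cost_per_io(speedup, cost_multiple=8):
    """Cost per unit of IO performance for an EFD, relative to an FC drive.

    A value of 1.0 means the EFD costs the same per unit of performance
    as FC; below 1.0 means the EFD is cheaper per unit of performance.
    Assumes cost and speedup are both per-drive figures.
    """
    return cost_multiple / speedup

for speedup in (8, 12, 30):
    ratio = relative_cost_per_io(speedup)
    print(f"{speedup}x faster at 8x the price: {ratio:.2f}x the cost per IO")
```

At the low end of the real world range (8x) the EFDs roughly break even with FC on cost per unit of performance, and at 12x they come out at about two thirds of the cost, before counting any power savings.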
Using my company’s hosting costs: if we had three CX4s side by side, fully loaded (they hold 12 drive shelves each) with FC drives, we would need six 30A x 220V power circuits to run them. If we were to switch those units to all EFDs we could remove two power circuits without risking maxing out the circuits (please test your actual load before trying this), saving a couple of thousand dollars per month on power costs. With the increased IO performance of the 400 Gig EFDs over 400 Gig FC drives, there shouldn’t be any need to leave the drives partly empty any more.
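A back-of-the-envelope version of that circuit estimate might look like the sketch below. The share of the array's power draw that comes from the drives themselves is a number I don't have, so `drive_share` is purely a hypothetical input, and the sketch pessimistically assumes the FC configuration fully uses all of its circuits. It also ignores any redundant power feeds.

```python
import math

def circuits_needed(current_circuits, drive_share, efd_power_ratio=0.5):
    """Estimate circuits needed after swapping FC drives for EFDs.

    current_circuits: circuits the FC configuration needs today
    drive_share:      HYPOTHETICAL fraction of total power used by the drives
    efd_power_ratio:  EFD power draw relative to FC (about 50% per the session)
    """
    # Non-drive load stays the same; only the drive portion shrinks.
    remaining_load = (1 - drive_share) + drive_share * efd_power_ratio
    return math.ceil(current_circuits * remaining_load)

# Hypothetical drive shares, not measured values.
for share in (0.5, 0.7, 1.0):
    print(f"drives = {share:.0%} of load:", circuits_needed(6, share), "circuits")
```

With the drives at around 70% of the load (again, a guess) the six circuits drop to four, which lines up with removing two circuits. Measure the real draw before pulling anything.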
FLARE 29 will also bring other power savings, including adaptive cooling features which will allow the array to slow down the fans so that it’s only cooling the unit to the level required, not spinning the fans at 100% all the time. What will probably give existing units and drives the most savings will be the ability to spin down disks when not in use. EMC has been very smart about this, as they will allow you to select which RAID groups you want to spin down, so that you are only spinning down those RAID groups that can afford the performance hit they will take while waiting for the drives to spin back up. Not all RAID groups will be able to spin down, however. The Vault can’t be spun down, I assume so that in the event of a power loss the system doesn’t have to keep itself up while it waits for the disks to spin up. And only some disk models can be spun down. EMC will be testing each model of disk to make sure that it can handle the constant spin down and spin up, so that spinning down the disks doesn’t shorten their life. Only those disks which pass these tests will be allowed to spin down. RAID groups which have disks which don’t pass these tests won’t be configurable for spin down.
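The spin down rules as I understood them boil down to a simple check per RAID group. The sketch below is just my reading of those rules in code form; the class, field names and disk model strings are all invented for illustration and are not part of FLARE or Navisphere.

```python
from dataclasses import dataclass, field

@dataclass
class RaidGroup:
    name: str
    disk_models: list = field(default_factory=list)  # model of each disk in the group
    holds_vault: bool = False                        # True for the Vault drives

def can_spin_down(group, qualified_models):
    """Return True if the RAID group could be configured for spin down.

    Rules as described in the session: the Vault never spins down, and
    every disk in the group must be a model that passed EMC's testing.
    """
    if group.holds_vault:
        return False
    return all(model in qualified_models for model in group.disk_models)

# Hypothetical model names, purely for illustration.
qualified = {"MODEL-A", "MODEL-B"}
archive = RaidGroup("archive", ["MODEL-A", "MODEL-B", "MODEL-A"])
mixed = RaidGroup("mixed", ["MODEL-A", "MODEL-C"])
vault = RaidGroup("vault", ["MODEL-A"], holds_vault=True)

for group in (archive, mixed, vault):
    print(group.name, can_spin_down(group, qualified))
```

Here only the `archive` group qualifies: `mixed` contains a disk model that isn't on the tested list, and `vault` is excluded outright.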
Another great new feature of FLARE 29 will be integration of Navisphere and ESX. From within Navisphere you’ll be able to see which VMs are hosted on each host, and hopefully which LUN they are hosted on. When configuring this option you’ll be able to select which hosts to connect to, or whether it should query your vCenter (or Virtual Center) server. We didn’t get information on what version of ESX or vCenter (or Virtual Center) will be required (specifically whether 3.5 or 4.0 will be required). I would assume that ESX 3.5 or higher (or Virtual Center 2.5 or higher) will be required.
EMC has worked closely with Microsoft, and with FLARE 29 will be supporting shrinking of LUNs when using Windows 2008 (not sure if any Linux flavors support shrinking of devices yet; I know that VMware doesn’t yet). This will allow you to easily reclaim unused space from a server if the requirements change.
In the afternoon I hit up more sessions on VMware, specifically about storage. The key points were about the path management that ESX will be including in vSphere 4.0. Another feature, which is great for labs but not so much for production, will be the ability to bind a guest machine to a physical HBA, which presents that HBA to the guest so that the guest can access it directly. The example given was testing backup drives from a VM. However, doing this prevents anything else from accessing the HBA, and it disables vMotion, auto failover, etc.
There have also been some major improvements in Storage vMotion so that it requires less CPU and memory when moving the storage from one volume to another. In ESX 3.5, Storage vMotion apparently requires enough free CPU and RAM on the host to allocate a duplicate set of RAM for the guest. vSphere 4 removes this requirement, as it now uses the snapshot technology already in VMware to make the move.
There were some other things but I’m too hungry to remember them at the moment.
I can’t wait for tomorrow, when I’ve got sessions scheduled which will directly impact my work. I’ll be going to the sessions on configuring Exchange on EMC and VMware, and a session on using Analyzer to optimize performance, as well as some other CLARiiON and VMware sessions.
I snapped a couple of pictures today (mostly of the ducks in the lake by a hotel). I’ve uploaded them to my Flickr account.