This week I spoke at two great events. The first was SQL Zaterdag (SQL Saturday in Dutch) and the second was SQL Server Days. These were both great events that I am so happy to have been a part of. For those that attended SQL Zaterdag you’ll find my slides for those sessions on this other blog post. Below you’ll find the slides for the sessions I did at SQL Server Days.
Optimizing SQL Server Performance in a Virtual Environment
Table indexing for the .NET Developer
I hope that everyone enjoyed these two events as much as I did. I look forward to seeing all my crazy new European friends at the SQL PASS summit or the next time I make it out to Europe.
I learned a lot from your sessions at sqlserver days in Belgium. Very usefull and well explained!!
Now I’m waiting for the audio recordings to be available (sqlug promised this will be at the end of 2011) since I’m suprised about some topics in the powerpoint about sql perf in a virtualized environment.
Since I normally only have time to do the basic things (backups, index rebuilds, setup, …) our management allowed me to hire a performance expert. I read ballooning should be enabled while our expert tells ballooning should be avoided since it might force the guests to start using the pagefiles. I also read lock pages should be disabled while our expert tells lockpages in memory should be enabled in order to make sql server the least likely candidate to start paging.
I can only guess the difference in opinion is related to our issue, being unacceptable IO latencies. So, our expert tries to minimize IO, to improve page life expectancy and buffer cache hit ratio, just until we can migrate from sql2005 32-bit (4GB) to sql2008R2 (8GB or more if required).
I also read to keep OS, data, logs and tempdb on seperate disks. We do this consequently on every server, but I also hear impact is very low or zero, since the luns probably reside on the same disks.
I especially remember you saying you have to watch all levels of the environment. We can monitor the guest and we can see IO-latency is too high. Now we have to convince our sysadmins we need access to the counters from the host, network and storage in order to map the counters and to pinpoint the cause of the latencies on the guest. Actually that’s why I searched for your powerpoint and the audio recording, just to prove we need that info.
Regarding this your powerpoint states MONITOR, MONITOR, MONITOR the guest, hypervisor, host and storage. I wonder here in what way you make distinction between the host and the hypervisor. I saw it as the same thing, but perhaps you want to make distinction between the hypervisor OS and host hardware? Correct?
Anyway thanks a lot for your sessions,
When ballooning is enabled the host will request memory back from the guest. When ballooning is disabled if the host needs memory back it’ll get it back by paging the guest’s memory to disk. At this point the guest will think it still has the same amount of RAM but instead it’ll be paging to disk.
If lock pages in memory is enabled then SQL will ignore requests from the Windows OS on the guest to get memory back. If the Windows OS at the guest level is requesting memory back from SQL because the host needs memory back, and SQL doesn’t give back the memory the host may being paging the guest to disk causing performance problems that are very hard to identify.
If I was working with a single guest on a host and the host had more memory than the guest needed then turning on lock pages in memory and turning off the balloon driver would be fine. However if there are other VMs on the host and other things going on SQL needs to behave itself and give back memory as needed.
The difference between the host hardware and the hypervisor would be if you have a Dell DRAC or HP iLO card that can monitor the physical hardware bypassing the OS and software which is installed on the hardware. Where the hypervisor is just software that you ask what it thinks the hardware is going. While the numbers should match, if there’s a time that they don’t you know something strange is happening.
I think I understand, but shouldn’t we (as you indicate in the 3td paragraph.) setup production esx in a way there can’t be RAM overcommitment … so the chance on ballooning is nearly zero (unless VMotion is initiated due to a hardware failure on another ESX) … and then assure with lock pages that other processes (rdp, ssms, …) use pagefile if required instead of sql pages are written to virtual memory?
This way, I guess, we abuse a bit the advantages of resource sharing brought by virtualized environments … but we ensure that production environment is as performant as possible.
PS: I’m migrating from dev to dba. So I have some knowledge to catch up in order to understand the arguments of my sysadmin collegues 😉
Thx + Kind regards
In a perfect world I would say yes the VMs running SQL Server should be setup with no memory over commit. However most shops I’ve looked at even if they were setup to run this way at the beginning they don’t stay that way forever.
Typically we don’t want anything using virtual memory. We configure SQL Server to leave enough physical memory available for other processes so that nothing is paging out. If anything starts paging things will usually begin to slow down.