SQL Server Agent is probably one of the least loved but most used components in Microsoft SQL Server today.
As applications go, it hasn’t really gotten any sort of upgrade in probably 20 years. At least, I can’t think of any changes since SQL Server 2005 was released, maybe even SQL Server 2000. That shows just how solid it really is.
The SQL Server Agent just sits there and does its job without any real love or attention. It is a testament to the work that was done on it back in the day that nothing has really needed to be done to it in the last 15-20 years. So for this I say, “Well done Product Team, well done”.
It can be used for job scheduling, alert notifications when resource utilization is high, and more.
I’ve written a couple of recent posts that were extremely critical of PASS, and even more so of C&C, the company that manages PASS. Someone who read my last post pointed out that I probably didn’t emphasize the budget numbers I talked about enough. So let’s talk about that. I grabbed the most recent PASS financial data, which was published in March 2020.
Cash on Hand (effective)
Summit Revenue Projections
This is a challenge, because obviously I don’t have the actual cost PASS is spending per attendee for the virtual Summit. Many years ago, my rough understanding was that Summit cost about $400 per attendee. So, for the purposes of my math, I’m going to estimate that the virtual Summit will cost $100/attendee (I suspect the actual cost is closer to $250, given that is what chapter leaders are being charged). Per the June meeting minutes, C&C has agreed to reduce their expenses by $500,000. It’s not clear where that comes in, but let’s just say that drops non-Summit expenses to $2.7MM.
Let’s say we have 2,000 attendees of the virtual PASS Summit in 2020 (which I think may be a generous estimate), all paying for the whole conference.
Cash on Hand (effective)
Now let’s say we have 1,000 attendees buying the All in One Bundle and 500 attendees buying the three-day conference.
Cash on Hand (effective)
Given my experience and the current economy, I think the projections above are fairly optimistic. Let’s say my cost projection of $100 per person is too low, and the cost is actually $250 per person. Also, let’s say only 500 people sign up for the full conference and 500 register for the three-day conference.
Cash on Hand (effective)
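All three scenarios above come down to the same arithmetic. The sketch below is minimal and assumes one model: Summit ticket revenue, minus the estimated per-attendee delivery cost, minus the roughly $2.7MM of non-Summit expenses. The ticket prices aren’t stated in this post, so `full_price` and `three_day_price` are parameters you would fill in yourself. `Scenario`, `net_result`, and `SCENARIOS` are hypothetical names.

```python
from dataclasses import dataclass

# Non-Summit expenses after C&C's $500,000 reduction, as estimated in this post.
NON_SUMMIT_EXPENSES = 2_700_000


@dataclass
class Scenario:
    full_attendees: int       # whole conference / All in One Bundle
    three_day_attendees: int  # three-day conference only
    cost_per_attendee: float  # estimated delivery cost: $100 or $250 here


def net_result(s: Scenario, full_price: float, three_day_price: float) -> float:
    """Ticket revenue minus attendee costs minus non-Summit expenses.

    A negative value is roughly what the scenario would draw from cash on hand.
    Prices are caller-supplied placeholders, not figures from this post.
    """
    revenue = s.full_attendees * full_price + s.three_day_attendees * three_day_price
    attendee_costs = (s.full_attendees + s.three_day_attendees) * s.cost_per_attendee
    return revenue - attendee_costs - NON_SUMMIT_EXPENSES


# The three scenarios described above.
SCENARIOS = {
    "2,000 full, $100/attendee": Scenario(2000, 0, 100),
    "1,000 bundle + 500 three-day, $100/attendee": Scenario(1000, 500, 100),
    "500 full + 500 three-day, $250/attendee": Scenario(500, 500, 250),
}
```

Call `net_result(scenario, full_price, three_day_price)` with the published registration prices. The starting cash on hand is left out, so the result is the change for the year, not the ending balance.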
PASS doesn’t officially release attendance numbers; they say that 4,000 people attended PASS Summit last year, which sounds really great. However, conference math is a factor here: many conferences count precons separately from the main conference attendance. If you attended two precons and the conference, you would count as three conference attendees. In a best-case scenario where you had 4,000 attendees, top-line revenue would still drop by $3.5 million (or 54%), while fixed operating expenses are only down $500k (or 16%). That is, as they say in business school, an untenable situation.
This is just focusing on the short term; 2021 will face similar challenges. It is very possible that by November 2021, in-person conferences will be back (this assumes a vaccine is in place, but Goldman Sachs assumes that too, and I trust them when it comes to money). However, I don’t see attendance quickly returning to pre-pandemic levels until 2022 or 2023, which means PASS will likely keep dipping into its cash on hand until it reaches bankruptcy.
Sure, PASS Pro is a second potential revenue source, but it faces many challenges in getting off the ground and adding enough revenue to have any substantial impact. It has also left many community speakers feeling alienated by the conversion of their Summit sessions or networking events into paid, for-profit sessions.
One final note: in FY2020 PASS spent approximately five percent ($385K) of its revenue on community activities. That number was substantially beefed up by a Microsoft SQL Server 2019 upgrade effort, and total community spend has been dropping over time. For a point of reference, C&C charged PASS $525k for IT services in 2019. It’s important to remember that PASS exists to serve the broader SQL community, not a for-profit firm.
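To make the precon point concrete, here is a small Python sketch that contrasts counting registrations with counting people. It assumes each registration is recorded as a hypothetical (person, event) pair.

```python
def attendance_counts(registrations):
    """Return (registration_count, unique_people).

    `registrations` is a list of (person_id, event) pairs, where an event is a
    precon or the main conference. Counting every pair inflates "attendance"
    whenever one person buys more than one event.
    """
    people = {person for person, _event in registrations}
    return len(registrations), len(people)
```

For example, one person who buys two precons plus the conference produces three registrations but only one attendee.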
I’m writing this post because I’ve been mired in configuring a bunch of distributed availability groups for a client, and while the feature is technically solid, the lack of tooling can make it a challenge to implement. Specifically, I’m implementing these distributed AGs (please don’t use the term DAG, as you’ll piss off Allan Hirt, but more importantly it’s used in Microsoft Exchange High Availability, so it’s taken) in Azure, which adds a couple of additional challenges because of the need for load balancers. You should note this feature is Enterprise Edition only, and is only available starting with SQL Server 2016.
First off, why would you implement a distributed availability group? Because you want a disaster recovery (DR) strategy in addition to a high availability strategy with your AG. There’s limited benefit to implementing this architecture if you don’t have at least four nodes in your design. But consider the following design:
In this scenario, there are two data centers with four nodes. All of the servers are in a single Windows Server Failover Cluster. The primary, SQL1, sends three transaction log streams: one to its local secondary and one to each of the two nodes in our secondary site in New York. Because two of those streams cross the WAN, we are consuming double the network bandwidth to send data to New York. With the distributed availability group, each location gets its own Windows cluster and availability group, and we only send one transaction log stream across the WAN.
This benefits a few scenarios. The most obvious is that it’s a really easy way to do a SQL Server upgrade or migration. While Windows clustering now supports rolling OS upgrades, it’s much easier to use a distributed AG, because the clusters are independent of each other and have no impact on each other. The second is that it’s very easy to fail back and forth between these availability groups. You have also cut the WAN bandwidth your configuration needs in half, which can represent major cost savings in a cloud world or even on-premises.
If you think this is cool, you’re in smart company: this is the technology Microsoft has implemented for geo-replication in Azure SQL Database. The architecture is really robust, and if you think about the tens of thousands of databases in Azure, you can imagine all of the bandwidth saved.
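The bandwidth argument is really a count of the log streams that cross the WAN. The Python sketch below assumes a simplified model in which each replica the primary feeds receives its own full log stream. The site names and the function name are hypothetical.

```python
from __future__ import annotations


def wan_log_streams(replicas_by_site: dict[str, int], primary_site: str, distributed: bool) -> int:
    """Count transaction log streams that leave the primary's site over the WAN.

    replicas_by_site maps each site to its number of AG replicas; the primary
    replica lives in primary_site.
    """
    remote = {site: n for site, n in replicas_by_site.items() if site != primary_site and n > 0}
    if distributed:
        # Distributed AG: one stream per remote AG, sent to that AG's global
        # forwarder, which then feeds its own local secondaries.
        return len(remote)
    # Single stretched WSFC/AG: the primary sends a separate stream to every remote replica.
    return sum(remote.values())
```

For the four-node design above, `wan_log_streams({"primary-dc": 2, "new-york": 2}, "primary-dc", distributed=False)` counts two WAN streams. With `distributed=True` it counts one, which is the halving described here.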
That’s Cool, How Do I Start?
I really should have put this tl;dr at the start of this post. You’ll need this page at docs.microsoft.com. There’s no GUI, which kind of sucks, because you can make typos in your T-SQL and the commands can still validate and give you unhelpful error messages (ask me how I know). But in short, here is what you do:
Create your first WSFC on your first two nodes
Create an Availability Group on your first WSFC, and create a listener. Add your database(s) to this AG
If you are in Azure, ensure your ILB has port 5022 (or whatever port you use for your AG endpoint) open
Create your second WSFC on the remaining two nodes
Create the second AG and listener, without a database. If you really want to use the AG wizard, add a database to your AG and then remove it. (Or quit being lazy and use T-SQL to create your AG.)
Create the distributed AG on the first AG/WSFC
Join the second AG to your distributed availability group (run this on the primary of the second AG)
This seems pretty trivial, and it is when all of your network connections work (you need to be able to hit 1433 and 5022 from the listener’s IP address across both clusters). However, SQL Server has extremely limited documentation and management around this feature. The one troubleshooting hint I will provide is to always check the error log of the primary node of the second AG (this is known as the global forwarder), which is where you will see any errors. The most common error I’ve seen is:
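Since typos are the main hazard, one option is to generate the T-SQL for the last two steps rather than type it. This Python sketch builds the two statements in the form shown in Microsoft’s distributed AG documentation. The first is a `CREATE AVAILABILITY GROUP` with `WITH (DISTRIBUTED)`, run on the first AG’s primary. The second is an `ALTER AVAILABILITY GROUP` with `JOIN`, run on the second AG’s primary. The AG names, the listener host names, and the asynchronous commit, manual failover, and automatic seeding settings are assumptions to adjust for your environment. The URL check only rejects malformed URLs. It cannot catch a misspelled but well-formed listener name, so a connectivity test against each listener is still needed.

```python
from __future__ import annotations

import re
from dataclasses import dataclass

_NAME = re.compile(r"^[A-Za-z0-9_-]+$")
# e.g. tcp://ag1-listener.contoso.com:5022 (listener host plus AG endpoint port)
_LISTENER_URL = re.compile(r"^tcp://[A-Za-z0-9.-]+:\d{1,5}$")


@dataclass
class MemberAG:
    name: str          # an existing AG, e.g. 'ag1' (hypothetical)
    listener_url: str  # tcp://<listener>:<endpoint port>


def _check_name(name: str) -> str:
    if not _NAME.match(name):
        raise ValueError(f"unexpected AG name: {name!r}")
    return name


def _member_clause(ag: MemberAG) -> str:
    if not _LISTENER_URL.match(ag.listener_url):
        raise ValueError(f"listener URL looks wrong: {ag.listener_url!r}")
    return (
        f"'{_check_name(ag.name)}' WITH (\n"
        f"    LISTENER_URL = '{ag.listener_url}',\n"
        "    AVAILABILITY_MODE = ASYNCHRONOUS_COMMIT,\n"
        "    FAILOVER_MODE = MANUAL,\n"
        "    SEEDING_MODE = AUTOMATIC\n"
        ")"
    )


def distributed_ag_sql(dist_name: str, first: MemberAG, second: MemberAG) -> tuple[str, str]:
    """Return (create_sql, join_sql).

    create_sql runs on the primary of the first AG; join_sql runs on the
    primary of the second AG (the global forwarder).
    """
    name = _check_name(dist_name)
    members = ",\n".join(_member_clause(ag) for ag in (first, second))
    create_sql = f"CREATE AVAILABILITY GROUP [{name}]\nWITH (DISTRIBUTED)\nAVAILABILITY GROUP ON\n{members};"
    join_sql = f"ALTER AVAILABILITY GROUP [{name}]\nJOIN\nAVAILABILITY GROUP ON\n{members};"
    return create_sql, join_sql
```

The same member clauses feed both statements, so the two sides cannot drift apart through a typo on one of them.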
A connection timeout has occurred while attempting to establish a connection to availability replica ‘dist_ag_00’ with id [508AF404-ED2F-0A82-1B8A-EA23BA0EA27B]. Either a networking or firewall issue exists, or the endpoint address provided for the replica is not the database mirroring endpoint of the host server instance
Sadly, that error is a bit of a catch-all. In doing this work, I had a typo in my listener name on the secondary and SQL Server still processed the command. (So there’s no validation that everything can connect when you create the distributed AG.) I’m sure in Azure this is all done via API calls, which means humans aren’t involved, but since there is no real GUI support for distributed AGs, you have to type code. So type carefully.
Overall, I think distributed availability groups are a nice solution for highly available database servers, but without more tooling there won’t be broader adoption, and in turn, there won’t be more investment from Microsoft in tooling. So it’s a bit of a catch-22. Hopefully this post helps you understand this feature, where it might be used, and how to troubleshoot it.
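Because the catch-all timeout error usually comes down to connectivity or a bad listener name, a quick port test toward each listener can save time. This is a minimal Python sketch that assumes the default ports 1433 and 5022 mentioned above. `check_ports` and any listener names you pass it are hypothetical. It tests from whatever machine runs it, so run it on the nodes of each cluster; it does not test from the listener’s IP address itself.

```python
import socket

# 1433 = client connections, 5022 = AG endpoint (this post's defaults; use yours)
DEFAULT_PORTS = (1433, 5022)


def check_ports(hosts, ports=DEFAULT_PORTS, timeout=3.0):
    """Return {(host, port): None on success, else an error string}."""
    results = {}
    for host in hosts:
        for port in ports:
            try:
                with socket.create_connection((host, port), timeout=timeout):
                    results[(host, port)] = None
            except OSError as exc:  # DNS failures, refusals and timeouts all land here
                results[(host, port)] = f"{type(exc).__name__}: {exc}"
    return results
```

A DNS error in the results for a listener name is a strong hint that the name was mistyped, which SQL Server will not tell you when you create the distributed AG.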
Occasionally, we must resize or realign our tempdb log file (.ldf) or data files (.mdf or .ndf) due to a growth event that forces the file size out of whack. To resize, we have three options: restart the SQL Server service, add additional files, or shrink the current file. Most of us have been faced with runaway log files, and in an emergency, restarting the SQL Server service may not be an option, but we still need to get the log file smaller before we run out of disk space, for example. The process of shrinking that file can get tricky, so I created this flow chart to help you out if you ever get into this situation.
Now, it’s very important to note that many of these commands will clear your cache and will greatly impact your server performance while the cache warms back up. In addition, you should not shrink your database data or log files unless absolutely necessary. Doing so can result in a corrupt tempdb.
Let’s walk through it and explain some things as we go along.
The first thing you must do is issue a checkpoint. A checkpoint marks the log as a “good up to here” point of reference. It lets the SQL Server Database Engine know it can start applying changes contained in the log after this point during recovery if an unexpected shutdown or crash occurs. Anything prior to the checkpoint is what I like to call “Hardened”. This means all the dirty pages in memory have been written to disk, specifically to the .mdf and .ndf files. So, it is important to make that mark in the log before you proceed. Now, we know tempdb is not recovered during a restart; it is recreated. However, this step is still a requirement: tempdb uses the simple recovery model, so the checkpoint is also what allows the inactive portion of the log to be truncated, which gives the shrink free space to release.
Next, we try to shrink the log by issuing a DBCC SHRINKFILE command. This is the step that frees the unallocated space from the database file if there is any unallocated space available. You will note the Shrink? decision block in the diagram after this step. It is possible that there is no unallocated space and you will need to move further along the path to free some up and try again.
USE tempdb; -- SHRINKFILE works on files of the current database
DBCC SHRINKFILE (templog, 1000); -- target size is in MB, so this shrinks it to about 1 GB
If the file shrinks, great, congratulations! However, some of us might still have work to do. Next up is to try to free up some of that allocated space by running DBCC DROPCLEANBUFFERS and DBCC FREEPROCCACHE.
DBCC DROPCLEANBUFFERS – Clears the clean buffers from the buffer pool and columnstore object pool. This will flush cached indexes and data pages.
DBCC DROPCLEANBUFFERS WITH NO_INFOMSGS;
DBCC FREEPROCCACHE – Clears the procedure cache; you are probably familiar with it as a performance tuning tool in development. It will clean out all your execution plans from cache, which may free up some space in tempdb. As we know, though, this will create a performance hit, as your execution plans now have to be compiled and cached again on their next execution and will not get the benefit of plan reuse. Now, it’s not really clear why this works, so I asked tempdb expert Pam Lahoud for clarification as to why this has anything to do with tempdb. Both of us are diving into this to understand exactly why this works. I believe it to be related to tempdb using cached objects and memory objects associated with stored procedures, which can have latches and locks on them that need to be released by running this. Check back for further clarification; I’ll be updating this as I find out more.
DBCC FREEPROCCACHE WITH NO_INFOMSGS;
Once these two commands have run and freed up some space, you can try the DBCC SHRINKFILE command again. For most people this should make the shrink possible, and you will be good to go. Unfortunately, a few of us may have to take a couple more steps to get to that point.
When I have no other choice, the last thing I do to get my log file smaller is run the final two commands in the process. These should do the trick and get the log to shrink.
DBCC FREESESSIONCACHE – This command flushes the distributed query connection cache, which is used by queries that span two or more servers.
DBCC FREESESSIONCACHE WITH NO_INFOMSGS;
DBCC FREESYSTEMCACHE – This command will release all unused remaining cache entries from all cache stores including temp table cache. This covers any temp table or table variables remaining in cache that need to be released.
DBCC FREESYSTEMCACHE ('ALL');
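If you ever have to repeat this under pressure, the flowchart can be scripted. This Python sketch makes several assumptions. It expects a DB-API cursor from a connection opened with autocommit on, for example `pyodbc.connect(conn_str, autocommit=True)`, which uses `?` parameters. It uses the `templog` logical file name and the 1000 MB target shown above, and the function name is hypothetical. It runs each stage, retries the shrink, and stops as soon as `sys.database_files` reports the file at or below the target. It carries the same cache-clearing performance cost described earlier.

```python
import re

# Stages in the order of the flowchart; a shrink is attempted after each one.
STAGES = [
    ["CHECKPOINT;"],
    ["DBCC DROPCLEANBUFFERS WITH NO_INFOMSGS;", "DBCC FREEPROCCACHE WITH NO_INFOMSGS;"],
    ["DBCC FREESESSIONCACHE WITH NO_INFOMSGS;", "DBCC FREESYSTEMCACHE ('ALL');"],
]

# sys.database_files.size is reported in 8 KB pages.
SIZE_SQL = "SELECT size FROM sys.database_files WHERE name = ?;"


def file_size_mb(cursor, logical_name):
    cursor.execute(SIZE_SQL, (logical_name,))
    pages = cursor.fetchone()[0]
    return pages * 8 / 1024


def shrink_tempdb_file(cursor, logical_name="templog", target_mb=1000):
    """Walk the flowchart; return the 1-based stage that reached the target, or None."""
    # The logical name is interpolated into DBCC, so only allow plain identifiers.
    if not re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*", logical_name):
        raise ValueError(f"unexpected logical file name: {logical_name!r}")
    cursor.execute("USE tempdb;")
    for stage_number, statements in enumerate(STAGES, start=1):
        for sql in statements:
            cursor.execute(sql)
        cursor.execute(f"DBCC SHRINKFILE ({logical_name}, {int(target_mb)});")
        if file_size_mb(cursor, logical_name) <= target_mb:
            return stage_number
    return None
```

A `None` result means none of the stages freed enough space. At that point you are back to the other two options: adding files or restarting the service.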
In my early days as a database administrator I would have loved to have this diagram. Having some quick steps during stressful situations such as tempdb’s log file filling up on me would have been a huge help. So hopefully someone will find this handy and will be able to use it to take away a little of their stress.
As Microsoft MVPs and Partners, as well as VMware experts, we are summoned by companies all over the world to fine-tune and problem-solve the most difficult architecture, infrastructure and network challenges.
And sometimes we’re asked to share what we did, at events like PASS Summit 2015.