I’ve written a couple of recent posts that were extremely critical of PASS and, even more so, of C&C, the company that manages PASS. Someone who read my last post pointed out that I probably didn’t emphasize the budget numbers I talked about enough. So let’s talk about them. I grabbed the most recent PASS financial data, which was published in March 2020.
Cash on Hand (effective)
Summit Revenue Projections
This is a challenge, because obviously I don’t have the actual amount PASS is spending per attendee for virtual Summit. Many years ago, my rough understanding was that Summit cost about $400 per attendee. So, for the purposes of my math, I’m going to estimate that virtual Summit will cost $100 per attendee (I suspect the actual cost is closer to $250, given that is what chapter leaders are being charged). Per the June meeting minutes, C&C has agreed to reduce its expenses by $500,000. It’s not clear where that reduction comes in, but let’s just say it drops non-Summit expenses to $2.7MM.
Suppose we have 2000 attendees of virtual PASS Summit in 2020, which I think may be a generous estimate, all paying for the whole conference:
Cash on Hand (effective)
Suppose instead we have 1000 attendees buying the All in One Bundle and 500 attendees buying the 3 day conference:
Cash on Hand (effective)
Given my experience and the current economy, I think my above projections are fairly optimistic. Let’s say my cost projection of $100 per person is too low, and the cost is $250 per person. Also, let’s say only 500 people sign up for the full conference and 500 register for the three day conference:
Cash on Hand (effective)
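The cost side of these three scenarios can be worked through directly from the figures above. This is only a sketch of the arithmetic: it uses the attendee counts and per-attendee cost guesses from this post plus the $2.7MM non-Summit expense estimate, and it deliberately leaves revenue out. The scenario labels and the function name are mine, not PASS’s.

```python
# Rough cost-side arithmetic for the three virtual Summit scenarios.
# All inputs are the estimates from this post, not published PASS figures.

NON_SUMMIT_EXPENSES = 2_700_000  # estimate after the $500,000 C&C reduction


def total_expenses(attendees: int, cost_per_attendee: int) -> int:
    """Summit delivery cost plus non-Summit expenses, in dollars."""
    return attendees * cost_per_attendee + NON_SUMMIT_EXPENSES


scenarios = {
    "2000 full conference, $100 each": (2000, 100),
    "1000 bundle + 500 three day, $100 each": (1500, 100),
    "500 full + 500 three day, $250 each": (1000, 250),
}

for label, (attendees, cost) in scenarios.items():
    print(f"{label}: ${total_expenses(attendees, cost):,}")
```

Under these assumptions, total expenses land between $2.85MM and $2.95MM in every scenario, so the outcome depends almost entirely on how much revenue comes in.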
PASS doesn’t officially release attendance numbers, but they say that 4000 people attended PASS Summit last year, which sounds really great. However, conference math is a factor here: many conferences count precons separately from the main conference attendance. If you attended two precons and the conference, you would count as three conference attendees. In a best-case scenario where you had 4000 attendees, top line revenue would still drop by $3.5 million (or 54%) and fixed operating expenses are only down $500k (or 16%). That is, as they say in business school, an untenable situation.
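Those percentages can be sanity checked with a little arithmetic. The sketch below backs out the baselines the percentages imply. The implied prior revenue is my inference from the $3.5 million and 54% figures, not a number taken from the PASS financials.

```python
# Back out the baselines implied by the percentages quoted above.

revenue_drop = 3_500_000        # projected fall in top line revenue
revenue_drop_pct = 0.54         # stated as 54%
expense_cut = 500_000           # C&C expense reduction
expenses_after_cut = 2_700_000  # non-Summit expenses after the reduction

# Inferred, not published: the prior revenue that makes $3.5MM equal 54%.
implied_prior_revenue = revenue_drop / revenue_drop_pct

expenses_before_cut = expenses_after_cut + expense_cut
expense_cut_pct = expense_cut / expenses_before_cut

print(f"Implied prior revenue: ${implied_prior_revenue:,.0f}")
print(f"Expense reduction: {expense_cut_pct:.1%}")
```

The expense side works out to $500k against a $3.2MM base, or about 15.6%, which rounds to the 16% quoted above.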
This is just focusing on the short term—2021 will face similar challenges. It is very possible that by November 2021, in-person conferences will be back (this assumes a vaccine in place, but Goldman Sachs does, and I trust them when it comes to money). However, I don’t see attendance quickly returning to pre-pandemic levels until 2022 or 2023, which means PASS will likely continue dipping into its cash on hand until reaching bankruptcy.
Sure, PASS Pro is a second potential revenue source, but it faces many challenges in getting off the ground and adding enough revenue to have any substantial impact. On top of that, it has left many community speakers feeling alienated by the conversion of their Summit sessions or networking events into paid, for-profit sessions.
One final note: in FY2020 PASS spent approximately five percent ($385K) of its revenue on community activities. That number was substantially beefed up by a Microsoft SQL Server 2019 upgrade effort, and the total community spend has been dropping over time. For a point of reference, C&C charged PASS $525k for IT services in 2019. It’s important to remember that PASS exists to serve the broader SQL community and not a for-profit firm.
I’m writing this post because I’ve been mired in configuring a bunch of distributed availability groups for a client, and while the feature is technically solid, the lack of tooling can make it a challenge to implement. Specifically, I’m implementing these distributed AGs in Azure, which adds a couple of additional challenges because of the need for load balancers. (Please don’t use the term DAG: you’ll piss off Allan Hirt, and more importantly it’s already used in Microsoft Exchange high availability, so it’s taken.) You should note this feature is Enterprise Edition only, and is only available starting with SQL Server 2016.
First off, why would you implement a distributed availability group? You would do it if you want to implement a disaster recovery (DR) strategy in addition to a high availability strategy with your AG. There’s limited benefit to implementing this architecture if you don’t have at least four nodes in your design. But consider the following design:
In this scenario, there are two data centers with four nodes. All of the servers are in a single Windows Server Failover Cluster. There are three streams from the transaction log on the primary, which is called SQL1, and two of them cross the WAN. This means we are consuming double the network bandwidth to send data to our secondary site in New York. With the distributed availability group, each location gets its own Windows cluster and availability group, and we only send one transaction log stream across the WAN.
This benefits a few scenarios. The most obvious is that it’s a really easy way to do a SQL Server upgrade or migration. While Windows clustering now supports rolling OS upgrades, it’s much easier to use a distributed AG, because the clusters are independent of each other and have no impact on each other. The second is that it’s very easy to fail back and forth between these distributed availability groups. You have also reduced by half the amount of WAN bandwidth you need for your configuration, which can represent a major cost savings in a cloud world or even on-premises.
If you think this is cool, you’re with smart people: this is the technology Microsoft has implemented for geo-replication in Azure SQL Database. The architecture is really robust, and if you think about the tens of thousands of databases in Azure, you can imagine all of the bandwidth saved.
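Here is a quick back-of-the-envelope sketch of that bandwidth claim, assuming the four-node, two-site layout described above with two replicas in the remote site. The log generation rate is a made-up number purely for illustration.

```python
# Compare WAN traffic for one stretched AG versus a distributed AG.
# Assumes two replicas per site; the log rate is a hypothetical value.

LOG_RATE_MBPS = 50  # hypothetical transaction log generation rate


def wan_streams(remote_replicas: int, distributed: bool) -> int:
    """Number of log streams that have to cross the WAN."""
    if remote_replicas == 0:
        return 0
    # A distributed AG sends one stream to the remote forwarder, which
    # then feeds the other replicas in its own cluster locally.
    return 1 if distributed else remote_replicas


single_ag = wan_streams(remote_replicas=2, distributed=False) * LOG_RATE_MBPS
dist_ag = wan_streams(remote_replicas=2, distributed=True) * LOG_RATE_MBPS
print(f"Single AG: {single_ag} Mbps over the WAN")
print(f"Distributed AG: {dist_ag} Mbps over the WAN")
```

Whatever the real log rate is, the ratio is what matters: with two remote replicas the single stretched AG sends the same log across the WAN twice.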
That’s Cool, How Do I Start?
I really should have put this tl;dr at the start of this post. You’ll need this page at docs.microsoft.com. There’s no GUI, which kind of sucks, because you can make typos in your T-SQL and the commands can still potentially validate and then give you unhelpful error messages (ask me how I know). But as a short list, here is what you do:
Create your first WSFC on your first two nodes
Create an Availability Group on your first WSFC, and create a listener. Add your database(s) to this AG
If you are in Azure, ensure your ILB has port 5022 (or whatever port you use for your AG endpoint) open
Create your second WSFC on the remaining two nodes
Create the second AG and listener, without a database. In case you really want to use the AG wizard, add a database to your AG, and then remove it. (Or quit being lazy and use T-SQL to create your AG)
Create the distributed AG on the first AG/WSFC
Add the second AG to your distributed Availability Group
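Since there’s no GUI, the last two steps come down to T-SQL you type yourself. The sketch below is one way to generate those statements from Python so the AG names and listener URLs are written once and sanity checked before anything reaches SQL Server. The AG names, listener hostnames, and helper names are hypothetical, the name pattern is stricter than SQL Server itself requires, and the options shown (asynchronous commit, manual failover, automatic seeding) are one reasonable choice rather than a requirement. Compare the generated T-SQL against the Microsoft documentation before you run it.

```python
import re

# Listener URLs are expected to look like tcp://listener-name:5022
LISTENER_URL = re.compile(r"^tcp://[A-Za-z0-9.-]+:\d{1,5}$")
# Deliberately strict name pattern (an assumption of this sketch).
SAFE_NAME = re.compile(r"^[A-Za-z0-9_-]+$")


def _member_clause(ag_name: str, listener_url: str) -> str:
    """Build the per-AG clause, rejecting obviously malformed input."""
    if not SAFE_NAME.match(ag_name):
        raise ValueError(f"AG name looks wrong: {ag_name!r}")
    if not LISTENER_URL.match(listener_url):
        raise ValueError(f"Listener URL looks wrong: {listener_url!r}")
    return (
        f"'{ag_name}' WITH (\n"
        f"        LISTENER_URL = '{listener_url}',\n"
        "        AVAILABILITY_MODE = ASYNCHRONOUS_COMMIT,\n"
        "        FAILOVER_MODE = MANUAL,\n"
        "        SEEDING_MODE = AUTOMATIC\n"
        "    )"
    )


def distributed_ag_sql(dist_name, first, second, join=False):
    """Return the CREATE (first AG) or JOIN (second AG) statement.

    first and second are (ag_name, listener_url) tuples.
    """
    if not SAFE_NAME.match(dist_name):
        raise ValueError(f"Distributed AG name looks wrong: {dist_name!r}")
    members = ",\n    ".join(_member_clause(*m) for m in (first, second))
    if join:
        head = f"ALTER AVAILABILITY GROUP [{dist_name}]\n    JOIN\n"
    else:
        head = f"CREATE AVAILABILITY GROUP [{dist_name}]\n    WITH (DISTRIBUTED)\n"
    return f"{head}    AVAILABILITY GROUP ON\n    {members};"


# Hypothetical names, for illustration only.
ag1 = ("ag1", "tcp://ag1-listener.contoso.com:5022")
ag2 = ("ag2", "tcp://ag2-listener.contoso.com:5022")
print(distributed_ag_sql("distributedag", ag1, ag2))             # first AG primary
print(distributed_ag_sql("distributedag", ag1, ag2, join=True))  # second AG primary
```

A format check like this won’t catch a misspelled listener name, but generating both statements from the same two tuples at least guarantees the CREATE and the JOIN agree with each other.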
This seems pretty trivial, and it is when all of your network connections work (you need to be able to hit 1433 and 5022 from the listener’s IP address across both clusters). However, SQL Server has extremely limited documentation and management around this feature. The one troubleshooting hint I will provide is to always check the error log of the primary node of the second AG (this is known as the forwarder), which is where you will see any errors. The most common error I’ve seen is:
A connection timeout has occurred while attempting to establish a connection to availability replica ‘dist_ag_00’ with id [508AF404-ED2F-0A82-1B8A-EA23BA0EA27B]. Either a networking or firewall issue exists, or the endpoint address provided for the replica is not the database mirroring endpoint of the host server instance
Sadly, that error is a bit of a catch-all. In doing this work, I had a typo in my listener name on the secondary and SQL Server still processed the command. (So there’s no validation that everything can connect when you create the distributed AG.) I’m sure in Azure this is all done via API calls, which means humans aren’t involved, but since there is no real GUI support for distributed AGs, you have to type code. So type carefully.
Overall, I think distributed availability groups are a nice solution for highly available database servers, but without more tooling there won’t be broader adoption, and in turn, there won’t be more investment from Microsoft in tooling. So it’s a bit of a catch-22. Hopefully this post helps you understand this feature, where it might be used, and how to troubleshoot it.
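Because that error covers so many causes, it can help to rule out basic connectivity first. Below is a minimal sketch, in Python, of a TCP reachability check for ports 1433 and 5022. The function names are mine, it should be run from a node in the opposite cluster, and a successful connection only shows the port is reachable, not that the endpoint behind it is configured correctly.

```python
import socket


def port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS failures.
        return False


def check_listeners(listeners, ports=(1433, 5022)):
    """Map each (listener, port) pair to whether it was reachable."""
    return {(h, p): port_open(h, p) for h in listeners for p in ports}
```

Calling check_listeners with your two listener names returns a dictionary keyed by (listener, port), so any False entry points at a firewall, load balancer, or name resolution problem worth chasing before you reread your T-SQL.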
This is a clickbait post title, sorry. You are here now. The correct answer is that you should purchase MySQL as a database service from your favorite cloud provider (Google, Amazon, and Azure all offer prebuilt database as a service offerings) because they have gone through the trouble of making their solution highly available. I’ll speak to Azure, because I’m familiar with the platform: Microsoft doesn’t employ MySQL high availability per se, however both the storage and the VM are highly available. If there is a disk failure, the service is not impacted, and if there is a failure in the compute tier, a new VM is provisioned.
My second recommendation, if you really, really want to build your own thing, is to build a Windows Server Failover Cluster and use shared storage. Make the MySQL service a clustered resource, and assign a floating IP to the service that will fail over with it. (Yes, I know you have to pay M$ for the Windows Server licenses.)
Why shouldn’t you use an open source solution to make your MySQL database highly available? First let’s look at a picture of a common MySQL high availability architecture:
If we think about what we need a clustering solution to provide, it comes down to a few things:
Providing a floating IP address to allow connection to the primary
Checking the health of the database services and initiating a failover in the event one of them isn’t healthy
Executing a clean database failover and providing the ability to easily fail back
Ensuring the stability of the overall cluster, maintaining quorum, and avoiding split brain scenarios
If you are using a shared storage scenario, the clustering solution needs to manage the shared storage to coordinate failover with services.
If you are using SQL Server with Windows Server Failover Clustering, the cluster service takes care of all of the above, and more. When you look to do this on Linux for MySQL, there are about 10 different sets of components you can use to make the service highly available. At the base of all of these solutions is MySQL replication, which is pretty trivial transactional replication. MySQL’s replication service is fairly robust, and the GTID implementation is pretty solid.
The problem is that the rest of the components are all mix and match. You could use HAProxy to float the IP address, but there’s no way to do a smart health check of the database. It simply does a port connection test. This means that if your primary goes away and then comes back, without some advanced configuration your floating IP is going to fail back to the original primary whether or not it’s actually the primary in your replication pair. This is but one example. You are going to end up with 3 or 4 different components to execute each of these functions, and congratulations, you are in charge of a complex distributed system that you are responsible for administering for the rest of your life.
But Joey, Facebook/Google/Pick Your Other Favorite Online Megacorp run MySQL and they support it with five 9s. OK, sure, I don’t disagree with this, and as databases, MySQL and PostgreSQL are generally OK. But look around at your engineering staff. Wait, do you have engineering staff? If you don’t have a few people who have both really good Linux SA skills and DBA skills, you are going to be pretty quickly in a situation where support is a challenge.
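To make the difference concrete, here is a sketch contrasting a bare port test with a check that asks the server whether it is actually writable. The run_query callable is a stand-in for whatever MySQL client you use and is an assumption of this example, as is treating read_only = 0 as the signal for the primary, which only holds if you set read_only on your replicas.

```python
import socket
from typing import Callable


def port_check(host: str, port: int = 3306, timeout: float = 2.0) -> bool:
    """The simple test: can we open a TCP connection at all?"""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def is_writable_primary(run_query: Callable[[str], int]) -> bool:
    """A smarter test: the node answers queries and is not read only.

    run_query is a hypothetical helper that executes SQL on the node and
    returns the single integer value of the result.
    """
    try:
        return run_query("SELECT @@global.read_only") == 0
    except Exception:
        # Any failure to answer counts as unhealthy.
        return False
```

A recovered old primary that comes back up as a read-only replica would pass port_check but fail is_writable_primary, which is exactly the case where a port-only check sends traffic to the wrong node.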
Finally, consider if you need an HA solution. Are you running on a virtual machine? As long as your infrastructure is solid, that probably gets you to about 99.5% availability on a bad week. What you absolutely want to avoid is the Windows 2000 paradigm, which is where your high availability solution incurs more downtime than a standalone system.
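It helps to translate those availability percentages into actual downtime before deciding. The short sketch below does the conversion; the 99.9% row is added only as a comparison point and does not come from any particular service level agreement.

```python
HOURS_PER_YEAR = 365 * 24  # 8760, ignoring leap years


def downtime_hours_per_year(availability_pct: float) -> float:
    """Hours of downtime per year allowed at a given availability."""
    return HOURS_PER_YEAR * (1 - availability_pct / 100)


for pct in (99.5, 99.9, 99.999):
    hours = downtime_hours_per_year(pct)
    print(f"{pct}% -> {hours:.2f} hours ({hours * 60:.1f} minutes) per year")
```

At 99.5% you are allowed roughly 43.8 hours of downtime a year, while five 9s leaves you a little over five minutes, which is the gap a home-built HA stack has to close without adding outages of its own.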
My teammate Meagan (b|t) messaged me in Teams yesterday afternoon to say “Joey, the client created a new database using your automated process, and my ETL user (which is an AAD user) didn’t get created, can you fix it?” Well, after a quick perusal of emails I remembered that I had asked the client to add the create user process to their initial population process, which hadn’t occurred yet. The reason I did this was that creating an Azure Active Directory user in an Azure SQL Database from Azure Automation was painful and maybe not even possible. However, I pinged Rob Sewell (b|t) about the best way to do that. This sounded not that bad to do, but I managed to hit land mines around every corner.
The First Problem
Azure Automation is mostly PowerShell only. There is a Python option, but I’ve never used it, and I’m not going to start now. The trick with PowerShell is that it’s great for things you have to do to Azure resources, but far less good for things you have to do inside of databases (think creating a user). I typically use the Invoke-Sqlcmd cmdlet, however we have a chicken and egg problem: I can’t create an AAD user from a SQL connection (a connection made using a SQL login) and Invoke-Sqlcmd doesn’t support authenticating with AAD. The Azure Automation service allows you to import third-party solutions from the PowerShell Gallery, so you can use dbatools, which I did here. Rob has an excellent blog post here that describes this process.
The code, which I happily stole from Rob’s blog, allows me to connect as a service principal. To easily facilitate this I made my automation account part of my DBA group (the Azure AD Admin group for the Azure SQL Server), which you can assign without this ridiculous process. I threatened to add Meagan’s ETL user to that group, but she was going to out me on Twitter.
After running that code I could connect from the Automation Run As account to my Azure SQL DB, but my query was failing with the following error:
I’m logged in as a service principal there, hence the weird GUID. You can see that I have basically every privilege in SQL Server, but I can’t create a user from an external provider. PowerShell (and Automation) say that the user could not be resolved.
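For reference, the statement at the heart of all this is CREATE USER ... FROM EXTERNAL PROVIDER. The sketch below, written in Python rather than the PowerShell used in the runbook, just builds that statement with the user name safely bracket-quoted. The function name and the example user are hypothetical, and actually running the statement still requires a connection authenticated with Azure AD.

```python
def create_aad_user_sql(user_name: str) -> str:
    """Build a CREATE USER statement for an Azure AD user or group.

    The name is wrapped in brackets, doubling any closing bracket, which is
    how T-SQL delimited identifiers are escaped.
    """
    if not user_name.strip():
        raise ValueError("user_name must not be empty")
    quoted = "[" + user_name.replace("]", "]]") + "]"
    return f"CREATE USER {quoted} FROM EXTERNAL PROVIDER;"


# Hypothetical ETL account, for illustration only.
print(create_aad_user_sql("etl_user@example.com"))
```

The statement itself is simple. As the rest of this post shows, the hard part is getting an identity that is allowed to resolve the user against Azure AD.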
The Next Land Mine
So I DMed Rob and asked him WTF? It turns out that for this to work, you need to create a service principal for your Azure SQL Database. If you aren’t familiar with service principals, they are analogous to service accounts in an on-premises world. Doing this was the easiest step in the process: I have a PoSH script to hit every server in my subscription, and it was trivial to add a service principal with it, as well as to add that step to my database runbook. However, that was just the first part.
You have to give the service principal the “Directory Readers” role in Azure AD, and the effective way to do this with Automation is to assign that role to a group. Well, it turns out assigning AAD roles to a group is a relatively new feature (it’s in preview) and, more importantly, requires Azure Active Directory Premium P1 or P2, which has a per-user cost. Which meant I needed to get approval. After much chatter on a DCAC Teams channel, I discovered that since this feature is not user-assigned (i.e. it’s enabled for the entire AAD tenant once it’s enabled), I only had to have one AAD license in the tenant (I assigned it to Meagan). Once that was in place, I could grant the directory permission to the SQL Server Service Principals group.
Are We Done Yet?
I should have noticed in the documentation provided by the SQL team on assigning groups with PowerShell that there was a reference to the preview PowerShell module for Azure AD (I did, but I didn’t think it mattered because I was just assigning a user to a group). So I thought I had everything wired up when I started getting the following error:
Add-AzureADGroupMember: Error occurred while executing AddGroupMember
Message: Insufficient privileges to complete the operation.
DateTimeStamp: Tue, 25 Aug 2020 13:14:08 GMT
I have Global Admin and Subscription Owner in the two environments I was testing in, so clearly this wasn’t a permissions issue. To further prove that point, I was able to add the service accounts I had created to the group through the Azure portal. So after writing like three emails with my further discoveries to the Azure MVP distribution list (I could add the service principal to a regular group, just not one with a role assigned to it), I went back and decided to play with that preview module.
Everything up to this point is me being an idiot, but I’m going to yell at Microsoft for a second. I couldn’t install the AzureADPreview module on my Mac because it’s dependent on WinForms. I thought Az modules were all supposed to be built on .NET Core. I also couldn’t get it to run in Cloud Shell, which may be related to the WinForms thing, or not.
I do have a Windows VM, so I installed the module there, and it successfully worked on the DCAC tenant. I went to Azure Automation to install the module. If you’ve never imported a module into Azure Automation, you should know that the portal notification about a module import being complete is meaningless, because Azure Automation is a lying liar who lies.
Look on the modules page and hit refresh a lot. It usually takes 1-2 minutes for a module to import. I messaged Kerry in Teams.
And what do you know? It worked. I was concerned and about ready to murder someone, but it worked. Rob’s code is really helpful and he covers Key Vault in his post. I did have some open GUIDs in some of my code pictures, but it’s cool, those aren’t sensitive. However, you should store all your secrets in Key Vault as it’s fast and easy.
The other thing I learned in this process is that you can now make a guest user your Azure Active Directory Admin (this means I could make firstname.lastname@example.org or email@example.com an admin in the joeydantoni.com tenant), which you weren’t able to do before. Prior to this you could use a group and add a guest user to that group as I mentioned above. (Note: you should still use a group and not a single user, as it’s best practice.)