What You Need to Know About Data Classifications in Azure SQL Data Warehouse

Published On: 2019-05-30

Data classifications in Azure SQL DW entered public preview in March 2019. They allow you to label columns in your data warehouse with their information type and sensitivity level. There are built-in classifications, but you can also add custom classifications. This could be an important feature for auditing your storage and use of sensitive data as well as compliance with data regulations such as GDPR. You can export a report of all labeled columns, and you can see who is querying sensitive columns in your audit logs. The Azure Portal will even recommend classifications based upon your column names and data types. You can add the recommended classifications with a simple click of a button.

You can add data classifications in the Azure Portal or via T-SQL or PowerShell. Data classifications are database objects.

ADD SENSITIVITY CLASSIFICATION TO
    dbo.DimCustomer.Phone
    WITH (LABEL='Confidential', INFORMATION_TYPE='Contact Info')

To view existing data classifications, you can query the sys.sensitivity_classifications view or look in the Azure Portal.

SELECT
    sys.all_objects.name AS [TableName],
    sys.all_columns.name AS [ColumnName],
    [Label],
    [Information_Type]
FROM sys.sensitivity_classifications
    LEFT JOIN sys.all_objects ON sys.sensitivity_classifications.major_id = sys.all_objects.object_id
    LEFT JOIN sys.all_columns ON sys.sensitivity_classifications.major_id = sys.all_columns.object_id
        AND sys.sensitivity_classifications.minor_id = sys.all_columns.column_id

Be Careful When Loading With CTAS and Rename

One issue that is specific to using data classifications in Azure SQL DW is that it is possible to inadvertently drop your classifications when you are loading your tables using the recommended T-SQL load pattern. Typically, when using T-SQL to load a dimensional model in Azure SQL DW, we perform the following steps:

  1. Create an upsert table via CTAS with the results of a union of new data from a staging table with existing data from the dimension table
  2. Rename the dimension table to something like Dimension_OLD
  3. Rename the upsert table to Dimension
  4. Drop the Dimension_OLD table
[Animation: a table load process in Azure SQL DW]


In the animation above, you’ll first see the load process as described, and then it will replay with sensitivity labels added to the dimension table. You’ll see that they are dropped when we drop the old dimension table. This makes sense because sensitivity classifications are objects related to that table. We would expect an index to be dropped when we drop the related table. This works the same way.
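
To make the four steps above concrete, here is a minimal sketch of that load pattern in T-SQL. The DimCustomer and staging table names, the columns, and the distribution settings are illustrative, not from the original post:

-- 1. Build the upsert table via CTAS (union of staged changes and existing rows)
CREATE TABLE dbo.DimCustomer_Upsert
WITH (DISTRIBUTION = REPLICATE, CLUSTERED COLUMNSTORE INDEX)
AS
SELECT CustomerKey, Name, Phone
FROM stg.DimCustomer                      -- new and changed rows from staging
UNION ALL
SELECT d.CustomerKey, d.Name, d.Phone
FROM dbo.DimCustomer AS d                 -- existing rows not present in staging
WHERE NOT EXISTS (SELECT 1 FROM stg.DimCustomer AS s WHERE s.CustomerKey = d.CustomerKey);

-- 2. Rename the dimension table to _OLD
RENAME OBJECT dbo.DimCustomer TO DimCustomer_OLD;

-- 3. Rename the upsert table to the dimension name
RENAME OBJECT dbo.DimCustomer_Upsert TO DimCustomer;

-- 4. Drop the old table (any sensitivity classifications on it are dropped with it)
DROP TABLE dbo.DimCustomer_OLD;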

Check out my SQL notebook for a demonstration of the issue as well as my workaround that I describe below. If you spin up an Azure SQL Data Warehouse with the sample database, you can run this notebook from Azure Data Studio and see the results for yourself.

There are a few complicating factors:

  • There are currently no visual indicators of sensitivity classifications in SSMS or Azure Data Studio.
  • ETL developers may not have access to the data warehouse in the Azure Portal to see the sensitivity classifications there.
  • The entire process of adding and managing sensitivity classifications may be invisible to an ETL developer. A data modeler or business analyst might be the person adding and managing the sensitivity classifications. If the ETL developer isn’t aware classifications have been added, they won’t know to go and look for them in the sys.sensitivity_classifications view.
  • SSDT does not yet support sensitivity classifications. The only way I have found to add them into the database project is as a post-deployment script with the build property set to none.

The good news is that you can add the sensitivity classifications back to your dimension table using T-SQL. The bad news is still that the ETL developer must remember to do it. My workaround for now is a stored procedure that will do the rename and drop of the tables plus copy the sensitivity classifications over. My hope is that it's easier to remember to use it since it will do the rename and drop for you as well.
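
As a rough sketch of the idea (not the exact procedure from the notebook linked above; the procedure name, parameters, and WHILE loop approach are all illustrative), it looks something like this:

CREATE PROCEDURE dbo.SwapTableWithClassifications
    @SchemaName sysname,   -- e.g. 'dbo'
    @DimName    sysname,   -- e.g. 'DimCustomer' (the live dimension table)
    @UpsertName sysname    -- e.g. 'DimCustomer_Upsert' (the table built via CTAS)
AS
BEGIN
    DECLARE @i int, @max int, @sql nvarchar(max);
    SET @i = 1;

    SELECT @max = COUNT(*)
    FROM sys.sensitivity_classifications sc
        JOIN sys.tables t ON sc.major_id = t.object_id
    WHERE t.name = @DimName AND SCHEMA_NAME(t.schema_id) = @SchemaName;

    -- Copy each classification from the current dimension table onto the upsert table
    WHILE @i <= @max
    BEGIN
        SELECT @sql = N'ADD SENSITIVITY CLASSIFICATION TO '
            + QUOTENAME(@SchemaName) + N'.' + QUOTENAME(@UpsertName) + N'.' + QUOTENAME(x.ColumnName)
            + N' WITH (LABEL=''' + x.Label + N''', INFORMATION_TYPE=''' + x.InformationType + N''')'
        FROM (
            SELECT ROW_NUMBER() OVER (ORDER BY c.column_id) AS rn,
                   c.name AS ColumnName, sc.label AS Label, sc.information_type AS InformationType
            FROM sys.sensitivity_classifications sc
                JOIN sys.tables  t ON sc.major_id = t.object_id
                JOIN sys.columns c ON sc.major_id = c.object_id AND sc.minor_id = c.column_id
            WHERE t.name = @DimName AND SCHEMA_NAME(t.schema_id) = @SchemaName
        ) x
        WHERE x.rn = @i;

        EXEC (@sql);
        SET @i = @i + 1;
    END;

    -- Then do the usual swap: rename current to _OLD, promote the upsert table, drop _OLD
    EXEC (N'RENAME OBJECT ' + @SchemaName + N'.' + @DimName + N' TO ' + @DimName + N'_OLD');
    EXEC (N'RENAME OBJECT ' + @SchemaName + N'.' + @UpsertName + N' TO ' + @DimName);
    EXEC (N'DROP TABLE ' + @SchemaName + N'.' + @DimName + N'_OLD');
END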

Eventually, the tools will be updated to provide more visibility to data sensitivity classifications, but we still need to make sure they don’t get dropped.

For now, my recommendation is that if you are going to go in and add a lot of sensitivity classifications, you create a user-defined restore point immediately after so that you know you have them in a backup somewhere. Azure SQL DW doesn't do point-in-time restores the way Azure SQL DB does; it takes automatic restore points every 8 hours or so. So if someone went through the trouble of adding the sensitivity classifications and they were dropped during the data load process, there is no guarantee that you could use a backup to get them back.

Vote for My Enhancement Idea

If you would like Microsoft to add something to the product to keep sensitivity classifications from being dropped, or at least make it easier to add them back, please vote for my idea.

Not an Issue with Other Data Load Methods

Please note that if you are using other tools or methods to load your tables where you don’t swap them out, you won’t have the issue of dropping your sensitivity classifications. But I wanted to bring up this issue because I can see people spending a lot of time adding them and then suddenly losing them, and I want everyone to avoid that frustration.

Give Data Classifications a Try

I think data classifications are a good addition to SQL DW. Anything that helps us efficiently catalog and manage our sensitive data is good. I have added them in my demo environment and hope to use them in a client environment soon.

Have you tried out data classifications in SQL DW or DB? What do you think so far? If not, what is keeping you from using them?


Cost Threshold for Parallelism

Published On: 2019-05-27

Cost Threshold for Parallelism is a setting in SQL Server that's been around for as far back in the product as I can think of. It decides how expensive an operator needs to be before SQL Server will use multiple threads for that operator instead of a single thread. The Cost Threshold for Parallelism setting has a default of 5 (which is a horrible setting, but we'll get into that later), which means that any operators that have a cost below 5 will only use a single thread and any operators that have a cost of 5 or more will use multiple threads.

SQL Server has a default of 5 because it has to have something as the default, but that setting should be changed for most systems. For OLTP systems, I typically recommend changing this setting to a value of 50. The reason that I like to start with a setting of 50 for OLTP applications is that operators with a cost of less than 50 typically aren't going to see any improvement from using multiple threads; they will usually get slower because of the overhead of parallelism. So we want to keep the quicker queries from using parallelism so that it's only used by the more expensive queries that actually need it.

Now 50 isn't a perfect setting. After making a change to the setting, the server needs to be monitored to make sure that too many queries aren't going parallel. The Cost Threshold for Parallelism might need to be adjusted again after the initial change, so keep in mind that this isn't a one-size-fits-all recommendation.
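
If you want to make the change with T-SQL rather than through the server properties dialog in SSMS, a quick sp_configure example (using the starting value of 50 discussed above) looks like this:

-- Cost threshold for parallelism is an advanced option, so expose it first
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;

-- Set the starting value, then monitor and adjust as described above
EXEC sp_configure 'cost threshold for parallelism', 50;
RECONFIGURE;

-- Verify the configured and running values
SELECT name, value, value_in_use
FROM sys.configurations
WHERE name = 'cost threshold for parallelism';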

Another thing to keep in mind is that changing this setting will cause your plan cache to be expired and all the plans will need to be recompiled, so you should see CPU load on the server go up right after you change the setting.

Denny

The post Cost Threshold for Parallelism appeared first on SQL Server with Mr. Denny.


Filtered Index Basics

Published On: 2019-05-22

In this post, we continue with another beginner's blog on database features that may be unknown to many. Let's take a look at filtered indexes. Many database administrators are fully aware of the power of indexes and know how to create them. However, I find that some have yet to dive into fully optimizing their indexes and taking advantage of what filtered indexes have to offer.

What is a filtered index?

Simply put, it's an index with a WHERE clause. It is an optimized nonclustered index that can be narrowed down in scope to better fit a subset of data. Examples include date ranges, years, non-NULL values, or specific product types.

Benefits

Using filtered indexes can improve query performance and plan quality over full-table indexes. Statistics are more accurate since they only cover the rows of the filtered index, resulting in better execution plans. Filtered indexes can also reduce your index maintenance because the index is smaller and you only maintain the data in the index that changes, not an entire table of data. Lastly, since it is smaller in size, it will consume less storage. Why have an index full of rows that are never accessed?

Let’s see it in action

First create one using the GUI.

In your database, under Tables, navigate to Indexes. Right-click on Indexes and choose New Index, then Non-Clustered Index.

Like a normal index, choose your columns and included columns where needed.

Then choose FILTER and type in your expression. Here we are telling the index to filter out all NULL values. Note that I did not include the word WHERE here; when you script it out, the WHERE is already included.

 

USE [AdventureWorks2014]
GO
CREATE NONCLUSTERED INDEX [FIIDX_ComponentID_StartDate]
ON [Production].[BillOfMaterials]
(
    [ComponentID] ASC,
    [StartDate] ASC
)
WHERE EndDate IS NOT NULL
GO

 

Let's look at a couple of execution plans now to see the results. This is a common query in environments where you want to look for items that have a specific end date. Instead of reading all the rows, including the NULLs (active products, represented by the EndDate column), your index is already scoped down to non-active products. If we had an index just on EndDate, the NULLs would also be included, and depending on how many products exist in your table, those rows could be significant.

 

USE AdventureWorks2014;
GO
SELECT ProductAssemblyID, ComponentID, StartDate
FROM Production.BillOfMaterials
WHERE EndDate IS NOT NULL
    AND ComponentID = 5
    AND StartDate > '01/01/2008';
GO

If you find for some reason the optimizer is not choosing to use your filtered index, you can also force its use with a query hint. Caution: query hints are not always the best course of action; this is simply an example of what you can do.

USE AdventureWorks2014;
GO
SELECT ComponentID, StartDate
FROM Production.BillOfMaterials
    WITH (INDEX (FIIDX_ComponentID_StartDate))
WHERE EndDate IN ('20000825', '20000908', '20000918');
GO

Summary

As a DBA, it's important to consider all options when creating indexes. We tend to think in terms of predicates and included columns, but we don't dive deeper into how our indexes are actually used and how we can eliminate unneeded rows within the index itself rather than with a query's WHERE clause. If you find that your index is only needed for the current year's data, then filter it at the index level. Don't read through or store years of data when you don't have to. Start taking advantage of these.
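
As a hypothetical example, an index scoped to the current year's orders might look like the following. The table, columns, and date literal are illustrative, and because filtered index predicates must use literals (not functions like GETDATE()), the filter needs to be updated as the year rolls over:

-- Only index this year's orders rather than the full history
CREATE NONCLUSTERED INDEX IX_SalesOrderHeader_CurrentYear
ON Sales.SalesOrderHeader (CustomerID, OrderDate)
INCLUDE (TotalDue)
WHERE OrderDate >= '2019-01-01';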


Making WordPress a globally distributed application on Azure WebApps

Published On: 2019-05-19

Before I start, I’m just going to say that if I can do this with WordPress, then it can be done with anything.

The goal that I was going for was to have a WordPress website running on multiple continents (I only used two regions, but this can be scaled out to other regions as needed) so that response time for users on one side of the planet would be similar to that for users on the same side of the planet as me. I also wanted the failover for all of this, if an Azure Region goes offline, to be as quick and painless as possible.

Here's what I had to get set up. I set up two Azure WebApps, both of which were running Windows. One was set up in US Central while the other was set up in North Europe.

Website Failover

I tried doing this with Linux WebApps, but some of the underlying functions that I needed to make this all work aren't available there. Specifically, I needed Site Extensions, which are only available on Windows WebApps today.

After setting up the two web apps, I uploaded the website content to the Azure WebApp. Once the site was uploaded and the custom DNS was working on the WebApp, it was time to start making the replication of the web traffic work.

In a new web browser I opened the .scm.azurewebsites.net version of my site. To get the SCM site, it's http://{Azure WebApp}.scm.azurewebsites.net. Since my Azure WebApp is named www-dcac-com-central, my SCM site is https://www-dcac-com-central.scm.azurewebsites.net/. (You're going to have to go to this site a few times, so keep the URL handy.)

Be very careful about the order in which you set up the replication. If you set up the replication from a blank website to your website, then it'll replicate the blank website. So before doing this, make sure that you have a proper backup of your WebApp BEFORE you configure the mirroring.

Once you have the website open, click on Site Extensions on the right of the menu at the top.

You'll see a Gallery option. Select the Gallery option and find the "Site Replicator" extension. Enable this extension by clicking the plus sign. A window will pop up to complete the install; click Install.

Once the installation is complete, go back to the Azure portal. From the Azure Portal stop the WebApp and then Start it again (clicking restart will not work).

Again in the Azure Portal, select the second site (in my case the second region is the North Europe region). From here you need to download the publish profile for the WebApp. To do this, from the Overview tab select the "Get publish profile" option at the top right.

Just download the file, we’re going to need it in a second.

Go back to the first site's SCM website (https://www-dcac-com-central.scm.azurewebsites.net/ in my case) and click the play button on the Site Replicator extension.

This is going to give you the configuration screen for the Site Replicator (it may take a minute to open). The settings screen is pretty simple. There's a browse button on the bottom left of the screen; click that and navigate to the publish profile file that you downloaded earlier.

Give the site some time to replicate all the changes to the other region.  When it says “Succeeded” it should be just about done. The larger the website, the longer this will take.  I FTPed into both WebApps and watched the files appear until they were all there.  On a standard WordPress install, this took about 10 minutes.

Once this was all finished, I repeated the process in the other direction.  I downloaded the publish file from the US Central version and configured the Site Replicator on the North Europe region; then I uploaded the publish file to the North Europe region.  I then gave the process about 10-15 minutes to settle down and do any replication that needed to be completed.

Once this was all finished, I was able to upload files, pictures, WordPress updates, etc. from either site and the change would be replicated to the other region within a second or two.

Once the website replication was handled, it was time to set up Traffic Manager. This would allow people to connect to their local version of our website depending on where in the world they are connecting from. Getting the endpoints set up was pretty simple. I used basic geographic load balancing and pointed North/Central/South America to the US Central version, and Asia/Europe/Middle East/Africa/Antarctica to the North Europe version.

The only hard part was that because WordPress is going to have a ton of redirects, you can't do normal HTTP monitoring. Normally you could have Traffic Manager pointing to "/" as the path to monitor, but WordPress didn't like this. I changed the monitoring path to "/license.txt" instead, as this allows the Traffic Manager endpoints to come online correctly. It isn't a perfect situation, but it works well enough.

Once everything is set up and Traffic Manager is happy and working, we can point public DNS to the site. In our DNS configuration for www.dcac.com we added a CNAME record to DNS. A CNAME record in DNS redirects the request to another record. In our case we pointed www.dcac.com to www-dcac-com.trafficmanager.net. This allows the Traffic Manager service to resolve www.dcac.com to the correct version of the website.

We can test that this is working as expected by looking at the output of the nslookup command.

By running nslookup on my laptop (which is currently sitting at my house in San Diego, CA), we can see that I’m resolving www.dcac.com to www-dcac-com-central.azurewebsites.net.

If I do the same nslookup on www.dcac.com from an Azure VM that's running in Singapore, I get back www-dcac-com-northeurope.azurewebsites.net.

From these outputs, I can assume that I’m viewing the version of the website that’s closer to the user.

I’ve now got two duplicate copies of the website running in two different Azure Regions.

Database Failover

On the database side of things, we need to set up some replication for that as well. Azure Database for MySQL servers now support multi-region replicas, but there's no auto-failover available yet (hopefully it'll be coming at some point soon). For the database copy I did basically the same thing as I did for the websites (and that I've done tons of times for Azure SQL DB).

For the database side of things I set up WordPress to use the dcac-mysqlcentral copy. From there I clicked the Add Replica button, and that made a copy of the data to a new server called dcac-mysqlnortheurope that I set up in North Europe.

Since there's no automatic failover at the database level today, if I need to do a failover I need to edit the wp-config.php file on the webserver, and that'll kick the connection over to the other server. I also need to set up the needed PowerShell to do the failover. My next step in this process is going to be to set up some Azure Automation to handle all the database-level failover, but so far this is a pretty good setup since the website-level failover is already handled.

The End Result

The end result of all of this is that our website is set up and running in two different places for better availability and better performance of our application.

Denny

The post Making WordPress a globally distributed application on Azure WebApps appeared first on SQL Server with Mr. Denny.
