Violin Plots in Power BI

Published On: 2019-02-14By:

In case you aren’t familiar, I would like to introduce you to the violin plot.

A violin plot is a nifty chart that shows both distribution and density of data. It’s essentially a box plot with a density plot on each side. Box plots are a common way to show variation in data, but their limitation is that you can’t see frequency of values. In other words, you can see statistics such as min, max, median, mean, or quartiles, but you can’t see the individual values nor how often they occurred.

example box plot
Example box plot showing min, max, median, and quartiles

The violin plot overcomes this limitation (by adding the density plot) without taking up much more room on the canvas.

In Power BI, you can quickly make a violin plot using a custom visual. The Violin Plot custom visual (created by Daniel Marsh-Patrick) has many useful formatting options. First, you can choose to turn off the box plot and just show the density plot. Or you can choose to trade the box plot for a barcode plot.

box plot with bar code plot
Box plot with barcode plot in Power BI

Formatting the Violin Plot

There are several sections of formatting for this visual. I’ll call out a few important options here. First, the Violin Options allow you to change the following settings related to the density plot portion of the violin plot.

Formatting options for the density plot in the violin plot.

Inner padding controls the space between each violin. Stroke width changes the width of the outline of the density plot. The sampling resolution controls the detail in the outline of the density plot. Check out Wikipedia to learn more about the kernel density estimation options.

The Sorting section allows you to choose the display order of the plots. In the example above, the sort order is set to sort by category. You can then choose whether the items should be sorted ascending or descending.

Sorting options for the violin plot in Power BI

Next you can choose the color and transparency of your density plot. You have the ability to choose a different color for each plot (violin), but please don’t unless you have a good reason to do so.

Data colors controls the color of the density plot

The Combo Plot section controls the look of the bar code plot or box plot. Inner padding determines the width of the plot. Stroke width controls the width of the individual lines in the bar code plot, or the outline and whiskers in the box plot. You can change the box or bar color in this section. For the barcode plot, you can choose whether you would like to show the first and third quartiles and the median the color, line thickness, and line style of their markers.

Also make sure to check out the Tooltip section. It allows you to show various statistics in the tooltip without having to calculate them in DAX or show them elsewhere in the visual.

Violin Plot Custom Visual Issues & Limitations

This is a well designed custom visual, but there are a couple of small things I hope will be enhanced in the future.

  1. The mean and standard deviation in the tooltip are not rounded to a reasonable amount of digits after the decimal.
  2. The visual does not seem to respond to the Show Data keyboard command that places data in a screen reader friendly table.

As always, make sure to read the fine print about what each custom visual is allowed to do. Make sure you understand the permissions you are granting and that you and your organization are ok with them. For example, I used public weather data in my violin plot, so I had no concerns about sending the data over the internet. I would be more cautious if I were dealing with something more sensitive like patient data in a hospital.

Introducing the Violin Plot to Your Users

I think violin plots (especially the flavor with the bar code plot) are fairly easy to read once you have seen one, but many people may not be familiar with them. In my weather example above, I made an extra legend to help explain what the various colors of lines mean.

Another thing you might consider is adding an explainer on how to read the chart. I used a violin plot with a coworker who does not nerd out on data viz to show query costs from queries executed in SQL Server, and I added an image that explains how to read the chart.

Example explanation of how to read a violin plot

After all, we use data visualization to analyze and present data effectively. If our users don’t understand it, we aren’t doing our job well.

Have you used the violin plot in Power BI? Leave me a comment about what kind of data you used it with and how you liked the resulting visual.

Speaking Internationally at SQLBits

Published On: 2019-02-13By:

 

I am very excited and lucky to be speaking at my very first international conference, SQLBits. SQLBits, the largest SQL Server conference in Europe, held in Manchester England February 27- March 2nd. It is a conference for leading data professionals with over 70 sessions from speakers all over the world. I commonly speak at User Groups, SQL Saturdays and other Data Conferences here in the United States and have always wanted to broaden my reach to other countries. I am very much looking forward to seeing how or if conferences differ in Europe.  It is a huge honor to have been selected to share my performance tuning knowledge with attendees.

My session, Performance Tuning SQL Server on Crappy Hardware, will be Friday March 1st  @ 4:50pm  in Room 3.

Many of us must deal with hardware that doesn’t meet our standards or contributes to performance problems. This session will cover how to work around hardware issues when it isn’t in the budget for newer, faster, stronger, better hardware.  It’s time to make that existing hardware work for us. Learn tips and tricks on how to reduce IO, relieve memory pressure, and reduce blocking. Let’s see how compression, statistics, and indexes bring new life into your existing hardware.

Be sure to also catch my colleagues at Denny Cherry and Associates Consulting , Joey D’Antoni (B|T) and John Morehouse (B|T), who are also speaking at SQLBits.

Joey D’Antoni –

Azure Managed Instances—Your Bridge to the Cloud – Friday March 1st @ 2:25pm in Room 9

Many organizations would like to take advantage of the benefits of using a platform as a service database like Azure SQL Database. Automated backups, patching, and costs are just some of the benefits. However, Azure SQL Database is not a 100% feature compatible with SQL Server—features like SQL Agent, CLR and Filestream are not supported. Migration to Azure SQL Database is also a challenge, as backup and restore and log shipping are not supported methods. Microsoft recently introduced Managed Instances—a new option that provides a bridge between on-premises or Azure VM implementations of SQL Server and Azure SQL Database.

 Managed Instances provide full SQL Server surface compatibility and support database sizes up to 35 TB. In this session, you will learn about migrating your databases to Managed Instances, developing applications for managed instances. You will also learn about the underlying high availability and disaster recovery options for the solution.

John Morehouse-

SQL Server Databaseology: Deep Dive into Database Internals Saturday March 2nd, 2019 @ 4:10PM in Room 8

Have you ever taken apart a toaster or an alarm clock just to see how it worked? Ever wondered how that database actually functions at the record level, behind the scenes? SQL Server Databaseology is the study of SQL Server databases and their structures down to the very core of the records themselves. In this session, we will explore some of the deep inner workings of a SQL Server database at the record and page level.  You will walk away with a better understanding of how SQL Server stores data and that knowledge that will allow you to build better and faster databases.

There is still time to register at www.sqlbits.com .

Really, Really Fast Cloud Storage

Published On: 2019-02-12By:

For a long time, CPUs and memory got faster, but we were stuck with spinning disks that topped out at 15,000 RPM. We got SSDs which were orders of magnitude faster, but we were still ultimately limited by controller throughput. NVME changes all of that. Do you know what NVME storage is? If you don’t, you should read my column this month at Redmond Mag.

Did you know Azure now had VMs available with direct attached NVME storage? I waslooking at a client’s servers this week, and I was going to write a post about how storage latency in the cloud (specifically Azure with Premium Storage) is about 90-95% of most on-premises environments, based on what I was seeing on my customer system. Then I met the Lv2 series of VMs.

So just to give you a point of reference, this customer system is running on one of the largest VM types in Azure, the GS-5. We are running with 16 cores (of 32) for licensing reasons and we have 18 TB of premium storage presented to each VM in a single Storage Spaces pool, which gives us plenty of IOPs (~80,000). Remember though—premium storage is SSD, but it’s networked so our data needs to travel over cable to make it back to the machine. With that in mind I’ve been seeing single digit millisecond latency, as viewed from SQL Server’s sys.dm_io_virtual_file_stats DMV. I know there are other ways of better measuring IO performance using Performance Monitor or looking at the storage tier itself, but when my SQL numbers are that good, I generally don’t care.

I wrote my column with a focus on some of the newer persisted memory technologies—but some folks were kind enough to tell me that NVME drives were currently available in Azure. Microsoft is kind enough to allow MVPs to have a nice allowance in Azure—I spun up an L64s_V2 VM. Books online mentioned that the NVME was available, but when I logged into the VM, I didn’t see a volume mounted. I looked in in Storage Spaces and I found this magic.

Disks

You’ll need to use Storage Spaces within Windows to create a storage pool, and then create a virtual disk after that. I went ahead and used this post from Glenn Berry on how to use the DiskSpd tool from Microsoft. If you have ever used SQLIO to validate storage in the past, this is the more modern version of the tool. Anyway, onto the results.

thread        bytes      I/Os     MiB/s IOPs   AvgLat(ms) LatStdDev
     0       9654157312       1178486      306.90    39283.16     0.101      0.088
     1       6744514560        823305      214.40    27443.70     0.145      0.088
     2       9005244416       1099273      286.27    36642.71     0.108      0.134
     3       9004244992       1099151      286.24    36638.64     0.108      0.147
     4       9141108736       1115858      290.59    37195.54     0.107      0.088
     5       9164423168       1118704      291.33    37290.41     0.107      0.087
     6       9122758656       1113618      290.01    37120.88     0.107      0.086
     7       9144197120       1116235      290.69    37208.11     0.107      0.086
total: 70980648960       8664630     2256.43   288823.14     0.110      0.10

 

Two key things to look at in the above table—288,000 IOPs—holy crap batman. That’s a lot—I think I could get more by running more threads as well. Next is the latency—that’s .11 ms latency—that dot isn’t a typo. Really good traditional SANs have 1-3ms latencies, the crappy array your boss scored a deal on from Ron’s House of Unsupported Hardware is probably pushing 20-30 ms if you are lucky. This storage is 300x last latent than that off-brand array your boss got from Ron.

car-salesman-funnyjpg.jpg

Don’t buy your storage from Ron—go to Azure, AWS, or one of the many vendors offering this amazing technology.

Should SQL Servers be rebooted weekly

Published On: 2019-02-11By:

There are people out there that reboot SQL Servers (or all servers) weekly. Usually the argument that I hear for doing this is that SQL Server has memory leaks which can only be resolved by rebooting Windows.

My argument against this, is that no SQL Server doesn’t have memory leaks which cause it to need to be rebooted weekly.  SQL Server is designd to use all the RAM that’s allocated to it. By default SQL Server is configured to use 2147483647 MB of RAM, or all of the RAM in the server. You can and should be changing this setting to a lower number so that there is memory for Windows and anything else that’s installed on the box (anti-virus, SSIS packages, SSAS, etc.).  The people that I typically see doing these server reboots, have been doing them for 15-20 years, typically because back then rebooting a server would fix a problem.  It won’t anymore.

There are servers that are running SQL Server that have been up for months, or years without issue (I’m ignoring the fact that you aren’t patching SQL Server and Windows here). I’ve persoally seen boxes with pretty heavy workloads on them that have run for years without having to be rebooted.

SQL Server doesn’t have any memory leaks. It’s a world class product of Microsoft, that runs millions of databases (if not more) and makes Microsoft billions of dollars in revenue each year. Whatever memory leak you think there is in SQL Server, there’s probably isn’t. If you think there is, then contact CSS at Microsoft to report the issue as Microsoft needs to patch the problem.

The biggest problem that people will see if that the buffer pool is flushed when the SQL Server restarts, and this causes SQL Server to read all the data it needs from disk as there’s no data in RAM. This causes performance issues right after the restart of SQL Server.

If you’re restarting SQL Server because there’s some performance issues that “goes away” after a SQL Server restart, we’re going to be better off dealing with the root cause of what’s causing the problem to come up to begin with. It’s probably going to be indexing or statistics, but that’s just a guess as every server is different, but it’s probably a safe guess.  Looking at the server will tell us what the problem is for sure, so we can solve the problem (and hopefully make the server faster in the process).

Denny

The post Should SQL Servers be rebooted weekly appeared first on SQL Server with Mr. Denny.

1 2 3 4 407

Video

Globally Recognized Expertise

As Microsoft MVP’s and Partners as well as VMware experts, we are summoned by companies all over the world to fine-tune and problem-solve the most difficult architecture, infrastructure and network challenges.

And sometimes we’re asked to share what we did, at events like Microsoft’s PASS Summit 2015.