In January I had the chance to attend Storage Field Day 19 in Santa Clara, where we got to meet with a wide variety of startups and large enterprise storage companies. One of the more interesting companies we meet with was MinIO which has a really interesting and compelling object-based storage product.
I’ve talked about object storage here before, but it’s a very different paradigm than the traditional block based storage you may currently be using. With block storage files are split into evenly sized blocks of data (typically somewhere between 64 KB and 1 MB depending on your vendor). Data protection is provided by traditional RAID options.
Object storage on the other hand doesn’t split files into blocks. Files are stored as objects which contain the file data, metadata, and a unique identifier. There is no limit on the size or amount of the metadata associated with the file. If you have ever created a managed disk in Azure, taken a backup to URL, or used an Azure SQL Database you’ve used object based storage. In object based storage, redundancy is generally provided by maintaining three copies of the object (e.g. a write isn’t considered complete until it writes to all three copies).
Object storage is designed to solve problems of scale. One of the things I learned at Comcast was that the cost of SAN storage didn’t scale to some of the massive petabyte scale data problems we had. The management overhead, the cost, and sometimes even the storage itself does not scale. This is a problem largely for companies like Microsoft, Amazon, Google, Facebook, etc, who have massive amounts of data to store. But as data volumes grow there are lots of other firms who have very large volumes that they need to manage.
MinIO is a firm that offers such a solution. MinIO offers open source storage management software that offers extremely fast (183 GB/s reads and 171 GB/s writes). It is fully compatible with Amazon’s S3 API, which has somewhat become the de facto standard for object storage. They were working on Azure Blob Storage support when we visited.
One of the ways MinIO is able to get such good performance out of pretty standard hardware is by taking advantage of SIMD processor instructions, which all more text and number crunching to be performed per CPU instruction which dramatically increases performance. SQL Server uses this through the query processor’s use of batch mode.
MinIO’s storage can also be used as a persistent store for Kubernetes (drink), or used for systems like Spark, TensorFlow, and a replacement for Hadoop HDFS. Where you would probably use this in your environment would be to replace your file servers, or as a target for container storage, or maybe even an analytic store. Or you want to become a cloud storage provider and you need to host 50 PB of data in your data center.
As I mentioned in my post last week I recently had the opportunity to attend Storage Field Day 19, where I got to meet with a wide variety of storage software and hardware companies in Silicon Valley. One of the more interesting companies we met with was a longtime player in storage—Western Digital. (Disclosure—I own shares of Western Digital and was gifted an SSD after the event) One of the overwhelming themes of the week was the vast amounts of data that we are generating much of which is coming from video and IoT device telemetry. Western Digital estimates that 103 zetabytes (that’s 103MM petabytes, or 103 Billion terabytes) of just IoT data will be created by 2023.
We were able to hear from a wide array of executives at Western Digital making up various parts of their business. There are a few market forces that are driving the direction of the company. The first area is gaming—building internal NVME drives with up to 2 terabytes with bandwidth up to 3480 MB/second. Performance is one aspect of gaming systems, but design aesthetic and cooling are also very important. PC Gaming is a $37.5 billion market, so Western Digital sees this as a major market for them.
While the gaming part of the presentation focused on bleeding edge performance, the rest of the afternoon looked at increasing storage densities. While it went unsaid, I feel like much of the development in the hardware business is increasingly focused on public cloud providers like Microsoft and Amazon, as well as large scale data companies like Facebook and Twitter. Western Digital is at the forefront of this development through the develpment of zoned storage. One of the goals of this extension to the NVME standard is to allow ultra-fast SSDs to be zoned similar to the way hard drives can now. This is not technology that you will be implementing in your data center anytime soon, however it will likely be coming to a cloud provider in the near future.
The other aspect of storage futures are increased densities. While many analysts have prematurely speculated about the death of the spinning hard drive (in lieu of lighter, faster, cooler solid-state drives), the density offered by traditional hard drives is unmatched. Western Digital showcased volumes up to 20 TB, as well as multi-actuator driveswhich can increase the performance of a spinning disk by an order of magnitude. These drives will consume more power than a traditional drive, but less than the two traditional drives. The data on these platters is striped in a RAID-0 fashion on the drive itself.
The world is heavily dependent on reliable, fast storage for all of the data systems modern life demands. As one of the leading builders of storage media, Western Digital is well positioned to support both end users and hyperscale cloud providers now, and in the future.
Last week I got to spend some time meeting with numerous storage companies in Silicon Valley. I along with another dozen or so delegates met with companies large and small, including Western Digital, Dell EMC, NetApp, and startups like MinIO. I’ll be writing posts in coming weeks to talk about some of the interesting technology we learned about this week.
In this post I wanted to focus on some interesting scenarios. It’s something I specifically noticed when we were at Western Digital but came up again particularly with the startups we met with. I had this thought, and then on Sunday Argenis Fernandez (b|t) who recently returned to Pure Storage about about after this tweet.
Argenis was complaining about file systems because when you have very fast (think NVME, or faster) storage, or storage-class memory the overhead of all the things the file systems does become a significant portion of the time that it takes to complete an I/O operation. This isn’t significant when your IOs take 4-5 milliseconds to complete, but when they are completing in 50 microseconds you notice the time it takes for the filesystem to timestamp a file.
This leads me to the point I wanted to make in the post. Storage technology futures are very much bifurcated (that’s a fancy word for going in two directions) –on one end there is ultra-high performance NVME storage for workloads like gaming and ultra-high throughput trading systems. On the other end there is a lot of development around ultra-high density storage for hyperscale providers (that’s basically your public clouds and Facebook).
Did you know that there were hard drives with multiple actuators (needles to a record player for those of you who are old)?
The reason why this is happening is that spinning hard drives are here to stay, for density reasons (you may have heard that the world is going to have eleventy billion zetabytes by 2022 and most of it will be in cloud, or something to that effect), and SSDs still lack the density required to say be a cloud provider or host most of the world’s photos.
The cloud providers are also part of the high-speed storage game–mostly to be able to do things like NMVE over fabric, which will allow ultra-fast disk to by virtualized and shared.
What does this mean for you as a data professional and consumer of storage? It means things probably aren’t going to change that much for you. If you are working with an all-flash vendor for performance storage, you’ll see the gains as NVME rolls in, but a lot of the ultra-high speed storage will be limited by the rest of stack (OS and RDBMS). If you are in the public cloud, I think you will see storage get gradually faster and less latent over the next 18 months, and you will see densities increase in hard drives. Your SAN admin will get some better tools, that I’m going to talk about in some coming posts.
I’m doing another precon at SQL Saturday Chicago on March 20th, 2020. The event is at Benedictine University in Lisle, IL. This time we’re going to dive a little bit deeper into Azure. While last years precon focused on basic infrastructure skills in Azure, this time we’re going to focus a little deeper into the specifics of the Data Platform in Azure. I did a variant of this topic in India last August, but I’ve made a few changes based on a number of conversations with customers I had at Microsoft Ignite last year.
Watch our webcast which was broadcast on April 1st, or April 2nd (depending on which part of the world you are in) as Kevin Kline and Denny Cherry talk about the Top 5 Considerations When Moving Databases to Azure.
As Microsoft MVP’s and Partners as well as VMware experts, we are summoned by companies all over the world to fine-tune and problem-solve the most difficult architecture, infrastructure and network challenges.
And sometimes we’re asked to share what we did, at events like Microsoft’s PASS Summit 2015.