T-SQL Tuesday #128: Let’s Talk About Your Incident Reports


Hello T-SQL Tuesday Readers! I’m sorry for being really late in getting this post out this week.

So! A couple of weeks ago, for this month’s topic, I asked everyone to post about something that broke or went wrong, and what it took to fix it. Last week, fourteen of you responded with your stories of woe so we could all learn from your incidents and recoveries in a constructive way, like pilots do. Here’s the recap of those posts, in the order that they came in.

What Everyone Had to Say

First is Rob Farley with “That time the warehouse figures didn’t match”… heh, I feel like I’ve heard this one before. But it turns out, no! This is new and awesome. Rob talks about a fundamental rule he has when loading data into a data warehouse: “protect the base table.” This is his first step in ensuring that data in the DW is correct, and as anyone who does DW or BI work knows–that is always the most important thing, because if trust in the data coming out of the reporting system is lost, it can be pretty hard to get it back.

John McCormack has “Optimising a slow stored procedure” next. John walks through his process of tuning up a stored procedure he had gotten an after-hours call about being slow enough that things were breaking. He’s got a good tip in here if you use SentryOne’s Plan Explorer, too. AND, there’s an added bonus of including something that I find frustrating when it happens. Basically: “This web page is usually really slow and the users were frustrated about it, but nobody ever told me!” Y’all! Tell us (IT, support, whoever) when you’re not happy, we’re usually happy to fix things to make your life easier!

Richard Swinbank talks about that time he unintentionally set a trap for himself in “Default fault.” Changing the default database for your login in SQL Server when you’re the DBA is all fine until you decommission that database! Richard includes great steps for digging yourself out of this hole with sqlcmd if you’ve “locked yourself out” of the instance when using SSMS.
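For anyone who hasn’t hit this one yet, here’s a rough sketch of the general shape of the fix. To be clear, this is my own illustration and not necessarily the exact steps from Richard’s post: the idea is to connect with sqlcmd while explicitly asking for master (so the missing default database never comes into play), then point the login at a database that still exists. The server name and login below are made-up placeholders, and the little Python helper only builds the command line; it doesn’t run anything.

```python
import shlex


def build_sqlcmd_fix(server, login):
    """Build a sqlcmd command line that resets a login's default database.

    -S picks the server, -E uses Windows authentication, -d master overrides
    the (missing) default database, and -Q runs one statement and exits.
    """
    # Bracket-quote the login name, doubling any closing brackets inside it.
    quoted_login = "[" + login.replace("]", "]]") + "]"
    statement = f"ALTER LOGIN {quoted_login} WITH DEFAULT_DATABASE = [master];"
    return ["sqlcmd", "-S", server, "-E", "-d", "master", "-Q", statement]


# Hypothetical server and login, for illustration only.
command = build_sqlcmd_fix("SQLPROD01", "CORP\\the_dba")
print(shlex.join(command))
```

Swap -E for -U and -P if you’re on a SQL login, and pick whatever default database makes sense for you; master is just the one that’s always there.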

Eitan Blumin’s post about a mis-behaving set of Azure VMs has a good surprise twist in it. This post struck me as a little hilarious, both because of the 32 hours bit, but also because I/we had just been talking about fullscan statistics update in our internal corpchat this week. I like fullscan stats in general, when they can be pulled off and are helpful (think non-uniform data distribution), but when they take down your AG, that’s, uh, bad. Don’t do that. Eitan also includes a nice set of takeaways at the end of the post.

SQL Cyclist/Kevin3NF I think wins the prize this month for having the longest contribution. Turns out, he has an ongoing series of blog posts of real-life stories along these lines, which include seven posts about fun goings-on that fit into this topic.

Next is Jason Brimhall with “Disappearing Data Files“… Jason shares a pet peeve of mine, which is having to fix the same thing over and over again. When that happens, it’s good that one at least has/knows the fix, I guess…. Anyway, Jason goes into using Extended Events as an audit tool to detect changes to tempdb files to help track down the root cause for a “recurring fix”, and reiterates that sometimes those root causes are hard to track down, and the best place to put yourself in for those situations is to be ready the next time.

Deborah Melkin says, “I feel like it’s been a while since I’ve joined the party.” Ohh, I’m pretty sure it’s not as long as it has been for me… Deborah doesn’t have a specific break/fix story, but talks about how experience and dealing with problems over time make it easier to address new problems as they come up. This is the core lesson (lesson? Process?) I’m wanting everyone to get this month, so I appreciate this perspective!

Hugo Kornelis has my favorite post of the month. If you’re the type who doesn’t read all of the T-SQL Tuesday posts and waits for the recap to find the one or two that look the most interesting, make sure this one is on your list. It’s short and to the point, and contains a fantastic lesson. Two, really, although one isn’t explicitly mentioned as a lesson. This one is the fact that if there’s something you do on a regular or reoccurring basis, you should have that process written down. Think of it like a checklist (flying reference ahoy!). But what Hugo makes clear is that even if you’re doing something different that you don’t have a process for, but is adjacent to that process, it’s still a good idea to reference that process. Just read Hugo’s post, it’s easier than listening to me 🙂

I felt bad just reading Aaron Bertrand’s post. Just, go read it. Promise. Take his advice.

Lisa Bohm brings us what I think is a heartwarming story about what transparency and honesty can bring to even professional relationships. Lisa tells us about how this worked out for her while working for an ISV when things went bad on a Friday night (it’s always a Friday night). In addition to the honesty part, she had another takeaway that we can all do better at sometimes: Shut up and start listening.

Next up is STEVE JONES, the one that got me into this damn mess in the first place. He at least takes ownership of that in the post, heh. Steve’s speaking my language here with “And too few companies share their learnings publicly.” …This is a crab of mine, as well, and I think a lot of IT shops would make fewer bone-headed mistakes if everyone was more willing to share what they’ve learned, NTSB-style. I understand why things are the way they are, and all, but it doesn’t mean I have to like it. Anyway, Steve has many encouraging words for learning from others in the first place, and how he has been fortunate enough to be able to work with people who have helped him build better systems. I always appreciate Steve’s writing, and this post is no exception.

Glenn Berry wrote about a place where I think we’ve all been–it boils down to not RTFM 🙂 But also, it involves building PCs, which maybe most of us used to do, but hardly any do anymore. I still do, but only once every, like, eight years, so… This was funny, because as Glenn walked through how he got here, I saw right where this was going; I mean, with four disks to hook up, I would have gone right for that nice quad of stand-offs, too! I mean, why split the cables up between two places when you could just do one! Yeah. Uh-huh. Yep. Glenn’s main takeaway is, basically read the documentation, which is always a good idea. As someone who really likes writing it (I’m aware of how broken I am), I also know how little this happens in practice.

Todd Kleinhans talks about Letting it Fail. This is true–sometimes things have to break to get the right people’s attention, or to show just how bad a situation could get. Todd has a story about just one of these situations. I don’t think any of us like things getting to that point, but sometimes there aren’t other options. Todd also has a good final takeaway, about “doing nothing” being a valid option in a situation, and it really is. May not lead to a good outcome, but it is an option!

Tracy Boggiano starts out with a line that is a big “been there, done that” for me: “One fateful night while I was not on call, I got a call around 3:30 AM.” Ahh yes. You’re not on-call, but you wind up on the horn with Ops, anyway. Tracy’s story has some good head-shaking items in it, which is about how I expect a story that starts like this to end. Tracy has a good line towards the end: “Everything from the network, to the server hardware, to the database creates the system and working as a team is the only way to make sure things are configured to perform and not fail.” Ain’t that the truth…

And finally, my man Andy Yun talks about Presentation Disasters. Andy comes through here with probably the most aviation-related lesson of all: Paranoia Pays Off. Yessssss! Always have a Plan B–plus C and D if possible–so when things go pear-shaped, you already have a plan. Andy was presenting at GroupBy back in May, when his headset died. But he was ready! Good lesson for all and everything here, not just those of us slinging TSQL or flying airplanes.

And that’s it! This was the first time I’ve hosted T-SQL Tuesday, and I want to thank everyone who shared their stories with us this week!

Contact the Author | Contact DCAC

T-SQL Tuesday #128: Learn From Others

Pilots do something that a lot of non-pilots will find fairly weird if not outright horrifying: We read accident (“crash”) reports. Some of us spend a lot of time reading accident reports, actually. Officially “Accident Reports”, these are put out by the US National Transportation Safety Board (NTSB) after investigation into a crash or an “incident.” In addition to aviation-related reports, there are highway and railroad reports, and even hazardous materials incidents.

Reports come in two flavors: a “preliminary” report and, ultimately, a “final” report after the investigation has completed. The final report includes such items as conclusions and the probable cause of the accident or incident. To make life easier, it also includes a Recommendations section, which, well, includes recommendations for how to keep this type of accident from happening in the future. These tend to be regulatory in nature, as they are geared towards the FAA.

The search form for aviation reports is here–https://www.ntsb.gov/_layouts/ntsb.aviation/index.aspx–if you’re, uh, thinking you want to get into this sort of thing.

Why do pilots do this? The rationale is pretty simple: To learn from the mistakes of others. Or, to learn how a bad day was kept from becoming a worse day after something broke.

What Does This Have to Do With SQL Server?

Great question. Besides the fact that I think piloting airplanes and DBA-ing are the same job, just with different scenery, I wish we had this kind of transparency in the IT world when things went wrong. When a corporation has a big security incident, we’re likely not to hear a lot of details publicly about what went wrong and what was done to mitigate similar attacks in the future. This kind of information could help everyone. This is one of the things that cloud providers do quite a bit better: When something breaks, we get good information on what happened, why, and what’s going to be done about it. Of course, this is done because public cloud providers basically have to–if things went down a lot and we never heard why, that provider probably wouldn’t have a lot of customers for very long.

This brings me to T-SQL Tuesday.

Tell me (us all, obviously) about something recently that broke or went wrong, and what it took to fix it. Of course, the intent here would be for this to be SQL Server-related, but it doesn’t have to be. We can all learn from something going wrong in infrastructure-land, or how there was a loophole in some business process that turned around and bit somebody’s arm. It doesn’t even have to be all that recent–maybe you’ve got a really good story about modem banks catching on fire and that’s how you found out the fire suppression system hadn’t been inspected in years. Just spitballin’ here. If you’ve got an incident whose resolution can help someone else avoid the same problem in the future or improve a policy as a preventative measure, let us hear about it.

The Rules

Here are the rules as set out for the T-SQL Tuesday blog party.

  1. Your post should be published on Tuesday, 14 July, 2020 between midnight and 11:59:59 UTC/GMT/ZULU
  2. Include the T-SQL Tuesday logo in your post
  3. Link back to this invitation (usually done through the logo)
    (this will get syndicated, so link back to the original on airbornegeek.com, please)
  4. Include a comment on the invitation post or a trackback link
  5. Enjoy the chance to be creative and share some knowledge.


T-SQL Tuesday #91 – Start Talking

T-SQL Tuesday is a monthly blog gathering for the SQL Server/Data Professional community.  It is the brainchild of Adam Machanic (B|T) and is not limited to just things around the SQL Server database engine. Each month a blogger hosts the event and anybody who wants to contribute can write a post about that month’s topic. You can find a list of all topics at http://tsqltuesday.com/.

This month’s T-SQL Tuesday topic is about DevOps.  It is being hosted by Grant Fritchey (B|T).

Grant asks some specific questions in this month’s posting:

  1. How do we approach DevOps as developers, DBAs, report writers, analysts and database developers?
  2. How do we deal with data persistence, process, source control and all the rest of the tools and mechanisms, and most importantly, culture, that would enable us to get better, higher functioning teams put together?

We’ll discuss each one, but first a recap.

What exactly is DevOps?  Wikipedia tells us:  “DevOps (a clipped compound of ‘software DEVelopment’ and ‘information technology OPerationS’) is a term used to refer to a set of practices that emphasize the collaboration and communication of both software developers and information technology (IT) professionals while automating the process of software delivery and infrastructure changes.”

Great.  Now we know what it is.  It is intended to be a set of practices that stress collaboration and communication among basically everybody (and I mean everybody) that might be involved in application development and delivery.   Ironically, this set of practices has morphed into an actual job title over the years.  My employer just recently hired a “DevOps Engineer”.

So, let’s go back to Grant’s questions.

How do we approach DevOps as developers, DBAs, report writers, analysts and database developers?

Honestly, I think it’s time for us data professionals to get out of the data center and get on board with the DevOps movement.  Let us take off the cloak of invisibility and get our hands dirty.  If we look at the overall “DevOps” role, we should want to have DevOps in our world.  While it may be a hard journey, in the long run it will make our jobs easier.  How would you like to have push button deployments or fewer manual code reviews?  I know that I would.

How do you do this?  Easy.  Talk.  Communicate.  Collaborate.  Break it down now.  Go talk to your respective teams (app dev, infrastructure, DBAs, management, etc.) and get them talking.  Everybody will have their own opinion and that is okay.  The important part is to get everybody talking.

If you need a starting point, ask this simple question: “What if we could implement a process to deliver application changes to the wild that could potentially have little to no impact on our customers?”  Think about that.  Do you work in an environment where deployments cause outages for your customers? I bet for the majority of us, that is true.

If your customers have a better end user experience, meaning little to no down time, isn’t that the name of the game?

DevOps helps to answer that question.

How do we deal with data persistence, process, source control and all the rest of the tools and mechanisms, and most importantly, culture, that would enable us to get better, higher functioning teams put together?

In my opinion, the answer is simple.  Trust.  As data professionals, we often question the process because once you involve data (the data is the business) we get very distrustful of changes.  This is probably rightfully so.  How many times has a deployment gone sideways for you?  Ever have to roll back a deployment because it barfed?  I know that I have been there and usually it’s not that much fun.  Hopefully every DBA has a recovery strategy in place to handle such events.

In order to reach a true “DevOps” method to deliver application changes into the wild, we have to get our hands off of it. And I mean OFF.  This means tools must be in place in order to facilitate this. This also means that we have to TRUST the tools to do their job and do it well.   I mean tools such as Octopus Deploy, TeamCity, Jenkins, TFS, Red Gate, etc.  All of these third-party vendors pour money, time and effort into making their tools as rock solid as possible.

Trusting the tools is difficult for most database administrators.  We want to see the guts in deployment, making sure dangerous things do not happen to our precious databases and their contents.

Experience

I was a part of a database continuous delivery project at Farm Credit Services of America in conjunction with Alex Yates (B|T) and Bob Walker (B|T), who have both blogged about the experience.  I have also been pushing a similar project at Farm Credit Mid-America.

I’ll admit when the project started up at FCSA, I was that distrusting, skeptical DBA who thought this would never work.   Yes, it was hard.  It was cumbersome and clunky in the beginning.  Then it got easier and the pieces fell into place.  Iterations to deliver changes to the wild got easier.  Quickly, I began to trust the process and how it was going to work.

I highly recommend Bob’s series of blog posts about the process.  You can find them here. Alex also has several excellent posts around the process here.

Summary

In short, I’m a big supporter of the DevOps movement, especially concerning database lifecycle management (DLM) and continuous delivery.  While the journey is difficult, there are organizations out there that can help you get there.  I assure you that the end results will be worth every penny.

Time to shed the cloaks of invisibility.  Start talking.

 

© 2017, John Morehouse. All rights reserved.


T-SQL Tuesday #36: What Does the SQL Community Mean to You (Me)?


T-SQL Tuesday #36: How rad is Community? Rad enough for me to say “rad.”

I hate doing this, but I’m throwing this post together at the last second, as with PASS Summit going on last week, I completely spaced that this was T-SQL Tuesday Week. I blame the fact that I dropped my #TSQL2sDay search column out of TweetDeck last week, but that wouldn’t even have helped, because I spent most of my time on the Surface, but that’s a different story/post altogether. Community is something pretty important to me, so I’m here trying to get this out the door by the deadline (I failed, see below).

T-SQL Tuesday #36 is being hosted by Chris Yates (blog | @YatesSQL), who chose this Community-related topic this month. It’s pretty fitting, considering a good chunk of us have just gotten back from PASS Summit in sunny (yes, really) Seattle this past weekend, where there’s a lot of “community” going on.

Hard to Avoid a Summit Story

Having been one of those that just returned from Summit on Sunday, it’s pretty hard for me to think about this without thinking about last week. I had a couple different things I wanted to say, but I’ve settled on the following, about being one of, and helping, Summit FirstTimers.

Last year at Summit was our first time there. We’ve both been to a fair number of Tech conferences, so it wasn’t all a new experience for us. This, combined with the fact that we already “Twitter Knew” a fair chunk of people, led us to not opt-in to the organized First Timers networking event (I’m sorry, Tom). Even with the fog machine and rock music intro the FirstTimers had heading into 6ABCD (which was pretty bad-ass), we were OK with this.

We’ve learned a lot about our Community since our first Summit, only a year ago.

This year, Tammy signed up to be an Alumni Mentor for FirstTimers. I was added as kind of an “unofficial” mentor to help her out, instead of having a group of my own, because, when you get right down to it, I’m a huge pansy. I was going to be OK just being there, but not being a mentor myself. That’s scary!

First Timers Sign for Groups 55, 56, 57

Groups on our sign. My rogue group became 57A.

As it turned out, a lot more people showed up to the FirstTimers networking event than expected. I was standing there with our group’s (and two others’) sign, directing people which table to sit at, depending on which group they were in. At one point, Buck Woody, the guy with the microphone, and therefore the most powerful person in the room (turns out Buck Woody with a microphone is the best, but scariest thing ever), just told everyone to sit down anywhere, because it was taking too long to get everyone in. Next thing I knew, the previously-empty table I was standing next to was full of eager first-timers, along with Tammy’s table, and the other two groups on our sign.

Ohhhhhhhhcrap, I now have my own group of FirstTimers!!!

I had to get over being a pansy real fast. It did help to lead off by telling everyone sitting at my table that I guaranteed I was the most scared person sitting there. The time we had to sit there and listen to speakers and talk amongst ourselves actually went pretty fast. My group didn’t talk amongst themselves quite as much as I maybe would have liked, but they did have some questions about the conference, which I could answer and help out with. Plus, my head didn’t explode!

Where am I going with this? To me, “SQL Community” is sitting and talking face-to-face with people I’ve never met before…even though doing that scares the living crap out of me. After this experience, I’m sorry that we didn’t do the FirstTimers event last year. I’m going to make up for that in the future, though, by going ahead and volunteering to be a mentor of my own FirstTimers group in future years.

Timezone Fail

Bonus section!

Soooo, this post is late. I forgot that we’re GMT –6 now, because we’ve gone back to Central Standard Time. When I started writing this, I was shooting for 7:00p local. Then, at about 5:59, I realized the truth. Even then, this machine is showing 7:03, so I still failed.

And I’m the guy always crabbing about people saying “EST” when they really mean “EDT” :-(
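For the record, here’s the arithmetic I should have done up front, as a quick sketch. It assumes the deadline is the end of Tuesday in UTC (midnight going into Wednesday), uses fixed offsets of UTC-6 for CST and UTC-5 for CDT rather than a real time zone database, and the date is just a placeholder I picked for illustration.

```python
from datetime import datetime, timedelta, timezone

# Fixed offsets for US Central time; a sketch, not a full time zone database.
CST = timezone(timedelta(hours=-6), "CST")  # Central Standard Time
CDT = timezone(timedelta(hours=-5), "CDT")  # Central Daylight Time


def local_deadline(utc_deadline, local_tz):
    """Convert a UTC deadline into the given local time zone."""
    return utc_deadline.astimezone(local_tz)


# Placeholder date (an assumption): midnight UTC at the end of a Tuesday.
deadline_utc = datetime(2012, 11, 14, 0, 0, tzinfo=timezone.utc)

for tz in (CDT, CST):
    local = local_deadline(deadline_utc, tz)
    print(tz.tzname(None), local.strftime("%Y-%m-%d %H:%M"))
```

Under those assumptions, the deadline I had in my head (7:00p) is the daylight time answer, and the standard time answer lands an hour earlier at 6:00p.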

