The Right Stack

The Verica Open Incident Database (VOID)

Vendor Sponsor
Verica
Research Published
December 22, 2022
Teaser blog
https://www.verica.io/blog/void-2022-report-now-available/
Link to research
https://www.thevoid.community/report
Description

Ooh, this one is cool. Incidents: most of what you thought you knew is probably wrong. Root cause analysis? Not meaningful. MTTR? Not a great way to measure distributed systems, because averages of incident durations mislead.

Demographic or Methodology comments

Topic Tags
Security, CIO / IT, DevOps, Incidents
Sample
VOID Community Incident Reports, Vendor Customers
Demographics
Sample Bias

Companies willing to create and share incident reports

Hot Take

Everything except for DNS. It was DNS

Created time
Mar 24, 2023 5:10 AM
Directory name

The Rightstack Research DB

The Verica Open Incident Database (VOID) makes public software-related incident reports available to everyone, increasing understanding of software-based failures in order to make the internet a more resilient and safe place. After scrutinizing nearly 10,000 incidents, one thing is crystal clear: Resilience saves time. Taking the time to understand how to better respond when something green turns red—learning from the people, the processes, and the systems—will make your next incident smoother.

Duration Isn't Cut and Dry

Duration of incidents conveys little meaning about the incidents themselves, in part because it can be very tricky to attribute when incidents start or stop.

It's Time To Retire MTTR

Mean Time to Resolve (MTTR) isn’t a viable metric for the reliability of complex software systems for a myriad of reasons, particularly because averages of duration data lie.
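
To make the point about averages concrete, here is a minimal sketch in Python. The durations are made-up illustrative values, not VOID data, and the `summarize` helper is a hypothetical name introduced only for this example. It shows how a single long incident can dominate the mean while the median barely moves.

```python
from statistics import mean, median

# Hypothetical incident durations in minutes (illustrative only, not VOID data).
# Nine short incidents plus one 12-hour incident.
durations = [12, 15, 18, 20, 22, 25, 30, 35, 45, 720]


def summarize(values):
    """Return the mean and median of a list of incident durations."""
    return {"mean": mean(values), "median": median(values)}


# Compare the summary with and without the single long incident.
with_outlier = summarize(durations)
without_outlier = summarize(durations[:-1])

print("with long incident:   ", with_outlier)
print("without long incident:", without_outlier)
```

In this toy sample, one incident is enough to pull the mean far away from what a typical incident looked like, which is one reason a single "MTTR" number can say little about the underlying incidents.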

Duration and Severity Aren't Related

We found that duration and severity are not correlated—companies can have long or short incidents that are very minor, existentially critical, and nearly every combination in between.

Root Cause Analysis Is On The Decline

Despite adding four times the number of incidents in 2022, the number of RCA-based reports didn't increase proportionally. We even saw a move away from RCA in large enterprise organizations, as they embrace more in-depth analyses.

What People Are Saying About the VOID

The VOID report challenges the “Old View” in what many technology organizations deem as the gold standard for incidents, such as: duration of incidents, MTTR, and Root Cause Analysis. Instead, we can embrace a “New View” that includes learning from incidents beyond just fixing them, deeper and broader incident analysis, humans as the superpower of systems, and an increased focus on successes versus failures when analyzing incidents.

Chad Todd

SRE Manager, CrowdStrike

If you aren't recording and publishing incidents because you want to look good, then you are more likely to have a much bigger failure. This report raises some interesting questions: how can we measure near-misses, and can we find a better metric than Mean Time To Repair (MTTR) given the complex partial failure modes we see? I encourage everyone to publish more, include near misses in your incident reports, and to help everyone else build a safer world as a result.

Adrian Cockcroft

Partner, OrionX & Tech Advisor

The VOID report marks a remarkable advancement in how our community will look at and fix incidents moving forward. Upon seeing the emerging key findings of the report, Jeli was excited to support this research across these large datasets. Through extrapolating the key findings of the report, we are all able to build more resilient systems with greater collaboration.

Nora Jones

CEO, Jeli

The VOID report represents a great step forward for the IT industry. It is both a demonstration that numerous organizations are transforming their approach to post-incident learning, and an inspiring call for others to recognize the importance of this New Way of looking at incidents. I love the rigorous critique of MTTR, as well as the practical alternatives suggested by the report.

David Leigh

Distinguished Engineer, IBM

Reading that companies are ditching Root Cause Analysis in the same report as we get a fantastic analysis of MTTR fallacies really gave me, a professional pessimist, optimism for the future.

Clint Byrum

Staff Engineer, Spotify

If you loved Accelerate and the DORA Report, this will be right up your alley: a long-overdue, open-sourced data dump of real outages. Yours. Ours. Companies big and small have contributed their outage reports to seed this repo of what really happens when things go sideways.

Honeycomb

The VOID report is the first industry-wide analysis of the state of software reliability today—in fact, it is the closest thing we have to a 'State of the Union' address. Everyone who designs and operates software systems should read it.

Engineer, Stanza Systems

The VOID project is one of the most significant steps we can take as an industry to improve our operations and safety. This report sets up solid bases for many organizations and practitioners to turn their outage review practices towards more impactful and learning-centric views.

Staff SRE, Honeycomb

The VOID report is an outstanding broad view of patterns in incidents across many organizations. I'm looking forward to the database growing and lending itself to even more research and insights.

Štěpán Davidovič

Senior Staff SRE, Google

The VOID Report is one of those rare and delightful moments of active thought. It takes a given subject matter, in this case claims about incidents in software, as serious and worthy of in-depth consideration. And through a close examination it finds that something doesn't quite make sense. That critique provides an opening for thought, and the sloughing off of received dogma. It's a wonderful example of critical thinking.

Nick Travaglini

Technical CSM, Honeycomb

As SREs we spend a lot of time thinking about incidents, trying to learn from them and understand our world better. The VOID report gives us well-researched data so we can see clearer, and help our organizations learn from our peers across the industry.

Amy Tobey

Senior Principal Engineer, Equinix
