Mostly a discussion of Chronosphere with a dash of data from 451 Research’s Voice of the Enterprise: Cloud Native, Observability 2022 survey. I really like the “what we thought we were going to get vs what we actually got” question format.
Not a survey, but I like the Chronosphere team and that question format, so I included it.
A recent Market Insight Report from 451 Research introduces our Chronosphere observability platform and capabilities and acknowledges the need for them in today’s increasingly cloud native world.
But the report, titled “Chronosphere aims to tame runaway observability data and costs,” also raises the question: Is there long-term room in the market for independent, scalable and profitable observability players like us? Or will legacy application performance monitoring (APM) vendors eventually barrel in and seize control?
We think the former, naturally. After all, our founders, CEO Martin Mao and CTO Rob Skillington, cut their teeth at Uber, where they led the observability team and created M3, an open source, scalable, remote storage time series database.
451 Research surveys reveal that 36% of businesses have already deployed observability tools in production, and another 18% are currently investigating them in proof-of-concept initiatives.
Although our solution was engineered from the ground up for large-scale, microservices-based applications running in containers, it can also monitor monoliths in non-containerized environments. That means the mixed environments common in many enterprises can be monitored with a single tool. This matters because 451 Research's analysis found that organizations are seeking to reduce, not expand, their observability toolsets.
Why Observability?
The world is going cloud native for speed, scale, and efficiency. Cloud native architectures enable faster software development life cycles, so value can be delivered incrementally and much more quickly. But once organizations deploy at scale, visibility into many small, distributed, interdependent pieces becomes a necessity. There are simply too many moving parts, and too much can go wrong, to run such environments without transparency and control.
Observability is defined by 451 Research, part of S&P Global Market Intelligence, as “the ingestion, storage, and analysis of structured event data for problem detection and resolution.” Observability platforms such as ours allow engineers to rapidly zero in on contextualized data to diagnose issues in a cloud native environment.
Suffice it to say, traditional APM solutions simply cannot achieve observability in a cloud native world. There are three basic problems these legacy solutions run into:
- Scalability: Cloud native environments emit a massive amount of data, somewhere between 10 and 100 times more than traditional virtual machine (VM)-based environments.
- Flexibility: Cloud native applications and the container-based infrastructure they run on are ephemeral. They live only for the lifetime of a deployment. Given today’s practices, those lifetimes tend to be very short.
- Reliability: It's impossible to guarantee a 99.9% uptime service-level agreement (SLA) if the tool you use to measure it isn't itself available at least 99.9% of the time. Most of today's APM tools can't achieve that "three nines" level of availability.
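To make the "three nines" figure concrete, here is a minimal Python sketch of the downtime budget implied by an availability target. The 30-day window is an assumption for illustration; real SLAs define their own measurement periods.

```python
def allowed_downtime_minutes(availability: float, window_days: int = 30) -> float:
    """Minutes of downtime permitted by an availability target over a window.

    Assumes availability is measured as a simple fraction of wall-clock time.
    """
    total_minutes = window_days * 24 * 60
    return total_minutes * (1 - availability)


for target in (0.99, 0.999, 0.9999):
    print(f"{target:.2%} over 30 days -> {allowed_downtime_minutes(target):.1f} min")
```

At 99.9%, the budget over 30 days is about 43.2 minutes. If the monitoring tool itself is unavailable for longer than that, outages can occur that nobody is able to see or measure.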
These challenges with scalability, flexibility, and reliability are the primary drivers of what I call the "expectation gap" of observability tools. Companies expect their observability tools to deliver faster problem detection (mean time to detect), faster problem resolution (mean time to recover), and improved responsiveness. But companies using an APM or IT infrastructure monitoring solution that was recently rebranded with a shiny new "observability" logo will likely find that it falls short.
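For readers less familiar with the two metrics, here is a small Python sketch of how mean time to detect (MTTD) and mean time to recover (MTTR) are commonly computed from incident timestamps. The `Incident` record and the sample incidents are hypothetical. MTTR here is measured from when the problem started, which is one common convention; some teams measure it from detection instead.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import mean


@dataclass
class Incident:  # hypothetical record, for illustration only
    started: datetime   # when the problem began
    detected: datetime  # when an alert or a person noticed it
    resolved: datetime  # when service was restored


def mttd_minutes(incidents: list[Incident]) -> float:
    """Mean time to detect: average of (detected - started), in minutes."""
    return mean((i.detected - i.started) / timedelta(minutes=1) for i in incidents)


def mttr_minutes(incidents: list[Incident]) -> float:
    """Mean time to recover: average of (resolved - started), in minutes."""
    return mean((i.resolved - i.started) / timedelta(minutes=1) for i in incidents)


t0 = datetime(2022, 1, 1, 12, 0)
incidents = [
    Incident(t0, t0 + timedelta(minutes=4), t0 + timedelta(minutes=30)),
    Incident(t0, t0 + timedelta(minutes=10), t0 + timedelta(minutes=50)),
]
print(mttd_minutes(incidents), mttr_minutes(incidents))
```

With these two made-up incidents, MTTD is 7 minutes and MTTR is 40 minutes. Shortening the detection step lowers both numbers, which is why notification speed matters so much.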
According to 451 Research data, the expectation gap for observability tools is quite large: 20 percentage points for faster problem detection, 15 percentage points for faster problem resolution, and 16 percentage points for improved responsiveness.
Why Chronosphere?
In the report, 451 Research lays out the Chronosphere value proposition succinctly:
“The company’s SaaS platform combines the benefits of open source cloud monitoring with customer inputs to cut through the noise of undifferentiated metrics and traces.”
451 Research noted that self-managed open source (OSS) solutions based on Prometheus or OpenTelemetry can work well for capturing metrics and traces from containerized environments. But they have many limitations. For starters, they need more people to support them as applications grow, which drives up costs.
Availability and resiliency are also "significant issues," 451 Research said, as organizations try to scale. The high number of interdependencies results in higher data cardinality and a more urgent need to connect infrastructure to applications based on business metrics. That's why, even when operating at the same scale as a VM-based deployment, a cloud native application will have a significantly higher monitoring bill, 451 Research concluded.
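A rough way to see why cardinality climbs is to count time series. For a single metric, the upper bound on distinct series is the product of the number of values each label can take. The Python sketch below compares a hypothetical VM-based service with a hypothetical containerized one that runs the same number of replicas at any moment. Every label name and count here is an assumption chosen only for illustration.

```python
from math import prod


def series_count(label_cardinalities: dict[str, int]) -> int:
    """Upper bound on distinct time series for one metric:
    the product of the number of values each label can take."""
    return prod(label_cardinalities.values())


# Hypothetical VM-based service: 10 long-lived hosts.
vm = {"endpoint": 20, "status_code": 5, "host": 10}

# Hypothetical containerized service: also 10 replicas at any moment, but each
# of 30 deployments in the retention window creates fresh pod names.
replicas, deployments = 10, 30
k8s = {"endpoint": 20, "status_code": 5, "pod": replicas * deployments}

print(series_count(vm), series_count(k8s))
```

Under these assumptions the VM service tops out at 1,000 series and the containerized one at 30,000. The multiplier comes entirely from pod churn, because ephemeral containers keep minting new label values even at constant scale.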
At Chronosphere, our objective is to ease the pain of accelerating data growth, and the rising cost of observability data, with our control plane. It uses aggregation, among other techniques, to define retention and resolution strategies so that customers pay only for the data they actually need to keep.
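The two general techniques mentioned here, aggregation and reduced resolution, can be sketched in a few lines of Python. This is a generic illustration under stated assumptions, not Chronosphere's control plane or API. The function names, the label layout, and the sample data are hypothetical. The example assumes that summing across pods and averaging within each time bucket are acceptable for the metric in question.

```python
from collections import defaultdict

Labels = tuple[tuple[str, str], ...]  # sorted (name, value) pairs identifying a series
Sample = tuple[int, float]            # (unix timestamp in seconds, value)


def aggregate_drop_label(series: dict[Labels, list[Sample]], drop: str) -> dict[Labels, list[Sample]]:
    """Sum series that differ only in the `drop` label, e.g. a per-pod label."""
    out: dict[Labels, dict[int, float]] = defaultdict(lambda: defaultdict(float))
    for labels, samples in series.items():
        key = tuple((k, v) for k, v in labels if k != drop)
        for ts, value in samples:
            out[key][ts] += value
    return {k: sorted(v.items()) for k, v in out.items()}


def downsample(samples: list[Sample], step: int) -> list[Sample]:
    """Average raw points into one point per `step`-second bucket."""
    buckets: dict[int, list[float]] = defaultdict(list)
    for ts, value in samples:
        buckets[ts - ts % step].append(value)
    return [(b, sum(vs) / len(vs)) for b, vs in sorted(buckets.items())]


# Hypothetical raw data: two pods reporting every 10 seconds for 2 minutes.
raw = {
    (("endpoint", "/checkout"), ("pod", "a")): [(t, 1.0) for t in range(0, 120, 10)],
    (("endpoint", "/checkout"), ("pod", "b")): [(t, 2.0) for t in range(0, 120, 10)],
}
rolled_up = aggregate_drop_label(raw, drop="pod")
kept = {k: downsample(v, step=60) for k, v in rolled_up.items()}
print(kept)
```

Here, 2 series with 24 raw points become 1 series with 2 points. The per-pod detail is traded away for a smaller bill, which only makes sense when nobody queries by pod.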
Plus, we have a different take on what’s important. As 451 Research wrote,
“The company [Chronosphere] believes its differentiation is on addressing the root cause of customer-facing issues, and it does not have a hyper focus on metrics, logs, and traces (this is the data) — the main event is introspecting applications to understand what is going on.”
This is indeed what we believe. We focus on outcomes, not the numbers themselves.
In practice, this means using Chronosphere to help engineers answer three key questions:
- How quickly — before or after a negative customer or employee experience — am I notified when there is a problem?
- How easily and quickly can I triage the problem and understand its impact?
- How do I find the underlying cause so I can fix the problem?
By focusing on the outcome (how fast we can fix a problematic customer or employee experience) rather than the inputs (metrics, logs, and traces), our platform is becoming known for dramatically reducing time to resolution. That matters for businesses that are increasingly dependent on cloud native applications.
We Designed and Built Our Observability Platform for Cloud Native Momentum
Cloud native observability has turned out to be a disruptive force in the traditional APM space. 451 Research noted that "there is also interest in tooling that is designed specifically with cloud native technologies in mind as enterprises grapple with the challenges that complex, cloud native apps present." Such tooling is architecturally very different because it is shaped by the unique requirements of cloud native environments. Traditional APM vendors can't realistically re-architect their solutions to make them work in this new world.
APM may be a tool enterprises need for VM-based applications, but observability is what will give businesses with cloud native environments a competitive advantage. Another way to think about it is that although today’s APM tools are designed for the cloud, observability platforms are designed for cloud native. That’s a big difference. As organizations increasingly turn to cloud native architectures, observability platforms like our Chronosphere platform will inevitably surpass APM solutions in use and effectiveness in delivering business outcomes.