Our vision of the zero-down time enterprise
Lodewijk Bogaards· 3 min read
We’re very pleased to tell you about our vision: to enable organizations to become zero-downtime enterprises. In this video, our CTO and co-founder Lodewijk Boogaards tells you all about it. If you prefer reading, you can find a transcript of the video below. Enjoy!
Hi, I’m Lodewijk Boogaards and I’m the CTO and co-founder of StackState. I’d like to tell you a little bit about the zero-down time enterprise and how we aim for zero.
The journey to becoming a zero-downtime enterprise starts with being able to quickly identify issues and remediating the root cause before it impacts the business. That’s what companies like Vodafone, KPN, NN Bank, Accenture, and dozens more are working toward with StackState.
3 reasons relating root cause to business impact is a huge challenge
1. IT environments are constantly changing
First, dynamic IT environments are constantly changing. Every change introduces risk. When problems occur, it’s difficult – if not impossible – to ID the change that’s causing the problem, ironically because the IT environment looked so different at the moment that the issue started before it started to cause an impact.
2. Information about the IT environment is siloed and incomplete
Second, information about the IT environment, its health, and its relationships and dependencies is siloed and incomplete. It is very common for enterprises to have dozens of monitoring solutions – and that’s likely never to change because there is so much activity going on in these environments.
Those siloes can make it very hard to extract actionable information and to coordinate resolution efforts. End-to-end visibility is always lacking. It is very hard for enterprises to separate the noise from the signal.
3. Enterprises are reacting instead of proacting to IT problems
And third, because those environments are so dynamic and because there are so many tools and silos, most companies are reactive to problems rather than proactive.
Without a reliable way to pinpoint the root cause, performance degradations and outages cost companies millions each month, consuming time and human resources, and negatively impacting reputation and customer satisfaction.
We help enterprises decrease downtime
That’s where StackState comes in. StackState helps enterprises decrease downtime and prevent outages by breaking down the silos between existing monitoring tools and tracking changes in dependencies, relationships, and configuration over time. The system relates these changes to incidents, understanding the precise change that is creating the problem. Our clients realize decreases in mean-time-to-repair, fewer outages, and lower costs associated with incidents.
Lots of monitoring data, still many errors
StackState was born out of a real-life customer problem. A leading bank in Europe was experiencing massive outages in their mobile banking experience and having trouble identifying why, despite the fact that a lot of monitoring data was available.
My co-founder, Mark, and I were tasked with solving it. And in the meantime, we learned that other companies were experiencing similar issues. It was costing companies time, money, and reputation. Solving this problem became StackState’s mission.
The general path of an IT issue
To become a zero-downtime enterprise, you need to understand the anatomy of an issue. If you look at the post-mortems for problems and incidents, they follow a general path – something changes that causes anomalies that trigger alerts, which lead to the incident management process.
StackState is the only solution that can track the changes and link them to the anomalies that lead to alerts that result in incidents, helping companies more precisely go from root cause to business impact.
How StackState works
1. Understanding dependencies between IT components
To do that you need to have a few ingredients. First, you need to break down silos of data within your environment.
What we have built is a unique observability solution that integrates with existing tools, like AWS, Azure, Kubernetes, Splunk, Dynatrace, VMWare, and more, to understand the real-time relationships and dependencies between IT components – from databases and web servers to containers, microservices, and applications. We stitch together real-time topological information from multiple sources and track it across time – which solves the dynamic IT problem. We built our own versioned graph database in order to be able to solve that problem. No other vendor offers this.
2. Creating a data model that represents the entire IT environment
Then, we correlate the topological elements with the telemetry that already exists in the environment. We can get to that data without actually needing to copy it to a single place. This creates a seamless data model that represents a customer’s entire IT environment – solving the silo problem. Again, no other vendor offers this.
Underpinning these capabilities is a unique data model that incorporates topology, telemetry, and traces from existing tools within the environment, and marries it up with our time-based versioned graph database. This is called the 4T Data Model. No other vendor can bring together all of this information, without requiring companies to move their data to our system.
3. Identifying performance anomalies before they become a problem
Finally, we use AI to automatically detect performance anomalies and pinpoint their location in the stack before they have a major impact on business performance – solving the reactive versus proactive problem.
In the space created by the 4T data model StackState is able to track the evolution of a potential problem from root cause to business impact in a highly dynamic IT environment, starting with a change, then the resulting anomalies and finally the associated business impact.
Our fundamentally new approach. And vision.
StackState takes a fundamentally different approach than other vendors. And we believe this is the only approach that will work at scale and under all circumstances.
There are still miles to go on the journey to fully realizing the zero-downtime enterprise. To this end, StackState continues to add support for modern IT experiences, such as cloud native application environments, is maturing our autonomous anomaly detection to further predict and prevent outages before they occur, and is expanding the monitoring and automation tooling we integrate with.
Lodewijk Bogaards· 3 min read