Self-Driving Anomaly Detection

Imagine driving on the freeway in a (partially) self-driving car like a Tesla. While you drive, you come across things you would expect, like trees, lampposts and other cars, but also things that don't belong there, like trash floating around. Meanwhile, the radars and sensors in the car are working hard to make sure you don't crash because of these things. If you see the freeway as your fast-changing IT environment, then all the things that don't belong there are anomalies. To make sure anomalies don't cause a crash that affects your business, you need good detection. Anomaly detection helps to find incidents in your fast-changing IT environment and provides insight into the root cause. It also directs the attention of IT operators to interesting parts of the IT environment. However, anomaly detection is only effective when it is easy to configure, scales to large IT environments, and runs on all data streams. At StackState we call this 'Self-Driving Anomaly Detection'. Read this blog to learn more about this concept and how it can help your business.

Why easy configuration?

If anomaly detection requires human intervention to configure, then the machine learning behind it doesn't provide much value, because the whole point of having machine learning is that you don't need human intervention. Therefore, anomaly detection shouldn't require configuration; it needs to 'drive itself'. There are three reasons to support this:

  1. No expertise:

    even if companies decided that they wouldn't mind configuring the anomaly detection themselves, there's a big chance that they couldn't, because many companies lack the expertise to configure machine learning algorithms.

  2. Labor intensive:

    if companies do have the expertise, it will still cost them a lot of manual effort to configure machine learning algorithms that in many cases aren't reusable. The sheer amount and variety of data make manual configuration infeasible.

  3. Brittle configurations:

    the dynamism of IT infrastructure makes every configuration brittle: it has to be updated constantly as the environment changes.

"Psst...renowned global research company Gartner has listed all Artificial Intelligence for IT Operations (AIOps) vendors in their New Market Guide. Download your free report right here!"

Why scale to large IT environments and run on all data streams?

Good anomaly detection should be able to scale to massive IT environments and run on all data streams. The main reason is that it's hard to fully specify in advance which subset of streams matters. You can run anomaly detection only on Service-Level Agreements (SLAs) and business Key Performance Indicators (KPIs). However, operators tend to monitor these higher-level streams closely already, or they have high-quality health checks or synthetic monitoring running at those levels. Anomaly detection therefore provides more additional value if it also runs on lower-level streams. These streams are plentiful and volatile in nature. Disruptive anomalies can happen anywhere in your IT environment. Because you want to be able to detect them, it's important to have your anomaly detection running on all data streams.

The power of Self-Driving Anomaly Detection

As explained above, good anomaly detection needs easy configuration. It's also important that it can scale to large IT environments and run on all data streams.

StackState's Self-Driving Anomaly Detection doesn't need manual configuration. It automatically finds the right machine learning algorithm for each data stream using AutoML. AutoML draws on a collection of anomaly detection algorithms, the semantics of the data, correlations among data streams, user feedback, and historical IT incidents. Self-Driving Anomaly Detection tries different ways to detect anomalies and picks the one that detects the most meaningful anomalies while avoiding false positives.
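To make the idea of trying several detectors per stream more concrete, here is a minimal sketch in Python. It is not StackState's implementation: the two detectors (a z-score rule and a median-absolute-deviation rule), the F1 scoring, and names such as `select_detector` are assumptions chosen for illustration. The labelled points stand in for user feedback or historical incidents.

```python
from statistics import mean, median, pstdev


def zscore_detector(values, threshold=3.0):
    """Flag points more than `threshold` standard deviations from the mean."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [False] * len(values)
    return [abs(v - mu) / sigma > threshold for v in values]


def mad_detector(values, threshold=3.5):
    """Flag points using the median absolute deviation, which is robust to outliers."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        return [False] * len(values)
    return [0.6745 * abs(v - med) / mad > threshold for v in values]


def f1_score(predicted, labelled):
    """Reward detectors that find labelled anomalies without raising false positives."""
    tp = sum(p and l for p, l in zip(predicted, labelled))
    fp = sum(p and not l for p, l in zip(predicted, labelled))
    fn = sum(l and not p for p, l in zip(predicted, labelled))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def select_detector(values, labelled, candidates):
    """Score every candidate detector on one stream and return the best one."""
    scores = {name: f1_score(detect(values), labelled)
              for name, detect in candidates.items()}
    best = max(scores, key=scores.get)
    return best, scores


# Hypothetical stream with one labelled anomaly (the spike to 95).
values = [10, 11, 10, 12, 11, 10, 11, 95, 10, 11]
labelled = [v == 95 for v in values]
candidates = {"zscore": zscore_detector, "mad": mad_detector}

best, scores = select_detector(values, labelled, candidates)
print(best, scores)
```

On this short stream the spike inflates the standard deviation enough that the z-score rule misses it, while the median-based rule catches it, so the selection step prefers the latter. A production system would weigh many more signals than a single score.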

Self-Driving Anomaly Detection scales to large environments by prioritizing streams based on its knowledge of the IT environment. The streams with the highest priority are examined first. The priority of each stream is computed by a machine learning algorithm that learns to maximize the probability of preventing an IT issue. It does this based on which streams are intrinsically important (such as KPIs and SLAs), the ongoing and historical issues, and the relations between streams, among other factors. This way, Self-Driving Anomaly Detection can operate in large environments by allocating attention where it matters the most.
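The prioritization described above can be sketched roughly as follows. This is an illustration under stated assumptions, not StackState's algorithm: the `Stream` fields, the fixed `WEIGHTS`, and the `pick_streams` helper are all hypothetical, and the weights are hand-picked here where the real system is said to learn them.

```python
import heapq
from dataclasses import dataclass


@dataclass(frozen=True)
class Stream:
    name: str
    is_kpi_or_sla: bool          # intrinsically important stream
    past_incidents: int          # historical issues seen on this stream
    linked_to_open_issue: bool   # related to a stream with an ongoing issue


# Hypothetical hand-picked weights; a learned model would replace these.
WEIGHTS = {"kpi_or_sla": 2.0, "past_incident": 0.5, "open_issue": 3.0}


def priority(stream):
    """Combine the factors into a single priority score."""
    return (WEIGHTS["kpi_or_sla"] * stream.is_kpi_or_sla
            + WEIGHTS["past_incident"] * stream.past_incidents
            + WEIGHTS["open_issue"] * stream.linked_to_open_issue)


def pick_streams(streams, budget):
    """Return the `budget` highest-priority streams, highest first."""
    return heapq.nlargest(budget, streams, key=priority)


streams = [
    Stream("checkout_sla", True, 1, False),
    Stream("db_disk_io", False, 4, True),
    Stream("cache_hit_rate", False, 0, False),
    Stream("queue_depth", False, 2, False),
]

names = [s.name for s in pick_streams(streams, budget=2)]
print(names)
```

With a budget of two streams, the low-level `db_disk_io` stream outranks the SLA stream because it is tied to an open issue and has a history of incidents, which is the kind of trade-off a fixed "KPIs only" subset would miss.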

Do you want to learn more about Self-Driving Anomaly Detection and StackState? Get in touch with us and send your question! We're happy to share our knowledge.

About StackState

StackState is the leading monitoring and AIOps platform for hybrid IT. The platform combines and analyzes metrics, logs, events, and data beyond typical monitoring data, like Google Analytics, CMDBs, CI/CD tools, service registries, automation, and incident management tools. The 4T Data Model® is the core of StackState's monitoring and AIOps platform and the main driver for all real-time monitoring, automation, and predictive capabilities. It combines big data, artificial intelligence, and topology visualization to instantly pinpoint the root cause of (predicted) incidents and improve business alignment. The platform helps organizations make better decisions faster and avoid high-severity outages while utilizing their current IT investments. StackState's growing customer list spans a range of industries, from finance and telecom to managed service providers, and includes global enterprises like IBM Global Services, Vodafone, KPN Telecom, and ABN AMRO Bank as well as local innovators, such as NS International and Schuberg Philis. Could you use this? Book a guided tour with one of our StackState experts and discover how StackState makes your life easier.

