AIOps, by definition, deals with data found on the operations side of the house where information is produced every second as a side-effect of provisioning, deploying, and running your applications. IT Operations engineers use the data to observe the state of the applications they are looking after and to detect and resolve any issues that might arise.
Traditionally, the software development teams that build those applications are far removed from this information. In many organizations they have no direct access at all, and more commonly still they see this type of information only when a crisis is at hand.
There are a few situations in which an AIOps solution can really help development teams.
1. Automated Root Cause Analysis
In a complex IT environment, problems seldom come alone. When there is a malfunction somewhere in your application, the fallout of that fault is potentially huge, causing many alarm bells to go off. If the cause does not seem to be related to the infrastructure, Operations teams will need to involve the development team to help troubleshoot.
An AIOps solution helps in these cases to figure out what the root cause of the problem is. In the case of an application failure, development teams will be able to look at their application as a whole and how failing services are related to each other. Through this lens, developers can much more quickly identify the service or services that are the root cause of the problem, rather than a symptom.
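One simple way to separate causes from symptoms is to look at the dependency graph: a failing service whose own dependencies are all healthy is a likely root cause, while a failing service that depends on another failing service is more likely a symptom. The sketch below illustrates that idea only. The topology, service names, and the `root_cause_candidates` function are hypothetical, and a real AIOps product will use far richer signals.

```python
from typing import Dict, List, Set


def root_cause_candidates(
    depends_on: Dict[str, List[str]], failing: Set[str]
) -> Set[str]:
    """Return failing services that have no failing dependency.

    A failing service with a failing dependency is treated as a symptom;
    one whose dependencies are all healthy is a root-cause candidate.
    """
    return {
        service
        for service in failing
        if not any(dep in failing for dep in depends_on.get(service, []))
    }


# Hypothetical topology: frontend -> mortgages -> database
depends_on = {
    "frontend": ["mortgages"],
    "mortgages": ["database"],
    "database": [],
}
failing = {"frontend", "mortgages"}

candidates = root_cause_candidates(depends_on, failing)
print(candidates)  # the frontend fails only because mortgages fails
```

Here both the frontend and the Mortgages service raise alarms, but only the Mortgages service is reported, because the frontend's failure can be explained by its dependency.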
Once the failing component is known, AIOps solutions that capture all events in your landscape (events such as deployments, configuration updates, or restarts) can help developers further by giving them awareness about all changes happening in and around the failing service. It is the difference between knowing the error has something to do with the Mortgages application or knowing that the failure was likely introduced by a deployment of Mortgages version 3.15. The technical team is much closer to resolving the situation in the latter scenario.
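To make the idea of change awareness concrete, here is a minimal sketch that lists the changes recorded for a failing service shortly before the failure, most recent first. The `ChangeEvent` type, the one-hour window, and the sample events are assumptions made for illustration, not the data model of any particular product.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Iterable, List, Set


@dataclass
class ChangeEvent:
    service: str
    kind: str      # e.g. "deployment", "config_update", "restart"
    detail: str
    at: datetime


def recent_changes(
    events: Iterable[ChangeEvent],
    services: Set[str],
    failure_time: datetime,
    window: timedelta = timedelta(hours=1),  # assumed look-back window
) -> List[ChangeEvent]:
    """Return changes to the given services within `window` before the failure."""
    hits = [
        e
        for e in events
        if e.service in services and failure_time - window <= e.at <= failure_time
    ]
    # Most recent change first: it is usually the first suspect.
    return sorted(hits, key=lambda e: e.at, reverse=True)


failure_time = datetime(2024, 1, 15, 14, 0)
events = [
    ChangeEvent("mortgages", "deployment", "version 3.15", failure_time - timedelta(minutes=12)),
    ChangeEvent("mortgages", "restart", "scheduled restart", failure_time - timedelta(hours=5)),
    ChangeEvent("payments", "config_update", "timeout raised", failure_time - timedelta(minutes=20)),
]

suspects = recent_changes(events, {"mortgages"}, failure_time)
for e in suspects:
    print(f"{e.at:%H:%M} {e.service} {e.kind}: {e.detail}")
```

With this sample data the only suspect is the deployment of version 3.15, twelve minutes before the failure. Passing in the failing service plus its dependencies widens the search to changes "in and around" the service.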
2. Performance Testing Throughout DTAP
The "Ops" in AIOps does not have to be restricted to your production environments. You can use an AIOps solution throughout your development pipeline (sometimes referred to by the acronym DTAP for Development - Test - Acceptance - Production, the typical environments software goes through from development to production).
When your AIOps solution monitors your entire DTAP pipeline, development teams can track the performance of their system throughout. Track performance on every commit, release or deployment and receive notifications when issues occur. This means you'll be able to discover problems sooner in the cycle and repair them before they reach production.
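As a rough illustration of tracking performance across environments, the sketch below compares the median response time of a new build against a baseline for each environment and reports the ones that got noticeably slower. The 20% tolerance, the environment names, and the sample measurements are assumptions chosen for the example.

```python
from statistics import median
from typing import Dict, List, Tuple


def find_regressions(
    baseline_ms: Dict[str, List[float]],
    current_ms: Dict[str, List[float]],
    tolerance: float = 0.20,  # assumed: flag anything more than 20% slower
) -> Dict[str, Tuple[float, float]]:
    """Return {environment: (baseline median, current median)} for regressions."""
    regressions = {}
    for env, samples in current_ms.items():
        base = median(baseline_ms[env])
        now = median(samples)
        if now > base * (1 + tolerance):
            regressions[env] = (base, now)
    return regressions


# Hypothetical response times (ms) measured before and after a commit.
baseline_ms = {"test": [100, 110, 105], "acceptance": [120, 125, 118]}
current_ms = {"test": [104, 108, 111], "acceptance": [160, 170, 155]}

regressions = find_regressions(baseline_ms, current_ms)
for env, (base, now) in regressions.items():
    print(f"{env}: median response time went from {base} ms to {now} ms")
```

In this example the Test environment stays within tolerance, while Acceptance is flagged, so the slowdown is caught one stage before production.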
3. Safety Net for Continuous Delivery
If your development team is working with continuous delivery (way to go guys!), there are more benefits to using an AIOps solution. In a continuous delivery environment, deployments happen very frequently. This applies not only to the service your team is working on, but also to your dependencies. In an environment that is so dynamic, it is important to keep track of all deployments and associated version changes so you can quickly spot version incompatibilities if and when they occur. This blog post describes this in more depth.
Environments that are this dynamic can suffer from stability problems, affecting the users of your service, no matter how much you've invested in (automated) testing. An AIOps solution can help mitigate this problem. By deploying a canary release to a subset of your users and monitoring its health, you can spot problems early and roll back any changes that degrade performance or availability. Read this blog for an example.
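A canary health check can be as simple as comparing the canary's error rate with that of the stable version. The sketch below shows one possible rule. The function name, the thresholds (`max_ratio`, `min_requests`, and the 1% floor), and the sample numbers are all assumptions, and real systems typically look at latency and other metrics as well.

```python
def should_roll_back(
    baseline_errors: int,
    baseline_requests: int,
    canary_errors: int,
    canary_requests: int,
    max_ratio: float = 2.0,    # assumed: canary may err at most 2x the baseline
    min_requests: int = 100,   # assumed: below this, the sample is too small
    floor_rate: float = 0.01,  # assumed: always tolerate up to 1% errors
) -> bool:
    """Decide whether the canary is unhealthy compared to the baseline."""
    if canary_requests < min_requests:
        return False  # not enough traffic yet to judge the canary
    baseline_rate = baseline_errors / baseline_requests if baseline_requests else 0.0
    canary_rate = canary_errors / canary_requests
    # The floor avoids rolling back over noise when the baseline is near zero.
    allowed_rate = max(baseline_rate * max_ratio, floor_rate)
    return canary_rate > allowed_rate


# Baseline: 50 errors in 10,000 requests (0.5%).
print(should_roll_back(50, 10_000, canary_errors=30, canary_requests=500))  # 6% errors
print(should_roll_back(50, 10_000, canary_errors=2, canary_requests=500))   # 0.4% errors
```

The first call reports an unhealthy canary and the second a healthy one. Requiring a minimum number of requests is a deliberate choice here: it trades a slightly slower verdict for fewer false alarms.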
If things do go wrong, AIOps solutions that support time travel will be a developer's best friend. Instead of painstakingly reproducing errors in the lab or, worse, troubleshooting on a live system where the problem is manifesting, these products allow you to see your entire landscape and related metrics at the time of the failure. That means all your services, databases and webservers. At any point in time. Move backward in time to see how the problem started, or forward to see how it progressed. All while Operations has already rolled back the change and users are happily using your service.
4. Predicting Software Problems
The promise of AIOps goes further than that. Using past and current data, AIOps solutions can make predictions into the future of how certain metrics will progress. While this sounds like science fiction, it is actually possible and has the potential to change the way we work dramatically.
Imagine a new version of your software goes live with a nasty bug in it that makes the service perform slower and slower as the data size grows. This common issue is hard to find and sneaks up on most teams, making performance worse and worse and finally ending in a crash.
How would an AIOps solution with predictive analytics help?
First of all, the AI would notice an upward trend in response times, way before a human would. Projecting into the future, the AI would conclude that the response time will move outside of acceptable bounds in the next 30 minutes and sound an alarm. The engineers are alerted and can investigate before an issue occurs.
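The projection described above can be approximated, in its simplest form, by fitting a straight line to recent response times and extrapolating to the acceptable limit. The sketch below does exactly that with an ordinary least-squares fit. It is a deliberately naive stand-in for whatever model a real AIOps product uses, and the function name, sample data, and 400 ms threshold are assumptions.

```python
from typing import List, Optional


def minutes_until_breach(
    minutes: List[float], response_ms: List[float], threshold_ms: float
) -> Optional[float]:
    """Fit a line to (minute, response time) samples and extrapolate.

    Returns the minutes remaining after the last sample until the fitted
    line reaches the threshold, or None if there is no upward trend.
    """
    n = len(minutes)
    if n < 2:
        return None
    mean_x = sum(minutes) / n
    mean_y = sum(response_ms) / n
    var_x = sum((x - mean_x) ** 2 for x in minutes)
    if var_x == 0:
        return None
    # Ordinary least-squares slope and intercept.
    slope = sum(
        (x - mean_x) * (y - mean_y) for x, y in zip(minutes, response_ms)
    ) / var_x
    intercept = mean_y - slope * mean_x
    if slope <= 0:
        return None  # flat or improving: no breach predicted
    breach_at = (threshold_ms - intercept) / slope
    return max(0.0, breach_at - minutes[-1])


# Hypothetical samples: response time grows 4 ms per minute from 200 ms.
minutes = [0, 5, 10, 15, 20]
response_ms = [200, 220, 240, 260, 280]

eta = minutes_until_breach(minutes, response_ms, threshold_ms=400)
if eta is not None:
    print(f"Response time is predicted to exceed 400 ms in about {eta:.0f} minutes")
```

With these samples the fitted line crosses 400 ms thirty minutes after the last measurement. A straight line is only a reasonable model for short horizons, which is one reason real products rely on more sophisticated forecasting.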
There is even a step beyond that.
Linking the predicted issue with response times to deployments of your software means that the AI can suggest a likely cause of the issue as well. This would be like learning about a bug in your code before it impacts any of your users. Now what developer wouldn't want that?
StackState builds an AIOps product that helps developers speed up software delivery with confidence. Download our 'Guide to AIOps' to get a better understanding of the benefits of AIOps.