If you're accustomed to running software in production, you know that every minute counts when there's a disruption. However, not every issue is obvious enough to find and remediate immediately. That is where StackState's Kubernetes remediation guides come into play. They contain expert knowledge that guides you step by step to understand the issue, enabling swift remediation. This blog post explains how they work and how they help you find and fix problems faster.
What are StackState’s Kubernetes remediation guides?
Remediation guides are step-by-step playbooks that lead software engineers to the cause of an issue for prompt remediation. StackState comes with multiple built-in Kubernetes remediation guides designed to solve common problems. Site reliability engineers and platform engineers can also extend these guides, capturing their knowledge to benefit all software engineers who are deploying to Kubernetes.
Key features of the remediation guides include:
Easy-to-understand explanations of the issue and its significance
Step-by-step guidance to detect the cause of the issue
Code snippets as remediation suggestions, for example for use in config files
Deep links into other screens for smooth transitions between components
Ability to pin and unpin a remediation guide. Pinning ensures the guide is always present in the right panel, so the remediation steps are easy to follow as you investigate components in other screens.
Ability to extend remediation guides in YAML
13+ pre-packaged guides for rapid remediation of common issues
The guides are linked to out-of-the-box monitors and can be triggered by a specific monitor.
Why are they good?
Remediation guides offer several benefits for engineers:
Pre-configured: No need to start from scratch; they are part of the product and available as soon as you switch it on. Having them available out of the box accelerates understanding and covers the most important issue patterns right away.
Extensible: The remediation guides can be extended with code snippets or deep links to other components, facilitating troubleshooting across multiple components.
Organized knowledge sharing: In many organizations, only a few people are true Kubernetes experts, while others are strong in development and simply want to deploy to Kubernetes. To avoid bottlenecking platform teams and SREs, remediation guides capture and share knowledge for repeated use.
Centralized practices: No more runbooks and remediation hints scattered across Notion, Dropbox or Confluence; with StackState, maintain your remediation guides as part of your monitoring solution in a central location.
Focused problem solving: Remediation guides are directly related to specific monitoring signals, ensuring that the most relevant guide is shown to the user and that they see the right steps to execute to fix that particular issue.
What do you need to do to get them?
StackState's remediation guides are an integral part of our SaaS offering and are automatically provided with every level of the product. All 13+ guides are pre-configured and become visible as soon as your data shows unhealthy patterns. Monitors watch for signals that reach a particular threshold, and when that happens, they trigger the right remediation guide to appear and provide the necessary remediation steps immediately.
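To make the trigger mechanism concrete, here is a minimal sketch of a monitor that surfaces its attached guide once a signal reaches a threshold. It is an illustration only, not StackState's implementation: the `Monitor` class, the `triggered_guide` function, the metric name and the guide title are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Monitor:
    """Hypothetical monitor: one metric, one threshold, one attached guide."""
    name: str
    metric: str
    threshold: float
    guide: str  # title of the remediation guide linked to this monitor


def triggered_guide(monitor: Monitor, samples: Dict[str, float]) -> Optional[str]:
    """Return the attached guide if the metric has reached the threshold."""
    value = samples.get(monitor.metric)
    if value is not None and value >= monitor.threshold:
        return monitor.guide
    return None  # signal is healthy or missing, so no guide is shown


# Assumed example values, chosen only for illustration.
restarts = Monitor(
    name="Pod restarts",
    metric="container_restarts_last_10m",
    threshold=3,
    guide="Investigate restarting containers",
)

unhealthy = triggered_guide(restarts, {"container_restarts_last_10m": 5})
healthy = triggered_guide(restarts, {"container_restarts_last_10m": 1})
```

In this sketch `unhealthy` holds the guide title and `healthy` is `None`, mirroring the idea that a guide only appears when the data shows an unhealthy pattern.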
Who can benefit from the remediation guides?
Software engineering teams: With the increased knowledge and insight provided by remediation guides, software engineers’ troubleshooting capabilities will dramatically improve. They can independently investigate system issues without bringing in other team members, solve problems faster and save time. In addition, the guides contain explanations that help teams avoid making the same mistake again.
Platform engineering teams: Platform engineers can create new guides or extend existing ones. This structured way of sharing their expertise in troubleshooting similar issues ultimately reduces their workload, allowing them to focus on improving the platform.
SRE teams: Often chartered to guide engineering teams in better observability practices, SREs can also contribute to the remediation guides and improve the troubleshooting process. Teams can define monitors and remediation guides once, and then all engineers benefit because the monitors and guides are applied automatically to all future pods.
Under the hood
Here are a few more details about StackState’s Kubernetes remediation guides:
Every monitor has a remediation guide attached, so each guide is specific to the issue its monitor detects.
Guides are a mixture of text, code snippets and deep links to other components, depending on the detected issue. Over time, charts and automated validation will also become part of the guides.
Important items can be pinned in the right panel for easy access. Opening a monitor at the top right shows the "Add to pinned items" button. Clicking it adds the guide to the right panel, where it can be used for further navigation. The < > symbol in the right panel enlarges the monitor drawer again.
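The points above can be pictured as a small data model: a guide is an ordered list of steps, each carrying text and optionally a code snippet or a deep link, and a panel tracks which guides are pinned. This is a rough sketch under assumed names (`Step`, `Guide`, `RightPanel`, the example link path and step texts are all hypothetical), not StackState's actual schema or API.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Step:
    """One step in a guide: text plus an optional snippet or deep link."""
    text: str
    snippet: Optional[str] = None    # e.g. a config fragment to apply
    deep_link: Optional[str] = None  # hypothetical in-product path


@dataclass
class Guide:
    title: str
    steps: List[Step] = field(default_factory=list)


class RightPanel:
    """Hypothetical panel that keeps pinned guides in view."""

    def __init__(self) -> None:
        self.pinned: List[str] = []

    def pin(self, guide: Guide) -> None:
        # Pinning twice has no extra effect.
        if guide.title not in self.pinned:
            self.pinned.append(guide.title)

    def unpin(self, guide: Guide) -> None:
        if guide.title in self.pinned:
            self.pinned.remove(guide.title)


# Assumed example content, for illustration only.
guide = Guide(
    title="Investigate restarting containers",
    steps=[
        Step(text="Check the container's last termination reason."),
        Step(text="Raise the memory limit if the container was OOM-killed.",
             snippet="resources:\n  limits:\n    memory: 512Mi"),
        Step(text="Inspect the related pod.", deep_link="/components/pod-example"),
    ],
)

panel = RightPanel()
panel.pin(guide)
panel.pin(guide)  # no duplicate entry
pinned_after_pin = list(panel.pinned)
panel.unpin(guide)
pinned_after_unpin = list(panel.pinned)
```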
Are you ready to experience the power of StackState's remediation guides? Try them yourself!
Try out our Kubernetes remediation guides in our playground, featuring a Sock Shop demo application for you to explore and discover the benefits of streamlined Kubernetes troubleshooting. With StackState's Kubernetes remediation guides, you'll enjoy faster, more efficient troubleshooting and spend less time navigating through multiple tools to find out what the issue is. Best of all, you’ll know exactly how to remediate the problem.