SkeltonThatcher / run-book-template
Run Book / Operations Manual template for modern software systems
☆718 · Aug 21, 2019 · Updated 6 years ago
Alternatives and similar repositories for run-book-template
Users who are interested in run-book-template are comparing it to the repositories listed below.
- Questions to assess the operability of software systems · ☆17 · Apr 4, 2019 · Updated 6 years ago
- A set of Grafana dashboards and Prometheus alerts for Kubernetes. · ☆2,382 · Jan 28, 2026 · Updated 2 weeks ago
- Questions to assess the testability of software systems · ☆20 · Oct 24, 2019 · Updated 6 years ago
- A collection of templates ported from the SRE Workbook · ☆42 · Aug 24, 2018 · Updated 7 years ago
- The Multi-team Software Delivery Assessment is a simple, easy-to-execute approach to assessing software delivery across many different te… · ☆210 · Aug 22, 2023 · Updated 2 years ago
- A curated list of Site Reliability and Production Engineering resources. · ☆13,005 · Aug 28, 2025 · Updated 5 months ago
- Write tests against structured configuration data using the Open Policy Agent Rego query language · ☆3,121 · Updated this week
- Manage application's SLI and SLO's easily with the application lifecycle inside a Kubernetes cluster · ☆277 · Jun 11, 2021 · Updated 4 years ago
- ☆17 · Sep 27, 2022 · Updated 3 years ago
- Terragrunt is a flexible orchestration tool that allows Infrastructure as Code written in OpenTofu/Terraform to scale. · ☆9,287 · Feb 9, 2026 · Updated last week
- Validation of best practices in your Kubernetes clusters · ☆3,341 · Feb 2, 2026 · Updated last week
- 🦥 Easy and simple Prometheus SLO (service level objectives) generator · ☆2,412 · Feb 9, 2026 · Updated last week
- Kubediff: a tool for Kubernetes to show differences between running state and version controlled configuration. · ☆1,180 · Oct 24, 2023 · Updated 2 years ago
- A simple template for a wiki page for a TVP (thinnest viable platform) - as explained in the Team Topologies book · ☆66 · Mar 5, 2021 · Updated 4 years ago
- PagerDuty's Incident Response Documentation. · ☆1,035 · Jan 7, 2025 · Updated last year
- Jsonnet library for generating Grafana dashboard files. · ☆1,080 · Jun 26, 2023 · Updated 2 years ago
- Terratest is a Go library that makes it easier to write automated tests for your infrastructure code. · ☆7,874 · Updated this week
- Backup and migrate Kubernetes applications and their persistent volumes · ☆9,798 · Updated this week
- Validate your Kubernetes configuration files, supports multiple Kubernetes versions · ☆3,220 · Jan 29, 2026 · Updated 2 weeks ago
- Tips and tricks for getting through on-call · ☆404 · Jun 13, 2020 · Updated 5 years ago
- A powerful testing tool for Kubernetes clusters. · ☆1,973 · Nov 10, 2023 · Updated 2 years ago
- post mortem tracker · ☆1,023 · Oct 1, 2019 · Updated 6 years ago
- A list of common Disaster Recovery (DR) scenarios for software companies · ☆34 · Sep 13, 2021 · Updated 4 years ago
- Generate documentation from Terraform modules in various output formats · ☆4,694 · Dec 18, 2025 · Updated last month
- Chaos Engineering Toolkit & Orchestration for Developers · ☆1,996 · Jul 20, 2024 · Updated last year
- A collection of postmortem templates · ☆1,416 · Jul 12, 2023 · Updated 2 years ago
- 👀 A Kubernetes cluster resource sanitizer · ☆6,226 · Dec 8, 2025 · Updated 2 months ago
- Guidance on how to make your environment easier to onboard for Web Ops Engineers, SRE's and DevOps Practitioners · ☆291 · Jul 18, 2024 · Updated last year
- Quick and Easy server testing/validation · ☆5,866 · May 1, 2025 · Updated 9 months ago
- ☆66 · Dec 26, 2022 · Updated 3 years ago
- Checks whether Kubernetes is deployed according to security best practices as defined in the CIS Kubernetes Benchmark · ☆7,925 · Feb 6, 2026 · Updated last week
- Rules engine for cloud security, cost optimization, and governance, DSL in yaml for policies to query, filter, and take actions on resour… · ☆5,924 · Feb 6, 2026 · Updated last week
- On call alert classification and reporting · ☆760 · Dec 6, 2017 · Updated 8 years ago
- jsonnet library to patch objects loaded from yaml · ☆17 · Oct 24, 2023 · Updated 2 years ago
- Prometheus Operator creates/configures/manages Prometheus clusters atop Kubernetes · ☆9,846 · Feb 6, 2026 · Updated last week
- Tfsec is now part of Trivy · ☆6,952 · Nov 10, 2025 · Updated 3 months ago
- Monzo's real-time incident response and reporting tool ⚡️ · ☆1,555 · Mar 20, 2024 · Updated last year
- A Kubernetes operator for running synthetic checks as pods. Works great with Prometheus! · ☆2,220 · Feb 1, 2026 · Updated 2 weeks ago
- Open specification for defining and expressing service level objectives (SLO) · ☆1,474 · Nov 25, 2025 · Updated 2 months ago