A role-playing game for incident management training
☆189Feb 27, 2024Updated 2 years ago
Alternatives and similar repositories for wheel-of-misfortune
Users that are interested in wheel-of-misfortune are comparing it to the libraries listed below
- A collection of postmortem templates☆1,418Jul 12, 2023Updated 2 years ago
- Postmortem metadata from danluu/post-mortems.☆39Mar 2, 2026Updated last week
- A collection of templates ported from the SRE Workbook☆42Aug 24, 2018Updated 7 years ago
- List Kubernetes objects in a problematic state☆60Aug 26, 2021Updated 4 years ago
- A sample of major outages and incidents☆18Jul 27, 2019Updated 6 years ago
- Much resources. So log. Wow.☆24May 17, 2014Updated 11 years ago
- Linux Metrics Workshop☆11Jun 30, 2020Updated 5 years ago
- Fixed it, so that years actually make sense, instead of AD and BC nonsense☆14Mar 21, 2025Updated 11 months ago
- Calculate how much downtime should be permitted in your Service Level Agreement or Objective☆69Feb 14, 2021Updated 5 years ago
- DEPRECATED Collection of python scripts to run failure injection on AWS infrastructure☆93Oct 18, 2023Updated 2 years ago
- Administration and troubleshooting tools inside a docker container☆10May 2, 2023Updated 2 years ago
- A collection of Twilio SRE's Gameday Templates☆140Oct 13, 2020Updated 5 years ago
- ☆48Nov 17, 2019Updated 6 years ago
- Calm monitoring extension for the OpenTelemetry Collector☆14Aug 11, 2025Updated 6 months ago
- Portable Activity Timeline that draws the Timeline based on data given in JSON or CSV format. By clicking on any activity a detailed moda…☆12Apr 6, 2023Updated 2 years ago
- Terraform Automation and Collaboration tools (TACOS) pricing calculator☆17Aug 14, 2023Updated 2 years ago
- Open specification for defining and expressing service level objectives (SLO)☆1,478Nov 25, 2025Updated 3 months ago
- Text Match Cut Video Generator Web App☆36Feb 19, 2026Updated 2 weeks ago
- Easily set up a service level objective using Prometheus☆137Jan 10, 2026Updated 2 months ago
- An AWS lambda function that grantsss S3 permissionsss at ssscale.☆14Jan 4, 2018Updated 8 years ago
- ☆12Updated this week
- Public docs and templates for managing incident lifecycle.☆30Jan 23, 2017Updated 9 years ago
- SLO Generator computes SLIs, SLOs, Error Budgets and Burn Rates from supported backends, then exports an SLO report to supported targets.☆557Feb 25, 2026Updated last week
- ☆17Sep 9, 2020Updated 5 years ago
- GitOps 101☆16Nov 5, 2019Updated 6 years ago
- Convert libvirt-QEMU-save (LQS) files to raw memory files☆14Sep 22, 2018Updated 7 years ago
- A curated list of Site Reliability and Production Engineering Tools☆1,428Feb 9, 2026Updated last month
- secrets-helper helps you use AWS Secrets Manager to secure the use of CLI tools☆18Jun 9, 2020Updated 5 years ago
- Manage Helm CRDs with Terraform☆16Jul 16, 2024Updated last year
- Create an incident response triage toolkit for use with Windows or Linux.☆18Jun 14, 2020Updated 5 years ago
- A curated list of awesome Site Reliability and Production Engineering resources.☆92Mar 19, 2023Updated 2 years ago
- Run k6 with extensions☆17Jan 21, 2025Updated last year
- A curated list of awesome postmortem posts.☆20Nov 11, 2021Updated 4 years ago
- ☆18Jun 30, 2022Updated 3 years ago
- Tools for Chaos Engineers☆42Mar 19, 2018Updated 7 years ago
- SLOs, error windows and alerts are complicated. Here is an attempt to make it easy☆133Mar 4, 2025Updated last year
- Status page generator based on prometheus metrics☆25Feb 19, 2026Updated 2 weeks ago
- An end to end example of implementing SLOs with prometheus, grafana and Go.☆142May 30, 2019Updated 6 years ago
- Watch a process and execute specified command for notification when finished☆27Jan 5, 2019Updated 7 years ago