xlab-uiuc / AIOpsLab
A holistic framework to enable the design, development, and evaluation of autonomous AIOps agents.
☆11 Updated last week
Alternatives and similar repositories for AIOpsLab:
Users who are interested in AIOpsLab are comparing it to the repositories listed below:
- Code repository for SRE agent as part of ITBench ☆11 Updated last week
- Predict the performance of LLM inference services ☆17 Updated 9 months ago
- µBench is a tool for benchmarking cloud/edge computing platforms that run microservice applications. The tool creates dummy microservice … ☆62 Updated 2 months ago
- A series of work towards achieving ACV. ☆17 Updated last month
- Cloud incidents/failures related work. ☆17 Updated 3 months ago
- [ICLR'25] OpenRCA: Can Large Language Models Locate the Root Cause of Software Failures? ☆40 Updated 2 weeks ago
- A Reading List of System Configuration Management ☆56 Updated 8 months ago
- Expressive, Easy-to-build, and High-performance Application Networks ☆16 Updated 2 months ago
- ☆12 Updated 2 years ago
- Zodiac: Unearthing Semantic Checks for Cloud Infrastructure-as-Code Programs, SOSP 2024 ☆12 Updated 4 months ago
- Graph based Incident Extraction and Diagnosis in Large-Scale Online Systems (ASE'22) ☆9 Updated 4 months ago
- This is the repo for remote direct memory introspection. ☆20 Updated last year
- Simulator for the datacenter, including power, cooling, server and 5G components ☆16 Updated 2 months ago
- Configuration dependency analysis for cloud software ☆23 Updated 3 years ago
- ☆11 Updated 5 months ago
- Helios Traces from SenseTime ☆53 Updated 2 years ago
- ☆9 Updated 9 months ago
- Serverless optimizations ☆51 Updated last year
- ☆41 Updated 9 months ago
- Testing Configuration Changes in Context to Prevent Production Failures ☆30 Updated last year
- Burstable Cloud Scheduler ☆13 Updated 10 months ago
- [ICSE 2023] Differentiable interpretation and failure-inducing input generation for neural network numerical bugs. ☆12 Updated last year
- Huawei Cloud datasets ☆64 Updated last week
- Orbit: OS Support for Safe and Efficient Auxiliary Tasks in Applications ☆20 Updated 2 years ago
- A ChatGPT (GPT-3.5) & GPT-4 Workload Trace to Optimize LLM Serving Systems ☆163 Updated 6 months ago
- Legolas: A Fault Injection Framework for Efficient Exposure of Partial Failures in Distributed Systems ☆12 Updated last year
- Push-Button End-to-End Testing of Kubernetes Operators and Controllers ☆127 Updated 2 weeks ago
- Codebase for Autothrottle (NSDI 2024) ☆45 Updated last year
- ☆21 Updated last year
- This repository contains code for the paper: Bergsma S., Zeyl T., Senderovich A., and Beck J. C., "Generating Complex, Realistic Cloud Wo… ☆43 Updated 3 years ago