A tool to detect infrastructure issues on cloud-native AI systems
☆53 · Sep 18, 2025 · Updated 8 months ago
Alternatives and similar repositories for autopilot
Users who are interested in autopilot are comparing it to the libraries listed below.
- AppWrapper controller for Kueue · ☆17 · Updated this week
- llm-d benchmark scripts and tooling · ☆58 · Updated this week
- Cloud Native Benchmarking of Foundation Models · ☆45 · Jul 31, 2025 · Updated 9 months ago
- Failure dataset accompanying the paper "How Bad Can a Bug Get? An Empirical Analysis of Software Failures in the OpenStack Cloud Computi… · ☆10 · Jun 12, 2020 · Updated 5 years ago
- ☆15 · Jan 7, 2023 · Updated 3 years ago
- Holistic job manager on Kubernetes · ☆117 · Feb 20, 2024 · Updated 2 years ago
- Predict the performance of LLM inference services · ☆23 · Sep 18, 2025 · Updated 8 months ago
- A hierarchical collective communications library with portable optimizations · ☆38 · Dec 8, 2024 · Updated last year
- DXT Explorer is an interactive web-based log analysis tool for Darshan DXT logs. · ☆17 · Feb 19, 2026 · Updated 3 months ago
- Solution Service Architecture · ☆26 · Jun 5, 2024 · Updated last year
- Real-Time Intrusion Detection and Prevention with Neural Network in Kernel using eBPF · ☆25 · Apr 9, 2024 · Updated 2 years ago
- GPUd automates monitoring, diagnostics, and issue identification for GPUs · ☆481 · Updated this week
- Hosted by the HPC System Test Working Group collaboration · ☆17 · Apr 30, 2026 · Updated 3 weeks ago
- Red Hat Certified optional operator for secondary schedulers · ☆21 · Updated this week
- The MPI parallel MD-Workbench simulates user activities. · ☆12 · Jun 23, 2019 · Updated 6 years ago
- Project to manage Flux tasks needed to standardize Kubernetes HPC scheduling interfaces · ☆29 · Jan 9, 2026 · Updated 4 months ago
- Drishti provides I/O insights to help you improve your application's I/O performance. · ☆24 · Mar 3, 2026 · Updated 2 months ago
- Auto-tuning for vLLM: getting the best performance out of your LLM deployment (vllm+guidellm+optuna) · ☆52 · Mar 17, 2026 · Updated 2 months ago
- Augmented Dickey-Fuller implementation in Go · ☆12 · Mar 15, 2019 · Updated 7 years ago
- [DEPRECATED] Prometheus exporter for VPA recommendations · ☆12 · Aug 22, 2023 · Updated 2 years ago
- Fast and efficient attention method exploration and implementation. · ☆25 · Mar 25, 2025 · Updated last year
- Snapped is a parallel program snapshotter designed for debugging deadlocks and crashes in programs. It acts as a wrapper around the GDB M… · ☆11 · Aug 26, 2024 · Updated last year
- ☆21 · Apr 25, 2026 · Updated 3 weeks ago
- Utilities for ROCm Tech Support Log Collections · ☆14 · Mar 14, 2026 · Updated 2 months ago
- Compiler for Fortran stencils using verified lifting · ☆20 · Apr 5, 2022 · Updated 4 years ago
- A suite of parallel file system tools designed for performance and scalability · ☆30 · May 14, 2024 · Updated 2 years ago
- Gridsim simulator · ☆12 · May 12, 2017 · Updated 9 years ago
- Code and other materials for the S2I2 Software Summer School · ☆12 · Mar 11, 2017 · Updated 9 years ago
- OCM/ACM Ansible Collection · ☆18 · Jan 19, 2026 · Updated 4 months ago
- Dynamic execution environments for coupled, thread-heterogeneous MPI+X applications · ☆22 · Mar 3, 2025 · Updated last year
- The link to the website is at · ☆14 · Aug 12, 2015 · Updated 10 years ago
- `example.on('end', mustCall(() => {}));` Checks that the callback function is called. · ☆11 · Nov 20, 2022 · Updated 3 years ago
- ☆17 · Nov 3, 2025 · Updated 6 months ago
- ☆10 · Apr 7, 2020 · Updated 6 years ago
- ☆12 · Aug 27, 2022 · Updated 3 years ago
- ☆33 · Oct 31, 2025 · Updated 6 months ago
- Simulation infrastructure and validation of Cori · ☆13 · Mar 22, 2022 · Updated 4 years ago
- PyTorch implementation for the pilot study on the robustness of latent diffusion models. · ☆12 · Jun 20, 2023 · Updated 2 years ago
- 12 Lessons: Get Started Building with Generative AI 🔗 https://microsoft.github.io/generative-ai-for-beginners/ · ☆10 · Nov 16, 2023 · Updated 2 years ago