A tool to detect infrastructure issues on cloud native AI systems
☆53 · Sep 18, 2025 · Updated 7 months ago
Alternatives and similar repositories for autopilot
Users who are interested in autopilot are comparing it to the libraries listed below.
- Queuing and quota management for AI/ML batch jobs on Kubernetes ☆17 · Jul 16, 2025 · Updated 9 months ago
- AppWrapper controller for Kueue ☆17 · Apr 11, 2026 · Updated 2 weeks ago
- llm-d benchmark scripts and tooling ☆58 · Updated this week
- Cloud Native Benchmarking of Foundation Models ☆45 · Jul 31, 2025 · Updated 9 months ago
- Failure dataset accompanying the paper "How Bad Can a Bug Get? An Empirical Analysis of Software Failures in the OpenStack Cloud Computi… ☆10 · Jun 12, 2020 · Updated 5 years ago
- Comprehensive Parallel I/O Tracing and Analysis ☆52 · Apr 16, 2025 · Updated last year
- ☆15 · Jan 7, 2023 · Updated 3 years ago
- Predict the performance of LLM inference services ☆23 · Sep 18, 2025 · Updated 7 months ago
- Knative benchmark suite for Quarkus ☆11 · Feb 5, 2026 · Updated 2 months ago
- A hierarchical collective communications library with portable optimizations ☆38 · Dec 8, 2024 · Updated last year
- Real-Time Intrusion Detection and Prevention with Neural Network in Kernel using eBPF ☆24 · Apr 9, 2024 · Updated 2 years ago
- GPUd automates monitoring, diagnostics, and issue identification for GPUs ☆479 · Updated this week
- hosted by HPC System Test Working Group collaboration ☆16 · Updated this week
- Red Hat Certified optional operator for secondary schedulers ☆21 · Updated this week
- The MPI parallel MD-Workbench simulates user activities. ☆12 · Jun 23, 2019 · Updated 6 years ago
- Project to manage Flux tasks needed to standardize Kubernetes HPC scheduling interfaces ☆28 · Jan 9, 2026 · Updated 3 months ago
- A repository for an I/O benchmark that represents scientific deep learning workloads. ☆23 · Dec 6, 2022 · Updated 3 years ago
- Drishti provides I/O insights to help you improve your application's I/O performance. ☆23 · Mar 3, 2026 · Updated last month
- Auto-tuning for vllm. Getting the best performance out of your LLM deployment (vllm+guidellm+optuna) ☆51 · Mar 17, 2026 · Updated last month
- Fast and efficient attention method exploration and implementation. ☆25 · Mar 25, 2025 · Updated last year
- [DEPRECATED] Prometheus exporter for VPA recommendations ☆12 · Aug 22, 2023 · Updated 2 years ago
- ☆21 · Updated this week
- Utilities for ROCm Tech Support Log Collections ☆13 · Mar 14, 2026 · Updated last month
- Measure performance in calculating cosine similarity: C, C++, Go, Python, Perl and Oberon2. ☆14 · Jul 11, 2023 · Updated 2 years ago
- Tensors and Dynamic neural networks in Python with strong GPU acceleration ☆11 · Jun 2, 2024 · Updated last year
- Scripts for managing a large H100 cluster and fixing hardware issues to ensure smooth model training. ☆324 · Aug 20, 2024 · Updated last year
- Compiler for Fortran stencils using verified lifting ☆20 · Apr 5, 2022 · Updated 4 years ago
- A suite of parallel file system tools designed for performance and scalability ☆30 · May 14, 2024 · Updated last year
- ☆10 · Dec 10, 2024 · Updated last year
- KAR: A Runtime for the Hybrid Cloud ☆31 · Sep 17, 2025 · Updated 7 months ago
- Nabla Containers blog ☆12 · May 26, 2021 · Updated 4 years ago
- A clean monorepo template for a Python project using uv ☆13 · Jul 8, 2025 · Updated 9 months ago
- Apache OpenWhisk Composer provides a high-level programming model in JavaScript for composing serverless functions ☆68 · Sep 24, 2024 · Updated last year
- Dynamic execution environments for coupled, thread-heterogeneous MPI+X applications ☆21 · Mar 3, 2025 · Updated last year
- example.on('end', mustCall(() => {})); Check the callback function is called. ☆10 · Nov 20, 2022 · Updated 3 years ago
- ☆17 · Nov 3, 2025 · Updated 5 months ago
- ☆10 · Apr 7, 2020 · Updated 6 years ago
- A curated list of autonomous research systems and tools. ☆108 · Apr 3, 2026 · Updated 3 weeks ago
- ☆33 · Oct 31, 2025 · Updated 6 months ago