coreweave / kubernetes-cloud
Getting Started with the CoreWeave Kubernetes GPU Cloud
☆66 · Updated 2 weeks ago
Related projects:
- Module, Model, and Tensor Serialization/Deserialization ☆175 · Updated 3 weeks ago
- Kubernetes Operator, ansible playbooks, and production scripts for large-scale AIStore deployments on Kubernetes. ☆66 · Updated last week
- Running Stable Diffusion with Metaflow ☆33 · Updated 7 months ago
- markdown docs ☆62 · Updated this week
- The NVIDIA GPU driver container allows the provisioning of the NVIDIA driver through the use of containers. ☆62 · Updated this week
- GPU plugin to the node feature discovery for Kubernetes ☆287 · Updated 3 months ago
- A top-like tool for monitoring GPUs in a cluster ☆80 · Updated 7 months ago
- Deploy your HPC cluster on AWS in 20 minutes with just one click. ☆50 · Updated 3 weeks ago
- NVIDIA NCCL Tests for Distributed Training ☆59 · Updated last month
- Pipeline is an open source Python SDK for building AI/ML workflows ☆124 · Updated this week
- A simple service that integrates vLLM with Ray Serve for fast and scalable LLM serving. ☆45 · Updated 5 months ago
- GPU Environment Management for Visual Studio Code ☆35 · Updated last year
- Google TPU optimizations for transformers models ☆62 · Updated this week
- NVIDIA device plugin for Kubernetes ☆42 · Updated 7 months ago
- MIG Partition Editor for NVIDIA GPUs ☆163 · Updated this week
- Argoflow has been superseded by deployKF ☆138 · Updated last year
- BIG: Back In the Game of Creative AI ☆25 · Updated last year
- JobSet: a k8s native API for distributed ML training and HPC workloads ☆133 · Updated this week
- Infrastructure as code for GPU accelerated managed Kubernetes clusters. ☆45 · Updated 4 months ago
- The NVIDIA Driver Manager is a Kubernetes component that assists in seamless upgrades of the NVIDIA driver on each node of the cluster. ☆33 · Updated this week
- Karras et al. (2022) diffusion models for PyTorch ☆19 · Updated 3 months ago
- ☆43 · Updated 3 months ago
- CUDA checkpoint and restore utility ☆193 · Updated 5 months ago
- 🐳 | Dockerfiles for the RunPod container images used for our official templates. ☆141 · Updated 3 weeks ago
- The AWS virtual GPU device plugin provides the capability to use smaller virtual GPUs for your machine learning inference workloads ☆200 · Updated 9 months ago
- ☆15 · Updated last month
- Dynamic Resource Allocation (DRA) for NVIDIA GPUs in Kubernetes ☆227 · Updated this week
- LeaderWorkerSet: An API for deploying a group of pods as a unit of replication ☆121 · Updated this week
- A Slurm cluster for Kubernetes ☆36 · Updated last month
- ☆21 · Updated this week