NascentCore / 3k

Orchestrating many small GPU clusters for running serverless GPU workloads

☆15

Alternatives and similar repositories for 3k

Users who are interested in 3k are comparing it to the repositories listed below.

PKUHPC / CraneSched
A distributed scheduling system for HPC and AI workloads
☆120 Updated this week
viniciusferrao / cloysterhpc
Cloyster HPC is a turnkey HPC cluster solution with a user-friendly installer
☆10 Updated last month
openshift / ib-sriov-cni
InfiniBand SR-IOV CNI
☆14 Updated last week
coreweave / nccl-tests
NVIDIA NCCL Tests for Distributed Training
☆118 Updated this week
NVIDIA / k8s-operator-libs
A collection of useful Go libraries to ease the development of NVIDIA Operators for GPU/NIC management.
☆24 Updated last week
OpenCSGs / llm-inference
llm-inference is a platform for publishing and managing llm inference, providing a wide range of out-of-the-box features for model deploy…
☆86 Updated last year
kubeagi / arcadia
A diverse, simple, and secure all-in-one LLMOps platform
☆109 Updated last year
NVIDIA / topograph
A toolkit for discovering cluster network topology.
☆74 Updated this week
OpenCSGs / csghub-charts
This repository provides installation scripts and configuration files for deploying the CSGHub instance, includes Helm charts and Docker…
☆16 Updated last week
baidubce / terraform-provider-baiducloud
Terraform provider for BaiduCloud
☆24 Updated this week
k8snetworkplumbingwg / rdma-cni
RDMA CNI plugin for containerized workloads
☆58 Updated this week
NVIDIA / go-dcgm
Golang bindings for Nvidia Datacenter GPU Manager (DCGM)
☆137 Updated last week
stackhpc / slurm-k8s-cluster
A Slurm cluster for Kubernetes
☆65 Updated last year
HFAiLab / hai-platform-studio
An integrated user interface for use with HAI Platform
☆53 Updated 2 years ago
guilbaults / infiniband-exporter
Prometheus exporter for an InfiniBand fabric
☆67 Updated last year
FlyAIBox / dcu-in-action
Hygon DCU, a Chinese domestic accelerator, in action (large-model training, fine-tuning, inference, and more)
☆52 Updated 2 months ago
NVIDIA / k8s-driver-manager
The NVIDIA Driver Manager is a Kubernetes component that assists in seamless upgrades of the NVIDIA driver on each node of the cluster.
☆41 Updated last week
k8snetworkplumbingwg / ib-sriov-cni
InfiniBand SR-IOV CNI
☆54 Updated this week
Mellanox / ib-kubernetes
☆68 Updated this week
Project-HAMi / volcano-vgpu-device-plugin
Device plugin for Volcano vGPU that supports hard resource isolation
☆119 Updated last month
CentaurusInfra / alnair
Intelligent platform for AI workloads
☆37 Updated 2 years ago
kubernetes-sigs / kjob
KJob: Tool for CLI-loving ML researchers
☆39 Updated this week
ai-dynamo / aiconfigurator
Offline optimization of your disaggregated Dynamo graph
☆88 Updated this week
InftyAI / Manta
💫 A lightweight P2P-based cache system for model distribution on Kubernetes. Reframing now to make it a unified cache system with POSI…
☆24 Updated 10 months ago
vmware-archive / bitfusion-with-kubernetes-integration
Bitfusion with Kubernetes Integration Support
☆50 Updated 2 years ago
modelbox-ai / modelbox
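A high-performance, highly extensible, easy-to-use framework for AI applications. It provides AI application developers with a unified, high-performance, easy-to-use programming framework for quickly building AI industry applications that span device, edge, and cloud on top of full-stack AI services; supports GPU, …
☆157 Updated last year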
NVIDIA / holodeck
Holodeck is a project to create test environments optimised for GPU projects.
☆19 Updated this week
lenovo / openlico
☆20 Updated last year
NVIDIA / cloud-native-docs
Documentation repository for NVIDIA Cloud Native Technologies
☆29 Updated this week
a0s / nvidia-smi-exporter
nvidia-smi Prometheus exporter that respects GPU UUIDs
☆37 Updated 2 years ago