NascentCore / 3k
Orchestrating many small GPU clusters for running serverless GPU workloads
☆17Updated 9 months ago
Alternatives and similar repositories for 3k
Users that are interested in 3k are comparing it to the libraries listed below
- InfiniBand SR-IOV CNI☆13Updated last month
- This repository provides installation scripts and configuration files for deploying the CSGHub instance, including Helm charts and Docker…☆18Updated this week
- A collection of useful Go libraries to ease the development of NVIDIA Operators for GPU/NIC management.☆28Updated this week
- Chinese domestic accelerator cards: Hygon DCU in practice (large-model training, fine-tuning, inference, etc.)☆67Updated 5 months ago
- The NVIDIA Driver Manager is a Kubernetes component that assists in seamless upgrades of the NVIDIA driver on each node of the cluster.☆48Updated last week
- Terraform provider for BaiduCloud☆24Updated 2 weeks ago
- Prometheus exporter for an InfiniBand fabric☆69Updated 2 years ago
- ☆26Updated this week
- Intelligent platform for AI workloads☆37Updated 3 years ago
- llm-inference is a platform for publishing and managing llm inference, providing a wide range of out-of-the-box features for model deploy…☆91Updated last year
- ☆14Updated 6 months ago
- NVIDIA NCCL Tests for Distributed Training☆134Updated last week
- RDMA CNI plugin for containerized workloads☆58Updated 3 weeks ago
- Bitfusion with Kubernetes Integration Support☆50Updated 2 years ago
- Golang bindings for Nvidia Datacenter GPU Manager (DCGM)☆147Updated this week
- FlagCX is a scalable and adaptive cross-chip communication library.☆172Updated this week
- ☆71Updated this week
- A benchmarking tool for comparing different LLM API providers' DeepSeek model deployments.☆30Updated 10 months ago
- HPC Cluster Web Portal☆18Updated last year
- Testing if I can implement slurm in an operator☆15Updated last year
- NVIDIA Networking NIC Configuration Operator For Kubernetes☆14Updated this week
- Documentation repository for NVIDIA Cloud Native Technologies☆35Updated this week
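- m3fs (Make 3FS) is a toolset designed to deploy a 3FS cluster.☆58Updated 3 weeks ago
- A high-performance, highly extensible, easy-to-use framework for AI applications. Provides AI application developers with a unified, high-performance, easy-to-use programming framework for quickly building cross-device, edge, and cloud AI industry applications on full-stack AI services; supports GPU,…☆160Updated last year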
- A distributed scheduling system for HPC and AI workloads☆132Updated last week
- ☆109Updated this week
- Fast and efficient attention method exploration and implementation.☆25Updated 10 months ago
- A TUI-based utility for real-time monitoring of InfiniBand traffic and performance metrics on the local node☆63Updated last month
- ☆24Updated last week
- Resource Topology exporter for Topology Aware Scheduler☆16Updated last week