leptonai / gpud
☆214Updated this week
Related projects
Alternatives and complementary repositories for gpud
- NVIDIA NCCL Tests for Distributed Training☆70Updated 2 weeks ago
- A Kubernetes plugin that enables dynamically adding or removing GPU resources for a running Pod☆120Updated 2 years ago
- Device plugin for Volcano vGPU that supports hard resource isolation☆48Updated 2 weeks ago
- Kubernetes Operator for AI and Bigdata Elastic Training☆84Updated 3 months ago
- HAMi-core compiles libvgpu.so, which enforces a hard limit on GPU usage in a container☆105Updated last month
- ☆24Updated last month
- ☆198Updated 3 weeks ago
- ☆273Updated 3 months ago
- CUDA checkpoint and restore utility☆226Updated 7 months ago
- Efficient and easy multi-instance LLM serving☆216Updated this week
- Pretrain, finetune and serve LLMs on Intel platforms with Ray☆103Updated last week
- Go Abstraction for Allocating NVIDIA GPUs with Custom Policies☆108Updated 4 months ago
- A library developed by Volcano Engine for high-performance reading and writing of PyTorch model files.☆13Updated 5 months ago
- ☆55Updated 4 years ago
- Device plugins for Volcano, e.g. GPU☆105Updated 2 months ago
- Using CRDs to manage GPU resources in Kubernetes.☆191Updated 2 years ago
- Making Long-Context LLM Inference 10x Faster and 10x Cheaper☆236Updated this week
- elastic-gpu-scheduler is a Kubernetes scheduler extender for GPU resources scheduling.☆135Updated 2 years ago
- Kubernetes RDMA SR-IOV device plugin☆109Updated 3 years ago
- AI fundamentals: GPU architecture, CUDA programming, and large-model basics☆58Updated last month
- RDMA device plugin for Kubernetes☆204Updated 11 months ago
- Elastic Deep Learning Training based on Kubernetes by Leveraging EDL and Volcano☆31Updated last year
- NCCL Fast Socket is a transport layer plugin to improve NCCL collective communication performance on Google Cloud.☆112Updated last year
- A Kubernetes simulator for batch and service workloads.☆45Updated 3 years ago
- ☆31Updated 3 years ago
- Automatic tuning for ML model deployment on Kubernetes☆80Updated 2 weeks ago
- Intelligent platform for AI workloads☆37Updated last year
- ☆129Updated 3 years ago
- GLake: optimizing GPU memory management and IO transmission.☆379Updated 3 months ago
- Large language model fine-tuning capabilities based on cloud native and distributed computing.☆91Updated 8 months ago