volcengine / veTurboIO
A library developed by Volcano Engine for high-performance reading and writing of PyTorch model files.
☆13 Updated 5 months ago
Related projects
Alternatives and complementary repositories for veTurboIO
- Automatic tuning for ML model deployment on Kubernetes ☆80 Updated 3 weeks ago
- NCCL Fast Socket is a transport layer plugin to improve NCCL collective communication performance on Google Cloud. ☆112 Updated last year
- Fine-grained GPU sharing primitives ☆140 Updated 4 years ago
- ☆55 Updated 4 years ago
- GPU-scheduler-for-deep-learning ☆200 Updated 4 years ago
- NVIDIA NCCL Tests for Distributed Training ☆70 Updated 2 weeks ago
- ☆33 Updated 2 months ago
- Efficient and easy multi-instance LLM serving ☆216 Updated this week
- Artifact of OSDI '24 paper, “Llumnix: Dynamic Scheduling for Large Language Model Serving” ☆57 Updated 5 months ago
- RDMA and SHARP plugins for the NCCL library ☆162 Updated last week
- Kubernetes Operator for AI and Big Data Elastic Training ☆84 Updated 3 months ago
- A Kubernetes plugin that enables dynamically adding or removing GPU resources for a running Pod ☆120 Updated 2 years ago
- An interference-aware scheduler for fine-grained GPU sharing ☆111 Updated 6 months ago
- ☆214 Updated this week
- Kubernetes RDMA SR-IOV device plugin ☆109 Updated 3 years ago
- PipeSwitch: Fast Pipelined Context Switching for Deep Learning Applications ☆124 Updated 2 years ago
- GLake: optimizing GPU memory management and IO transmission. ☆379 Updated 3 months ago
- Forked from ☆10 Updated 3 years ago
- Artifacts for our NSDI'23 paper TGS ☆68 Updated 5 months ago
- NCCL Profiling Kit ☆112 Updated 4 months ago
- PyTorch distributed training acceleration framework ☆34 Updated this week
- ☆82 Updated 2 years ago
- Paella: Low-latency Model Serving with Virtualized GPU Scheduling ☆57 Updated 6 months ago
- High-performance RDMA-based distributed feature collection component for training GNN models on extremely large graphs ☆48 Updated 2 years ago
- Intelligent platform for AI workloads ☆37 Updated last year
- Model-less Inference Serving ☆82 Updated last year
- SpotServe: Serving Generative Large Language Models on Preemptible Instances ☆101 Updated 8 months ago
- Stateful LLM Serving ☆38 Updated 3 months ago
- Fault tolerance for DL frameworks ☆69 Updated last year
- Common APIs and libraries shared by other Kubeflow operator repositories. ☆51 Updated last year