DiegoStock12 / kubeml
Simple Serverless Platform for training Neural Networks in a distributed manner on Kubernetes
☆22 · Updated 3 years ago
Related projects
Alternatives and complementary repositories for kubeml
- Intent Driven Orchestration enables management of applications through their Service Level Objectives, while minimizing developer and adm… ☆34 · Updated 2 months ago
- GPU scheduler for elastic/distributed deep learning workloads in Kubernetes clusters (IC2E'23) ☆31 · Updated 11 months ago
- Pretrain, finetune and serve LLMs on Intel platforms with Ray ☆101 · Updated last week
- Code for "Heterogeneity-Aware Cluster Scheduling Policies for Deep Learning Workloads", which appeared at OSDI 2020 ☆125 · Updated 3 months ago
- Holistic job manager on Kubernetes ☆108 · Updated 8 months ago
- A tool to detect infrastructure issues on cloud native AI systems ☆16 · Updated last week
- ☆49 · Updated last year
- Main repository of the BeFaaS project ☆14 · Updated last year
- An interference-aware scheduler for fine-grained GPU sharing ☆108 · Updated 5 months ago
- A benchmark suite for evaluating FaaS schedulers. ☆22 · Updated 2 years ago
- Integrated Training Platform (ITP) traces used in the ElasticFlow paper. ☆27 · Updated last year
- A curated list of awesome serverless research works, including papers and open-sourced projects. ☆76 · Updated last year
- FaaSFlow: Enable Efficient Workflow Execution for Function-as-a-Service ☆72 · Updated 7 months ago
- Serverless for all computation ☆41 · Updated last year
- ☆36 · Updated 4 months ago
- The source code of INFless, a native serverless platform for AI inference. ☆34 · Updated 2 years ago
- rFaaS: a high-performance FaaS platform with RDMA acceleration for low-latency invocations. ☆49 · Updated last month
- Intelligent platform for AI workloads ☆37 · Updated last year
- FaaSNet: Scalable and Fast Provisioning of Custom Serverless Container Runtimes at Alibaba Cloud Function Compute (USENIX ATC'21) ☆53 · Updated 2 years ago
- Model Server for Kepler ☆25 · Updated this week
- An Operator for deployment and maintenance of NVIDIA NIMs and NeMo microservices in a Kubernetes environment. ☆50 · Updated this week
- ML Input Data Processing as a Service. This repository contains the source code for Cachew (built on top of TensorFlow). ☆36 · Updated 2 months ago
- ☆20 · Updated this week
- GPU-scheduler-for-deep-learning ☆198 · Updated 4 years ago
- Here are my personal paper reading notes (including cloud computing, resource management, systems, machine learning, deep learning, and o… ☆45 · Updated last month
- Nightcore: Efficient and Scalable Serverless Computing for Latency-Sensitive, Interactive Microservices [ASPLOS '21] ☆98 · Updated 3 years ago
- The NVIDIA GPU driver container allows the provisioning of the NVIDIA driver through the use of containers. ☆69 · Updated this week
- Lucid: A Non-Intrusive, Scalable and Interpretable Scheduler for Deep Learning Training Jobs ☆49 · Updated last year
- A suite of representative serverless cloud-agnostic (i.e., dockerized) benchmarks ☆48 · Updated last week
- Automatic tuning for ML model deployment on Kubernetes ☆80 · Updated last week