Resource scheduling and cluster management for AI
☆2,687 · Jun 6, 2024 · Updated last year
Alternatives and similar repositories for pai
Users who are interested in pai are comparing it to the libraries listed below.
- An open source AutoML toolkit to automate the machine learning lifecycle, including feature engineering, neural architecture search, model c… ☆14,342 · Jul 3, 2024 · Updated last year
- Kubernetes Scheduler for Deep Learning ☆264 · May 22, 2022 · Updated 3 years ago
- General-Purpose Kubernetes Pod Controller ☆173 · Apr 4, 2023 · Updated 2 years ago
- Distributed training framework for TensorFlow, Keras, PyTorch, and Apache MXNet. ☆14,675 · Dec 1, 2025 · Updated 2 months ago
- A high performance and generic framework for distributed DNN training ☆3,716 · Oct 3, 2023 · Updated 2 years ago
- Machine Learning Toolkit for Kubernetes ☆15,462 · Jan 5, 2026 · Updated last month
- MMdnn is a set of tools to help users inter-operate among different deep learning frameworks. E.g. model conversion and visualization. Co… ☆5,817 · Aug 7, 2025 · Updated 6 months ago
- Determined is an open-source machine learning platform that simplifies distributed training, hyperparameter tuning, experiment tracking, … ☆3,214 · Mar 20, 2025 · Updated 11 months ago
- MLOps Tools For Managing & Orchestrating The Machine Learning LifeCycle ☆3,697 · Feb 21, 2026 · Updated last week
- Open Machine Learning Compiler Framework ☆13,142 · Updated this week
- GPU Sharing Scheduler for Kubernetes Cluster ☆1,528 · Dec 29, 2023 · Updated 2 years ago
- A GPU-accelerated library containing highly optimized building blocks and an execution engine for data processing to accelerate deep lear… ☆5,634 · Feb 19, 2026 · Updated last week
- Automated Machine Learning on Kubernetes ☆1,656 · Feb 18, 2026 · Updated last week
- Open Source ML Model Versioning, Metadata, and Experiment Management ☆1,744 · Jul 23, 2024 · Updated last year
- Ray is an AI compute engine. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads. ☆41,413 · Feb 21, 2026 · Updated last week
- The open source developer platform to build AI agents and models with confidence. Enhance your AI applications with end-to-end tracking, … ☆24,365 · Updated this week
- The Triton Inference Server provides an optimized cloud and edge inferencing solution. ☆10,375 · Feb 21, 2026 · Updated last week
- Kubernetes-native Deep Learning Framework ☆746 · Jan 26, 2024 · Updated 2 years ago
- A Cloud Native Batch System (Project under CNCF) ☆5,340 · Updated this week
- A CLI for Kubeflow. ☆809 · Feb 11, 2026 · Updated 2 weeks ago
- Fabric for Deep Learning (FfDL, pronounced fiddle) is a Deep Learning Platform offering TensorFlow, Caffe, PyTorch etc. as a Service on K… ☆692 · Jan 29, 2026 · Updated last month
- An MLOps framework to package, deploy, monitor and manage thousands of production machine learning models ☆4,730 · Feb 16, 2026 · Updated last week
- A PyTorch Extension: Tools for easy mixed precision and distributed training in Pytorch ☆8,926 · Feb 16, 2026 · Updated last week
- Distributed AI Model Training and LLM Fine-Tuning on Kubernetes ☆2,035 · Updated this week
- NLP DNN Toolkit - Building Your NLP DNN Models Like Playing Lego ☆1,455 · Jul 22, 2023 · Updated 2 years ago
- Lightweight, Portable, Flexible Distributed/Mobile Deep Learning with Dynamic, Mutation-aware Dataflow Dep Scheduler; for Python, R, Juli… ☆20,829 · Oct 25, 2023 · Updated 2 years ago
- ☆892 · Apr 2, 2024 · Updated last year
- A Flexible and Powerful Parameter Server for large-scale machine learning ☆6,788 · Oct 13, 2025 · Updated 4 months ago
- Open standard for machine learning interoperability ☆20,373 · Updated this week
- A batch scheduler of kubernetes for high performance workload, e.g. AI/ML, BigData, HPC ☆1,094 · May 22, 2023 · Updated 2 years ago
- A flexible, high-performance serving system for machine learning models ☆6,350 · Dec 18, 2025 · Updated 2 months ago
- ClearML - Auto-Magical CI/CD to streamline your AI workload. Experiment Management, Data Management, Pipeline, Orchestration, Scheduling … ☆6,532 · Updated this week
- DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective. ☆41,648 · Updated this week
- Build and run Docker containers leveraging NVIDIA GPUs ☆17,498 · Dec 6, 2023 · Updated 2 years ago
- Visualizer for neural network, deep learning and machine learning models ☆32,465 · Updated this week
- A low-latency prediction-serving system ☆1,424 · Apr 26, 2021 · Updated 4 years ago
- Resource-adaptive cluster scheduler for deep learning training. ☆454 · Mar 5, 2023 · Updated 2 years ago
- GPU Sharing Device Plugin for Kubernetes Cluster ☆492 · Jan 10, 2023 · Updated 3 years ago
- Run your deep learning workloads on Kubernetes more easily and efficiently. ☆531 · Mar 4, 2024 · Updated last year