AI-Hypercomputer / xpk

xpk (Accelerated Processing Kit, pronounced x-p-k) is a software tool that helps Cloud developers orchestrate training jobs on accelerators such as TPUs and GPUs on GKE.

☆129

Alternatives and similar repositories for xpk

Users who are interested in xpk are comparing it to the libraries listed below.

google / saxml
☆142 Updated this week
AI-Hypercomputer / JetStream
JetStream is a throughput and memory optimized engine for LLM inference on XLA devices, starting with TPUs (and GPUs in future -- PRs wel…
☆354 Updated last month
AI-Hypercomputer / gpu-recipes
Recipes for reproducing training and serving benchmarks for large machine learning models using GPUs on Google Cloud.
☆74 Updated last week
AI-Hypercomputer / tpu-recipes
☆36 Updated 3 weeks ago
google / paxml
Pax is a Jax-based machine learning framework for training large scale models. Pax allows for advanced and fully configurable experimenta…
☆513 Updated last week
GoogleCloudPlatform / slurm-gcp
☆50 Updated last week
AI-Hypercomputer / maxdiffusion
☆230 Updated this week
google / praxis
☆186 Updated last month
GoogleCloudPlatform / cluster-toolkit
Cluster Toolkit is an open-source software offered by Google Cloud which makes it easy for customers to deploy AI/ML and HPC environments…
☆272 Updated this week
GoogleCloudPlatform / ml-testing-accelerators
Testing framework for Deep Learning models (Tensorflow and PyTorch) on Google Cloud hardware accelerators (TPU and GPU)
☆64 Updated 3 weeks ago
AI-Hypercomputer / jetstream-pytorch
PyTorch/XLA integration with JetStream (https://github.com/google/JetStream) for LLM inference
☆64 Updated 3 months ago
AI-Hypercomputer / cloud-accelerator-diagnostics
☆22 Updated this week
coreweave / tensorizer
Module, Model, and Tensor Serialization/Deserialization
☆248 Updated this week
huggingface / optimum-tpu
Google TPU optimizations for transformers models
☆114 Updated 5 months ago
GoogleCloudPlatform / ml-auto-solutions
A simplified and automated orchestration workflow to perform ML end-to-end (E2E) model tests and benchmarking on Cloud VMs across differe…
☆49 Updated 2 weeks ago
google / orbax
Orbax provides common checkpointing and persistence utilities for JAX users
☆404 Updated this week
NVIDIA / JAX-Toolbox
JAX-Toolbox
☆321 Updated this week
run-ai / runai-model-streamer
☆228 Updated this week
gclouduniverse / reproducibility-deprecated
☆16 Updated 4 months ago
pytorch-labs / monarch
PyTorch Single Controller
☆318 Updated this week
stanford-crfm / levanter
Legible, Scalable, Reproducible Foundation Models with Named Tensors and Jax
☆607 Updated this week
rwitten / HighPerfLLMs2024
☆511 Updated last year
GoogleCloudPlatform / ai-on-gke
AI on GKE is a collection of examples, best-practices, and prebuilt solutions to help build, deploy, and scale AI Platforms on Google Kub…
☆322 Updated 3 weeks ago
ayaka14732 / llama-2-jax
JAX implementation of the Llama 2 model
☆219 Updated last year
google / aqt
☆320 Updated 2 weeks ago
pytorch / test-infra
This repository hosts code that supports the testing infrastructure for the PyTorch organization. For example, this repo hosts the logic …
☆96 Updated this week
pytorch / torchft
Fault tolerance for PyTorch (HSDP, LocalSGD, DiLoCo, Streaming DiLoCo)
☆359 Updated 2 weeks ago
MatX-inc / seqax
seqax = sequence modeling + JAX
☆165 Updated last month
google-deepmind / nanodo
☆273 Updated last year
NVIDIA-NeMo / Run
A tool to configure, launch and manage your machine learning experiments.
☆171 Updated this week