google / pathways-job
PathwaysJob API is an open-source, Kubernetes-native API for deploying ML training and batch inference workloads using Pathways on GKE.
☆12 Updated last month
Alternatives and similar repositories for pathways-job
Users who are interested in pathways-job are comparing it to the libraries listed below.
- PyTorch/XLA integration with JetStream (https://github.com/google/JetStream) for LLM inference ☆71 Updated 5 months ago
- Collection of components for development, training, tuning, and inference of foundation models leveraging PyTorch native components. ☆211 Updated this week
- DTensor-native pretraining and fine-tuning for LLMs/VLMs with day-0 Hugging Face support, GPU-accelerated, and memory efficient. ☆71 Updated last week
- A performant, memory-efficient checkpointing library for PyTorch applications, designed with large, complex distributed workloads in mind… ☆161 Updated 2 months ago
- NVIDIA Resiliency Extension is a Python package for framework developers and users to implement fault-tolerant features. It improves the … ☆221 Updated this week
- ring-attention experiments ☆152 Updated 11 months ago
- JaxPP is a library for JAX that enables flexible MPMD pipeline parallelism for large-scale LLM training ☆53 Updated last month
- Efficiently (pre)training foundation models with native PyTorch features, including FSDP for training and SDPA implementation of Flash… ☆265 Updated last month
- Fault tolerance for PyTorch (HSDP, LocalSGD, DiLoCo, Streaming DiLoCo) ☆401 Updated 2 weeks ago
- ☆45 Updated 3 weeks ago
- A bunch of kernels that might make stuff slower ☆59 Updated this week
- JetStream is a throughput and memory optimized engine for LLM inference on XLA devices, starting with TPUs (and GPUs in future -- PRs wel… ☆375 Updated 3 months ago
- ☆18 Updated 2 weeks ago
- Applied AI experiments and examples for PyTorch ☆295 Updated 3 weeks ago
- ☆146 Updated last month
- This repository contains the experimental PyTorch native float8 training UX ☆224 Updated last year
- ☆22 Updated this week
- Triton-based Symmetric Memory operators and examples ☆28 Updated 3 weeks ago
- A Python-embedded DSL that makes it easy to write fast, scalable ML kernels with minimal boilerplate. ☆299 Updated this week
- ☆168 Updated last year
- Two implementations of ZeRO-1 optimizer sharding in JAX ☆14 Updated 2 years ago
- Megatron's multi-modal data loader ☆243 Updated last week
- ☆234 Updated this week
- ☆188 Updated 2 weeks ago
- A JAX-native LLM Post-Training Library ☆143 Updated this week
- extensible collectives library in triton ☆87 Updated 5 months ago
- A library to analyze PyTorch traces. ☆409 Updated last week
- FlexAttention based, minimal vllm-style inference engine for fast Gemma 2 inference. ☆274 Updated last month
- ☆21 Updated this week
- Fairring (FAIR + Herring) is a plug-in for PyTorch that provides a process group for distributed training that outperforms NCCL at large … ☆65 Updated 3 years ago