google / pathways-job
PathwaysJob API is an OSS Kubernetes-native API for deploying ML training and batch inference workloads using Pathways on GKE.
☆17 Updated 3 months ago
Alternatives and similar repositories for pathways-job
Users who are interested in pathways-job are comparing it to the libraries listed below.
- NVIDIA Resiliency Extension is a python package for framework developers and users to implement fault-tolerant features. It improves the … ☆262 Updated this week
- Perplexity open source garden for inference technology ☆359 Updated last month
- extensible collectives library in triton ☆95 Updated 10 months ago
- LoRAFusion: Efficient LoRA Fine-Tuning for LLMs ☆23 Updated 4 months ago
- High-performance safetensors model loader ☆99 Updated 3 weeks ago
- ☆77 Updated last year
- ☆15 Updated 3 months ago
- JAX backend for SGL ☆234 Updated this week
- JaxPP is a library for JAX that enables flexible MPMD pipeline parallelism for large-scale LLM training ☆64 Updated 2 weeks ago
- torchcomms: a modern PyTorch communications API ☆327 Updated this week
- ☆71 Updated 11 months ago
- Genai-bench is a powerful benchmark tool designed for comprehensive token-level performance evaluation of large language model (LLM) serv… ☆263 Updated this week
- ☆104 Updated last year
- 🚀 Collection of components for development, training, tuning, and inference of foundation models leveraging PyTorch native components. ☆219 Updated last week
- TORCH_TRACE parser for PT2 ☆75 Updated last week
- 🏙 Interactive performance profiling and debugging tool for PyTorch neural networks. ☆64 Updated last year
- MSLK (Meta Superintelligence Labs Kernels) is a collection of PyTorch GPU operator libraries that are designed and optimized for GenAI tr… ☆45 Updated this week
- ☆47 Updated last year
- Offline optimization of your disaggregated Dynamo graph ☆177 Updated last week
- NVIDIA NVSHMEM is a parallel programming interface for NVIDIA GPUs based on OpenSHMEM. NVSHMEM can significantly reduce multi-process com… ☆462 Updated last month
- Toolchain built around the Megatron-LM for Distributed Training ☆84 Updated 2 months ago
- Aims to implement dual-port and multi-qp solutions in deepEP ibrc transport ☆73 Updated 9 months ago
- Fairring (FAIR + Herring) is a plug-in for PyTorch that provides a process group for distributed training that outperforms NCCL at large … ☆65 Updated 3 years ago
- DeeperGEMM: crazy optimized version ☆73 Updated 9 months ago
- ring-attention experiments ☆165 Updated last year
- fmchisel: Efficient Compression and Training Algorithms for Foundation Models ☆83 Updated 3 months ago
- Efficient Long-context Language Model Training by Core Attention Disaggregation ☆87 Updated last week
- A performant, memory-efficient checkpointing library for PyTorch applications, designed with large, complex distributed workloads in mind… ☆164 Updated 3 weeks ago
- TPU inference for vLLM, with unified JAX and PyTorch support. ☆228 Updated this week
- A prefill & decode disaggregated LLM serving framework with shared GPU memory and fine-grained compute isolation. ☆123 Updated last month