AI-Hypercomputer / cloud-accelerator-diagnostics
☆20 · Updated 2 weeks ago
Alternatives and similar repositories for cloud-accelerator-diagnostics
Users interested in cloud-accelerator-diagnostics are comparing it to the libraries listed below.
- xpk (Accelerated Processing Kit, pronounced x-p-k) is a software tool that helps Cloud developers orchestrate training jobs on accelerators… ☆123 · Updated this week
- Pax is a Jax-based machine learning framework for training large scale models. Pax allows for advanced and fully configurable experimentation… ☆499 · Updated 2 weeks ago
- PyTorch/XLA integration with JetStream (https://github.com/google/JetStream) for LLM inference ☆60 · Updated 2 months ago
- A simple library for scaling up JAX programs ☆137 · Updated 7 months ago
- jax-triton contains integrations between JAX and OpenAI Triton (see the kernel sketch after this list) ☆395 · Updated this week
- JAX Synergistic Memory Inspector ☆173 · Updated 10 months ago
- Orbax provides common checkpointing and persistence utilities for JAX users (checkpointing sketch below) ☆384 · Updated this week
- JAX-Toolbox ☆308 · Updated this week
- JAX bindings for Flash Attention v2 ☆88 · Updated 10 months ago
- A set of Python scripts that make your experience on TPUs better ☆54 · Updated 11 months ago
- JetStream is a throughput and memory optimized engine for LLM inference on XLA devices, starting with TPUs (and GPUs in future -- PRs welcome) ☆335 · Updated this week
- seqax = sequence modeling + JAX ☆155 · Updated last month
- Home for "How To Scale Your Model", a short blog-style textbook about scaling LLMs on TPUs (a data-parallel sharding sketch follows this list) ☆380 · Updated last month
- PyTorch per step fault tolerance (actively under development) ☆302 · Updated this week
- Implementation of Flash Attention in Jax ☆212 · Updated last year
- JAX implementation of the Llama 2 model ☆217 · Updated last year
- Load compute kernels from the Hub ☆139 · Updated this week
- Library for reading and processing ML training data. ☆447 · Updated this week
- A JAX-native LLM Post-Training Library ☆32 · Updated this week
- Distributed pretraining of large language models (LLMs) on cloud TPU slices, with Jax and Equinox. ☆24 · Updated 8 months ago
- Legible, Scalable, Reproducible Foundation Models with Named Tensors and Jax ☆586 · Updated last week
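
A few of the entries above are concrete enough to sketch. First, jax-triton: the snippet below is a minimal sketch of its `triton_call` entry point, modeled on the shape of the examples in the project's README. The kernel name, block size, and grid are illustrative choices, and the input length is assumed to divide the block size evenly so load/store masking can be omitted.

```python
import jax
import jax.numpy as jnp
import jax_triton as jt
import triton
import triton.language as tl

@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, block_size: tl.constexpr):
    # Each program instance handles one contiguous block of elements.
    pid = tl.program_id(axis=0)
    offsets = pid * block_size + tl.arange(0, block_size)
    tl.store(out_ptr + offsets, tl.load(x_ptr + offsets) + tl.load(y_ptr + offsets))

def add(x, y):
    out_shape = jax.ShapeDtypeStruct(shape=x.shape, dtype=x.dtype)
    block_size = 8  # illustrative; assumes x.size is a multiple of 8
    grid = (triton.cdiv(x.size, block_size),)
    # triton_call launches the Triton kernel from JAX; extra keyword
    # arguments are forwarded to the kernel as constexpr parameters.
    return jt.triton_call(x, y, kernel=add_kernel, out_shape=out_shape,
                          grid=grid, block_size=block_size)

x = jnp.arange(8, dtype=jnp.float32)
y = jnp.arange(8, dtype=jnp.float32)
print(jax.jit(add)(x, y))  # [ 0.  2.  4.  6.  8. 10. 12. 14.]
```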
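
Orbax's core flow is saving and restoring a JAX pytree. A minimal sketch, assuming the classic `PyTreeCheckpointer` API (newer Orbax releases steer users toward `CheckpointManager`; the path must be absolute, and `force=True` is assumed here to permit overwriting an existing checkpoint):

```python
import jax.numpy as jnp
import orbax.checkpoint as ocp

# Any JAX pytree can be checkpointed; here, a toy train state.
state = {"step": 100, "params": {"w": jnp.ones((2, 3)), "b": jnp.zeros(3)}}

ckptr = ocp.PyTreeCheckpointer()
ckptr.save("/tmp/orbax_demo_ckpt", state, force=True)  # force=True: overwrite if present
restored = ckptr.restore("/tmp/orbax_demo_ckpt")
print(restored["step"], restored["params"]["w"].shape)  # 100 (2, 3)
```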
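
Finally, the kind of technique "How To Scale Your Model" walks through can be illustrated with stock JAX sharding APIs. This is a generic data-parallel placement sketch using `jax.sharding`, not code taken from the book; on a single-device machine it degenerates to one shard.

```python
import numpy as np
import jax
import jax.numpy as jnp
from jax.sharding import Mesh, NamedSharding, PartitionSpec as P

# Arrange all local devices along a 1D mesh axis named "data".
mesh = Mesh(np.array(jax.devices()), axis_names=("data",))

x = jnp.arange(32.0).reshape(8, 4)
# Shard the leading (batch) dimension across the "data" axis and replicate
# the trailing (feature) dimension; assumes the row count is divisible by
# the device count.
x_sharded = jax.device_put(x, NamedSharding(mesh, P("data", None)))

print(x_sharded.sharding)                 # NamedSharding over the "data" axis
print(len(x_sharded.addressable_shards))  # one shard per local device
```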