AI-Hypercomputer / tpu-recipes
☆39 · Updated 3 weeks ago
Alternatives and similar repositories for tpu-recipes
Users interested in tpu-recipes are comparing it to the libraries listed below.
- xpk (Accelerated Processing Kit, pronounced x-p-k) is a software tool that helps Cloud developers orchestrate training jobs on accelerat… ☆136 · Updated this week
- A set of Python scripts that make your experience on TPUs better ☆54 · Updated last year
- ☆142 · Updated last week
- PyTorch/XLA integration with JetStream (https://github.com/google/JetStream) for LLM inference ☆67 · Updated 4 months ago
- A repository to unravel the language of GPUs, making their kernel conversations easy to understand ☆188 · Updated 2 months ago
- Large-scale 4D parallelism pre-training for 🤗 transformers in Mixture of Experts *(still work in progress)* ☆86 · Updated last year
- Google TPU optimizations for transformer models ☆117 · Updated 6 months ago
- Accelerate and optimize performance with streamlined training and serving options in JAX ☆293 · Updated this week
- Load compute kernels from the Hub ☆220 · Updated last week
- seqax = sequence modeling + JAX ☆165 · Updated 2 weeks ago
- ☆141 · Updated last week
- Write a fast kernel and run it on Discord. See how you compare against the best! ☆48 · Updated this week
- A JAX-native LLM Post-Training Library ☆84 · Updated this week
- ☆323 · Updated last week
- 👷 Build compute kernels ☆87 · Updated last week
- Experiment of using Tangent to autodiff triton ☆80 · Updated last year
- Train a SmolLM-style LLM on fineweb-edu in JAX/Flax with an assortment of optimizers ☆19 · Updated 2 weeks ago
- train with kittens! ☆62 · Updated 9 months ago
- Minimal but scalable implementation of large language models in JAX ☆35 · Updated 2 weeks ago
- ☆114 · Updated last year
- ☆162 · Updated last year
- ☆187 · Updated last week
- JetStream is a throughput and memory optimized engine for LLM inference on XLA devices, starting with TPUs (and GPUs in future -- PRs wel… ☆365 · Updated last month
- Recipes for reproducing training and serving benchmarks for large machine learning models using GPUs on Google Cloud ☆78 · Updated this week
- ☆83 · Updated last year
- Two implementations of ZeRO-1 optimizer sharding in JAX ☆14 · Updated 2 years ago
- PyTorch Single Controller ☆345 · Updated this week
- Custom Triton kernels for training Karpathy's nanoGPT ☆19 · Updated 9 months ago
- ☆275 · Updated last year
- Fault tolerance for PyTorch (HSDP, LocalSGD, DiLoCo, Streaming DiLoCo) ☆377 · Updated this week
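Several entries above implement ZeRO-1 optimizer-state sharding, the idea of keeping full parameters on every data-parallel worker while splitting the optimizer state (e.g. momentum) across them. A minimal pure-Python sketch of the concept, not taken from any repository listed here (all names are hypothetical; real implementations use JAX or PyTorch distributed collectives rather than plain lists):

```python
# Sketch of ZeRO-1 optimizer-state sharding (illustrative only).
# Every worker holds the full parameters, but only the SGD-momentum
# state for its own contiguous shard of the parameter vector.

def zero1_step(params, momenta_shards, grads, lr=0.1, beta=0.9):
    """One update step with optimizer state sharded across workers.

    params:         full parameter vector (replicated on every worker)
    momenta_shards: one momentum list per worker, covering only its shard
    grads:          full gradient vector (already all-reduced)
    """
    n_workers = len(momenta_shards)
    shard_size = len(params) // n_workers
    new_shards = []
    for w, mom in enumerate(momenta_shards):
        lo = w * shard_size
        # Update momentum and parameters only for this worker's shard.
        new_mom = [beta * m + g for m, g in zip(mom, grads[lo:lo + shard_size])]
        shard = [p - lr * m for p, m in zip(params[lo:lo + shard_size], new_mom)]
        momenta_shards[w] = new_mom
        new_shards.append(shard)
    # "All-gather": concatenate updated shards back into full parameters.
    return [p for shard in new_shards for p in shard]

params = [1.0, 2.0, 3.0, 4.0]
momenta = [[0.0, 0.0], [0.0, 0.0]]   # 2 workers, 2 momentum entries each
grads = [0.5, 0.5, 0.5, 0.5]
params = zero1_step(params, momenta, grads)
print(params)
```

The point of the sharding is the memory math: with N workers, each one stores only 1/N of the optimizer state, at the cost of the all-gather that rebuilds the full parameter vector after every step.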