AI-Hypercomputer / tpu-recipes
☆53 · Updated this week
Alternatives and similar repositories for tpu-recipes
Users interested in tpu-recipes are comparing it to the libraries listed below.
- Google TPU optimizations for transformers models ☆122 · Updated 9 months ago
- PyTorch/XLA integration with JetStream (https://github.com/google/JetStream) for LLM inference ☆77 · Updated 2 months ago
- Load compute kernels from the Hub ☆326 · Updated this week
- A repository to unravel the language of GPUs, making their kernel conversations easy to understand ☆195 · Updated 5 months ago
- A set of Python scripts that make your experience on TPU better ☆54 · Updated last month
- xpk (Accelerated Processing Kit, pronounced x-p-k) is a software tool that helps Cloud developers orchestrate training jobs on accelerat… ☆151 · Updated this week
- ☆145 · Updated last week
- ☆190 · Updated 3 weeks ago
- ☆337 · Updated last week
- JetStream is a throughput and memory optimized engine for LLM inference on XLA devices, starting with TPUs (and GPUs in future -- PRs wel… ☆388 · Updated 5 months ago
- 👷 Build compute kernels ☆171 · Updated this week
- Package of Pathways-on-Cloud utilities ☆20 · Updated this week
- Write a fast kernel and run it on Discord. See how you compare against the best! ☆61 · Updated this week
- An experiment in using Tangent to autodiff Triton ☆79 · Updated last year
- Large-scale 4D-parallelism pre-training for 🤗 transformers in Mixture of Experts *(still a work in progress)* ☆87 · Updated last year
- Two implementations of ZeRO-1 optimizer sharding in JAX ☆14 · Updated 2 years ago
- ☆15 · Updated 5 months ago
- ☆21 · Updated 8 months ago
- Minimal yet performant LLM examples in pure JAX ☆198 · Updated last month
- seqax = sequence modeling + JAX ☆168 · Updated 3 months ago
- Various transformers for FSDP research ☆38 · Updated 3 years ago
- torchax is a PyTorch frontend for JAX: it lets you author JAX programs using familiar PyTorch syntax. It also provides JA… ☆122 · Updated this week
- Accelerate and optimize performance with streamlined training and serving options in JAX. ☆321 · Updated this week
- A zero-to-one guide on scaling modern transformers with n-dimensional parallelism. ☆104 · Updated last month
- Where GPUs get cooked 👩‍🍳🔥 ☆310 · Updated last month
- ☆91 · Updated last year
- Recipes for reproducing training and serving benchmarks for large machine learning models using GPUs on Google Cloud. ☆96 · Updated last week
- Fast, Modern, and Low Precision PyTorch Optimizers ☆116 · Updated 2 months ago
- An implementation of the Llama architecture, to instruct and delight ☆21 · Updated 5 months ago
- LM engine is a library for pretraining/finetuning LLMs ☆74 · Updated last week