AI-Hypercomputer / tpu-recipes
☆45 · Updated last month
Alternatives and similar repositories for tpu-recipes
Users interested in tpu-recipes are comparing it to the libraries listed below.
- PyTorch/XLA integration with JetStream (https://github.com/google/JetStream) for LLM inference · ☆73 · Updated 3 weeks ago
- Google TPU optimizations for transformers models · ☆120 · Updated 8 months ago
- JetStream is a throughput- and memory-optimized engine for LLM inference on XLA devices, starting with TPUs (and GPUs in the future; PRs welcome) · ☆379 · Updated 3 months ago
- xpk (Accelerated Processing Kit, pronounced x-p-k) is a software tool that helps Cloud developers orchestrate training jobs on accelerators… · ☆143 · Updated this week
- A repository to unravel the language of GPUs, making their kernel conversations easy to understand · ☆193 · Updated 4 months ago
- 👷 Build compute kernels · ☆149 · Updated this week
- A set of Python scripts that make your experience on TPU better · ☆54 · Updated 2 weeks ago
- Load compute kernels from the Hub (see the loading sketch after this list) · ☆290 · Updated last week
- PTX tutorial written purely by AIs (OpenAI's Deep Research and Claude 3.7) · ☆66 · Updated 6 months ago
- Experiment using Tangent to autodiff Triton · ☆81 · Updated last year
- train with kittens! · ☆62 · Updated 11 months ago
- Write a fast kernel and run it on Discord. See how you compare against the best! · ☆57 · Updated last week
- Fault tolerance for PyTorch (HSDP, LocalSGD, DiLoCo, Streaming DiLoCo) · ☆414 · Updated this week
- A JAX quantization library (a generic quantization sketch follows this list) · ☆46 · Updated this week
- Various transformers for FSDP research · ☆38 · Updated 2 years ago
- 🚀 Collection of components for development, training, tuning, and inference of foundation models leveraging PyTorch native components · ☆213 · Updated this week
- Large-scale 4D parallelism pre-training for 🤗 transformers in Mixture of Experts *(still a work in progress)* (a 4D device-mesh sketch follows this list) · ☆87 · Updated last year
- DTensor-native pretraining and fine-tuning for LLMs/VLMs with day-0 Hugging Face support, GPU acceleration, and memory efficiency · ☆84 · Updated this week
- Minimal yet performant LLM examples in pure JAX (a minimal attention sketch follows this list) · ☆177 · Updated last week
- Explore training for quantized models · ☆24 · Updated 2 months ago
- Small-scale distributed training of sequential deep learning models, built on NumPy and MPI · ☆143 · Updated last year
- PyTorch-centric eager-mode debugger · ☆48 · Updated 9 months ago
- LM engine is a library for pretraining/finetuning LLMs · ☆67 · Updated last week
- Fast, modern, and low-precision PyTorch optimizers (an 8-bit optimizer sketch follows this list) · ☆112 · Updated last month
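
For the "Load compute kernels from the Hub" entry above, a minimal loading sketch. It assumes the `kernels` Python package and a CUDA device; the `kernels-community/activation` repo id and the `gelu_fast` function follow that project's README example and may differ across versions.

```python
import torch
from kernels import get_kernel  # assumed: the `kernels` package from the Hub project

# Download a precompiled activation kernel from the Hugging Face Hub.
activation = get_kernel("kernels-community/activation")

x = torch.randn(8, 16, dtype=torch.float16, device="cuda")
out = torch.empty_like(x)
activation.gelu_fast(out, x)  # kernel writes its result into `out`
```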
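
For the JAX quantization library entry, a generic sketch of what such a library automates, not its actual API: symmetric per-tensor int8 quantization with a matching dequantize step.

```python
import jax.numpy as jnp

def quantize_int8(x):
    # Scale so the largest |x| maps to 127, then round into the int8 range.
    scale = jnp.max(jnp.abs(x)) / 127.0
    q = jnp.clip(jnp.round(x / scale), -127, 127).astype(jnp.int8)
    return q, scale

def dequantize_int8(q, scale):
    return q.astype(jnp.float32) * scale

q, s = quantize_int8(jnp.array([0.1, -2.5, 3.2]))
x_hat = dequantize_int8(q, s)  # approximate reconstruction of the input
```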
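
For the 4D-parallelism entry, a sketch of the idea in JAX terms (the listed repo targets 🤗 transformers; this illustrates the concept, not its API): lay devices out on a 4D mesh with one named axis per parallelism dimension, then shard a weight along one axis.

```python
import os
# Assumption: simulate 16 host devices so the sketch runs on CPU;
# a real job would use 16 accelerators instead.
os.environ["XLA_FLAGS"] = "--xla_force_host_platform_device_count=16"

import numpy as np
import jax
from jax.sharding import Mesh, NamedSharding, PartitionSpec

# 2x2x2x2 mesh: data, tensor, sequence, and pipeline parallelism axes.
devices = np.array(jax.devices()).reshape(2, 2, 2, 2)
mesh = Mesh(devices, axis_names=("data", "tensor", "seq", "pipe"))

# Split a weight's output dimension across the "tensor" axis; replicate elsewhere.
w = jax.device_put(np.ones((1024, 1024), np.float32),
                   NamedSharding(mesh, PartitionSpec(None, "tensor")))
print(w.sharding)
```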
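
For the pure-JAX LLM examples entry, a taste of the style such repos use: single-head causal self-attention written directly in `jax.numpy`, no frameworks. Shapes and names here are illustrative.

```python
import jax
import jax.numpy as jnp

def causal_self_attention(x, wq, wk, wv):
    # x: (seq, d_model); one head, no batch dimension, for clarity.
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = (q @ k.T) / jnp.sqrt(q.shape[-1])
    mask = jnp.tril(jnp.ones((x.shape[0], x.shape[0]), dtype=bool))
    scores = jnp.where(mask, scores, -jnp.inf)  # block attention to future tokens
    return jax.nn.softmax(scores, axis=-1) @ v

keys = jax.random.split(jax.random.PRNGKey(0), 4)
d = 16
x = jax.random.normal(keys[0], (8, d))
wq, wk, wv = (jax.random.normal(k, (d, d)) for k in keys[1:])
out = causal_self_attention(x, wq, wk, wv)  # shape (8, 16)
```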
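
For the low-precision optimizers entry, the core idea is keeping optimizer state (e.g., Adam's moment estimates) in 8 bits to cut memory. The sketch below uses bitsandbytes' `AdamW8bit`, a different library than the one listed, chosen because its API is well documented; a CUDA device is assumed.

```python
import torch
import bitsandbytes as bnb

model = torch.nn.Linear(512, 512).cuda()
# AdamW8bit stores its moment estimates block-wise quantized to 8 bits,
# roughly quartering optimizer-state memory versus fp32 state.
opt = bnb.optim.AdamW8bit(model.parameters(), lr=1e-4)

loss = model(torch.randn(4, 512, device="cuda")).pow(2).mean()
loss.backward()
opt.step()
opt.zero_grad()
```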