AI-Hypercomputer / tpu-recipes
☆45 · Updated last month
Alternatives and similar repositories for tpu-recipes
Users interested in tpu-recipes are comparing it to the libraries listed below.
- PyTorch/XLA integration with JetStream (https://github.com/google/JetStream) for LLM inference · ☆73 · Updated 3 weeks ago
- Google TPU optimizations for transformers models · ☆120 · Updated 8 months ago
- JetStream is a throughput- and memory-optimized engine for LLM inference on XLA devices, starting with TPUs (and GPUs in the future; PRs welcome) · ☆379 · Updated 3 months ago
- xpk (Accelerated Processing Kit, pronounced x-p-k) is a software tool that helps Cloud developers orchestrate training jobs on accelerators… · ☆143 · Updated this week
- A repository to unravel the language of GPUs, making their kernel conversations easy to understand · ☆193 · Updated 4 months ago
- 👷 Build compute kernels · ☆149 · Updated this week
- A set of Python scripts that make your experience on TPU better · ☆54 · Updated 2 weeks ago
- Load compute kernels from the Hub (see the loading sketch after this list) · ☆290 · Updated last week
- PTX tutorial written purely by AIs (OpenAI's Deep Research and Claude 3.7) · ☆66 · Updated 6 months ago
- Experiment using Tangent to autodiff Triton · ☆81 · Updated last year
- train with kittens! · ☆62 · Updated 11 months ago
- Write a fast kernel and run it on Discord. See how you compare against the best! · ☆57 · Updated last week
- Fault tolerance for PyTorch (HSDP, LocalSGD, DiLoCo, Streaming DiLoCo) · ☆414 · Updated this week
- A JAX quantization library (a generic quantization sketch follows this list) · ☆46 · Updated this week
- Various transformers for FSDP research · ☆38 · Updated 2 years ago
- 🚀 Collection of components for development, training, tuning, and inference of foundation models leveraging PyTorch native components · ☆213 · Updated this week
- Large-scale 4D parallelism pre-training for 🤗 transformers in Mixture of Experts *(still a work in progress)* (a 4D device-mesh sketch follows this list) · ☆87 · Updated last year
- DTensor-native pretraining and fine-tuning for LLMs/VLMs with day-0 Hugging Face support, GPU acceleration, and memory efficiency · ☆84 · Updated this week
- Minimal yet performant LLM examples in pure JAX (a minimal attention sketch follows this list) · ☆177 · Updated last week
- Explore training for quantized models · ☆24 · Updated 2 months ago
- Small-scale distributed training of sequential deep learning models, built on NumPy and MPI · ☆143 · Updated last year
- PyTorch-centric eager-mode debugger · ☆48 · Updated 9 months ago
- LM engine is a library for pretraining/finetuning LLMs · ☆67 · Updated last week
- Fast, modern, and low-precision PyTorch optimizers (an 8-bit optimizer sketch follows this list) · ☆112 · Updated last month
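
For the "Load compute kernels from the Hub" entry above, a minimal loading sketch. It assumes the `kernels` Python package and a CUDA device; the `kernels-community/activation` repo id and the `gelu_fast` function follow that project's README example and may differ across versions.

```python
import torch
from kernels import get_kernel  # assumed: the `kernels` package from the Hub project

# Download a precompiled activation kernel from the Hugging Face Hub.
activation = get_kernel("kernels-community/activation")

x = torch.randn(8, 16, dtype=torch.float16, device="cuda")
out = torch.empty_like(x)
activation.gelu_fast(out, x)  # kernel writes its result into `out`
```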
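
For the JAX quantization library entry, a generic sketch of what such a library automates, not its actual API: symmetric per-tensor int8 quantization with a matching dequantize step.

```python
import jax.numpy as jnp

def quantize_int8(x):
    # Scale so the largest |x| maps to 127, then round into the int8 range.
    scale = jnp.max(jnp.abs(x)) / 127.0
    q = jnp.clip(jnp.round(x / scale), -127, 127).astype(jnp.int8)
    return q, scale

def dequantize_int8(q, scale):
    return q.astype(jnp.float32) * scale

q, s = quantize_int8(jnp.array([0.1, -2.5, 3.2]))
x_hat = dequantize_int8(q, s)  # approximate reconstruction of the input
```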
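
For the 4D-parallelism entry, a sketch of the idea in JAX terms (the listed repo targets 🤗 transformers; this illustrates the concept, not its API): lay devices out on a 4D mesh with one named axis per parallelism dimension, then shard a weight along one axis.

```python
import os
# Assumption: simulate 16 host devices so the sketch runs on CPU;
# a real job would use 16 accelerators instead.
os.environ["XLA_FLAGS"] = "--xla_force_host_platform_device_count=16"

import numpy as np
import jax
from jax.sharding import Mesh, NamedSharding, PartitionSpec

# 2x2x2x2 mesh: data, tensor, sequence, and pipeline parallelism axes.
devices = np.array(jax.devices()).reshape(2, 2, 2, 2)
mesh = Mesh(devices, axis_names=("data", "tensor", "seq", "pipe"))

# Split a weight's output dimension across the "tensor" axis; replicate elsewhere.
w = jax.device_put(np.ones((1024, 1024), np.float32),
                   NamedSharding(mesh, PartitionSpec(None, "tensor")))
print(w.sharding)
```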
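
For the pure-JAX LLM examples entry, a taste of the style such repos use: single-head causal self-attention written directly in `jax.numpy`, no frameworks. Shapes and names here are illustrative.

```python
import jax
import jax.numpy as jnp

def causal_self_attention(x, wq, wk, wv):
    # x: (seq, d_model); one head, no batch dimension, for clarity.
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = (q @ k.T) / jnp.sqrt(q.shape[-1])
    mask = jnp.tril(jnp.ones((x.shape[0], x.shape[0]), dtype=bool))
    scores = jnp.where(mask, scores, -jnp.inf)  # block attention to future tokens
    return jax.nn.softmax(scores, axis=-1) @ v

keys = jax.random.split(jax.random.PRNGKey(0), 4)
d = 16
x = jax.random.normal(keys[0], (8, d))
wq, wk, wv = (jax.random.normal(k, (d, d)) for k in keys[1:])
out = causal_self_attention(x, wq, wk, wv)  # shape (8, 16)
```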
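
For the low-precision optimizers entry, the core idea is keeping optimizer state (e.g., Adam's moment estimates) in 8 bits to cut memory. The sketch below uses bitsandbytes' `AdamW8bit`, a different library than the one listed, chosen because its API is well documented; a CUDA device is assumed.

```python
import torch
import bitsandbytes as bnb

model = torch.nn.Linear(512, 512).cuda()
# AdamW8bit stores its moment estimates block-wise quantized to 8 bits,
# roughly quartering optimizer-state memory versus fp32 state.
opt = bnb.optim.AdamW8bit(model.parameters(), lr=1e-4)

loss = model(torch.randn(4, 512, device="cuda")).pow(2).mean()
loss.backward()
opt.step()
opt.zero_grad()
```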