AI-Hypercomputer / JetStreamLinks

JetStream is a throughput and memory optimized engine for LLM inference on XLA devices, starting with TPUs (and GPUs in future -- PRs welcome).

☆392

Alternatives and similar repositories for JetStream

Users that are interested in JetStream are comparing it to the libraries listed below

Sorting:

AI-Hypercomputer / jetstream-pytorch
PyTorch/XLA integration with JetStream (https://github.com/google/JetStream) for LLM inference"
☆78Updated 2 months ago
google / saxml
☆147Updated 3 weeks ago
AI-Hypercomputer / xpk
xpk (Accelerated Processing Kit, pronounced x-p-k,) is a software tool to help Cloud developers to orchestrate training jobs on accelerat…
☆153Updated last week
google / paxml
Pax is a Jax-based machine learning framework for training large scale models. Pax allows for advanced and fully configurable experimenta…
☆540Updated 2 weeks ago
meta-pytorch / torchft
Fault tolerance for PyTorch (HSDP, LocalSGD, DiLoCo, Streaming DiLoCo)
☆454Updated 3 weeks ago
vllm-project / tpu-inference
TPU inference for vLLM, with unified JAX and PyTorch support.
☆170Updated this week
coreweave / tensorizer
Module, Model, and Tensor Serialization/Deserialization
☆276Updated 3 months ago
google / aqt
☆337Updated last week
rwitten / HighPerfLLMs2024
☆546Updated last year
NVIDIA / nvidia-resiliency-ext
NVIDIA Resiliency Extension is a python package for framework developers and users to implement fault-tolerant features. It improves the …
☆239Updated this week
apple / ml-recurrent-drafter
☆219Updated 10 months ago
meta-pytorch / torchx
TorchX is a universal job launcher for PyTorch applications. TorchX is designed to have fast iteration time for training/research and sup…
☆402Updated last week
AI-Hypercomputer / gpu-recipes
Recipes for reproducing training and serving benchmarks for large machine learning models using GPUs on Google Cloud.
☆101Updated last week
dropbox / gemlite
Fast low-bit matmul kernels in Triton
☆401Updated last week
AI-Hypercomputer / maxdiffusion
☆283Updated last week
snowflakedb / ArcticInference
ArcticInference: vLLM plugin for high-throughput, low-latency inference
☆327Updated this week
huggingface / optimum-tpu
Google TPU optimizations for transformers models
☆123Updated 10 months ago
huggingface / optimum-habana
Easy and lightning fast training of 🤗 Transformers on Habana Gaudi processor (HPU)
☆201Updated last week
run-ai / runai-model-streamer
☆267Updated last week
foundation-model-stack / foundation-model-stack
🚀 Collection of components for development, training, tuning, and inference of foundation models leveraging PyTorch native components.
☆217Updated last week
meta-pytorch / applied-ai
Applied AI experiments and examples for PyTorch
☆307Updated 3 months ago
NVIDIA-NeMo / Run
A tool to configure, launch and manage your machine learning experiments.
☆208Updated this week
perplexityai / pplx-kernels
Perplexity GPU Kernels
☆534Updated 3 weeks ago
imbue-ai / cluster-health
☆316Updated last year
facebookresearch / HolisticTraceAnalysis
A library to analyze PyTorch traces.
☆437Updated 2 weeks ago
meta-pytorch / torchcomms
torchcomms: a modern PyTorch communications API
☆295Updated last week
neuralmagic / nm-vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
☆267Updated last year
perplexityai / pplx-garden
Perplexity open source garden for inference technology
☆274Updated 2 weeks ago
perplexityai / libfabric-efa-demo
☆72Updated 9 months ago
AI-Hypercomputer / tpu-recipes
☆54Updated this week