AI-Hypercomputer / jetstream-pytorch

PyTorch/XLA integration with JetStream (https://github.com/google/JetStream) for LLM inference

☆66

Alternatives and similar repositories for jetstream-pytorch

Users who are interested in jetstream-pytorch are comparing it to the libraries listed below.

AI-Hypercomputer / JetStream
JetStream is a throughput and memory optimized engine for LLM inference on XLA devices, starting with TPUs (and GPUs in future -- PRs wel…
☆364 Updated last month
foundation-model-stack / foundation-model-stack
🚀 Collection of components for development, training, tuning, and inference of foundation models leveraging PyTorch native components.
☆206 Updated this week
google / aqt
☆323 Updated last month
huggingface / optimum-tpu
Google TPU optimizations for transformers models
☆117 Updated 6 months ago
huggingface / kernels
Load compute kernels from the Hub
☆220 Updated this week
apple / ml-recurrent-drafter
☆215 Updated 6 months ago
pytorch-labs / float8_experimental
This repository contains the experimental PyTorch native float8 training UX
☆224 Updated last year
mobiusml / gemlite
Fast low-bit matmul kernels in Triton
☆338 Updated last week
foundation-model-stack / fms-fsdp
🚀 Efficiently (pre)training foundation models with native PyTorch features, including FSDP for training and SDPA implementation of Flash…
☆258 Updated last week
google / saxml
☆142 Updated 2 weeks ago
cchan / tccl
extensible collectives library in triton
☆88 Updated 4 months ago
Zyphra / tree_attention
Tree Attention: Topology-aware Decoding for Long-Context Attention on GPU clusters
☆127 Updated 8 months ago
pytorch / torchft
Fault tolerance for PyTorch (HSDP, LocalSGD, DiLoCo, Streaming DiLoCo)
☆372 Updated this week
gpu-mode / ring-attention
ring-attention experiments
☆146 Updated 9 months ago
Deep-Learning-Profiling-Tools / triton-viz
☆227 Updated this week
pytorch-labs / applied-ai
Applied AI experiments and examples for PyTorch
☆289 Updated 2 months ago
IST-DASLab / Sparse-Marlin
Boosting 4-bit inference kernels with 2:4 Sparsity
☆80 Updated 11 months ago
snowflakedb / ArcticTraining
ArcticTraining is a framework designed to simplify and accelerate the post-training process for large language models (LLMs)
☆190 Updated this week
IST-DASLab / Quartet
☆76 Updated last month
neuralmagic / nm-vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
☆265 Updated 9 months ago
mgmalek / efficient_cross_entropy
☆114 Updated last year
google / praxis
☆187 Updated this week
gpu-mode / discord-cluster-manager
Write a fast kernel and run it on Discord. See how you compare against the best!
☆47 Updated this week
open-lm-engine / flash-model-architectures
A bunch of kernels that might make stuff slower 😉
☆56 Updated this week
lianakoleva / no-libtorch-compile
☆21 Updated 5 months ago
nil0x9 / flash-muon
Flash-Muon: An Efficient Implementation of Muon Optimizer
☆149 Updated last month
pytorch-labs / tritonbench
Tritonbench is a collection of PyTorch custom operators with example inputs to measure their performance.
☆193 Updated this week
stanford-futuredata / stk
☆107 Updated 11 months ago
neuralmagic / compressed-tensors
A safetensors extension to efficiently store sparse quantized tensors on disk
☆141 Updated this week
shawntan / scattermoe
Triton-based implementation of Sparse Mixture of Experts.
☆230 Updated 8 months ago