vllm-project / tpu-inference
TPU inference for vLLM, with unified JAX and PyTorch support.
☆97 · Updated this week
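Because tpu-inference plugs into vLLM rather than replacing it, inference code is written against vLLM's usual Python API. A minimal offline-generation sketch, assuming the plugin and a TPU runtime are installed; the model name is illustrative, and hardware selection is handled by the installed backend rather than by this code:

```python
# Minimal sketch of vLLM's offline-inference API. With the tpu-inference
# plugin installed, the same code runs against a TPU backend.
from vllm import LLM, SamplingParams

llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct")  # illustrative model name
params = SamplingParams(temperature=0.7, max_tokens=64)

for out in llm.generate(["Explain TPUs in one sentence."], params):
    print(out.outputs[0].text)
```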
Alternatives and similar repositories for tpu-inference
Users interested in tpu-inference are comparing it to the libraries listed below.
- extensible collectives library in triton ☆89 · Updated 6 months ago
- PyTorch bindings for CUTLASS grouped GEMM (see the grouped-GEMM sketch after this list). ☆124 · Updated 4 months ago
- Applied AI experiments and examples for PyTorch ☆299 · Updated 2 months ago
- Fast low-bit matmul kernels in Triton ☆381 · Updated 3 weeks ago
- ring-attention experiments (see the online-softmax sketch after this list) ☆154 · Updated last year
- 🚀 Collection of components for development, training, tuning, and inference of foundation models leveraging PyTorch native components. ☆215 · Updated this week
- ☆92 · Updated 11 months ago
- Tritonbench is a collection of PyTorch custom operators with example inputs to measure their performance. ☆264 · Updated this week
- Triton-based implementation of Sparse Mixture of Experts (see the top-k routing sketch after this list). ☆246 · Updated 3 weeks ago
- ☆240 · Updated this week
- PyTorch/XLA integration with JetStream (https://github.com/google/JetStream) for LLM inference ☆76 · Updated last month
- Best practices for training DeepSeek, Mixtral, Qwen and other MoE models using Megatron Core. ☆111 · Updated last week
- Collection of kernels written in the Triton language (see the minimal Triton kernel after this list) ☆157 · Updated 6 months ago
- How to ensure correctness and ship LLM-generated kernels in PyTorch ☆66 · Updated last week
- Boosting 4-bit inference kernels with 2:4 Sparsity (see the 2:4 pruning sketch after this list) ☆83 · Updated last year
- Cataloging released Triton kernels. ☆263 · Updated last month
- Official repository for DistFlashAttn: Distributed Memory-efficient Attention for Long-context LLMs Training ☆216 · Updated last year
- QuTLASS: CUTLASS-Powered Quantized BLAS for Deep Learning ☆119 · Updated 3 weeks ago
- 🚀 Efficiently (pre)training foundation models with native PyTorch features, including FSDP for training and SDPA implementation of Flash… ☆270 · Updated 2 months ago
- Triton-based Symmetric Memory operators and examples ☆38 · Updated this week
- Genai-bench is a powerful benchmark tool designed for comprehensive token-level performance evaluation of large language model (LLM) serv… ☆220 · Updated last week
- ☆112 · Updated last year
- Flash-LLM: Enabling Cost-Effective and Highly-Efficient Large Generative Model Inference with Unstructured Sparsity ☆221 · Updated 2 years ago
- ☆141 · Updated 9 months ago
- GitHub mirror of the triton-lang/triton repo. ☆86 · Updated this week
- [ICLR2025] Breaking Throughput-Latency Trade-off for Long Sequences with Speculative Decoding (see the draft-and-verify sketch after this list) ☆130 · Updated 10 months ago
- A Quirky Assortment of CuTe Kernels ☆627 · Updated last week
- ArcticInference: vLLM plugin for high-throughput, low-latency inference ☆283 · Updated this week
- ☆65 · Updated 5 months ago
- DeeperGEMM: crazy optimized version ☆72 · Updated 5 months ago
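The CUTLASS grouped-GEMM bindings above fuse many independent, differently shaped matmuls into a single kernel launch. A plain-PyTorch grouped-GEMM sketch of the semantics such a kernel implements, with illustrative shapes:

```python
import torch

def grouped_gemm_reference(xs, ws):
    # Reference semantics for a grouped GEMM: one matmul per (x, w) pair,
    # where every pair may have different shapes. A fused grouped-GEMM
    # kernel computes all of these in a single launch.
    return [x @ w for x, w in zip(xs, ws)]

# Three groups with different row counts, as in MoE expert layers.
xs = [torch.randn(m, 64) for m in (5, 17, 3)]
ws = [torch.randn(64, 128) for _ in xs]
outs = grouped_gemm_reference(xs, ws)
print([tuple(o.shape) for o in outs])  # [(5, 128), (17, 128), (3, 128)]
```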
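The ring-attention experiments rest on one identity: exact softmax attention can be accumulated one K/V chunk at a time with a running max and denominator, which is what lets K/V shards rotate around a ring of devices while queries stay put. A single-process sketch of that online-softmax combine (shapes illustrative, no distributed machinery):

```python
import torch

def attention_over_kv_chunks(q, kv_chunks):
    # Accumulate exact attention one KV chunk at a time using a running
    # (max, denominator, numerator) state: the same combine that ring
    # attention applies as K/V shards arrive from neighboring devices.
    n, d = q.shape
    m = torch.full((n, 1), float("-inf"))  # running row max
    l = torch.zeros(n, 1)                  # running softmax denominator
    acc = torch.zeros(n, d)                # running numerator (probs @ V)
    for k, v in kv_chunks:                 # each: (chunk, d) keys/values
        s = q @ k.T / d ** 0.5
        m_new = torch.maximum(m, s.max(dim=-1, keepdim=True).values)
        scale = torch.exp(m - m_new)       # rescale old state to the new max
        p = torch.exp(s - m_new)
        l = l * scale + p.sum(dim=-1, keepdim=True)
        acc = acc * scale + p @ v
        m = m_new
    return acc / l

q, k, v = torch.randn(4, 8), torch.randn(12, 8), torch.randn(12, 8)
chunks = [(k[i:i + 4], v[i:i + 4]) for i in range(0, 12, 4)]
ref = torch.softmax(q @ k.T / 8 ** 0.5, dim=-1) @ v
assert torch.allclose(attention_over_kv_chunks(q, chunks), ref, atol=1e-5)
```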
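The sparse-MoE implementation above fuses routing and expert matmuls into Triton kernels; the routing logic those kernels implement is small. A plain-PyTorch top-k routing sketch (names and shapes illustrative):

```python
import torch
import torch.nn.functional as F

def topk_route(x, router_w, k=2):
    # Score every token against every expert, keep the k best experts per
    # token, and renormalize their weights; experts then run only on the
    # tokens routed to them (the part a fused kernel accelerates).
    logits = x @ router_w                      # (tokens, experts)
    weights, experts = logits.topk(k, dim=-1)  # per-token expert choices
    return F.softmax(weights, dim=-1), experts

x = torch.randn(6, 16)           # 6 tokens, hidden size 16
router_w = torch.randn(16, 4)    # 4 experts
weights, experts = topk_route(x, router_w)
print(experts)  # which 2 of the 4 experts each token is dispatched to
```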
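For readers new to the Triton entries in this list, the canonical minimal Triton kernel is a masked vector add: a 1D launch grid, one tile per program instance, and boundary masks on loads and stores. This sketch runs as-is on a CUDA device with triton installed:

```python
import torch
import triton
import triton.language as tl

@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK: tl.constexpr):
    pid = tl.program_id(axis=0)                 # one program per tile
    offsets = pid * BLOCK + tl.arange(0, BLOCK)
    mask = offsets < n_elements                 # guard the ragged last tile
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + y, mask=mask)

x = torch.randn(10_000, device="cuda")
y = torch.randn(10_000, device="cuda")
out = torch.empty_like(x)
grid = (triton.cdiv(x.numel(), 1024),)
add_kernel[grid](x, y, out, x.numel(), BLOCK=1024)
assert torch.allclose(out, x + y)
```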
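The 2:4 entry targets the semi-structured sparsity pattern that sparse tensor cores accelerate: at most 2 nonzeros in every group of 4 consecutive weights. A magnitude-based 2:4 pruning sketch that produces the pattern (the helper name is ours, not the repo's API):

```python
import torch

def prune_2_to_4(w):
    # In every contiguous group of 4 weights along the last dim, keep the
    # 2 largest by magnitude and zero the rest: the 2:4 pattern required
    # by semi-structured sparse kernels.
    assert w.shape[-1] % 4 == 0
    groups = w.reshape(-1, 4)
    keep = groups.abs().topk(2, dim=-1).indices
    mask = torch.zeros_like(groups, dtype=torch.bool).scatter_(-1, keep, True)
    return (groups * mask).reshape(w.shape)

w = torch.randn(2, 8)
print(prune_2_to_4(w))  # exactly 2 nonzeros per group of 4
```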
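The speculative-decoding entry builds on the standard draft-and-verify loop: a cheap draft model proposes a block of tokens, and the target model scores the whole block in one forward pass, keeping the longest agreeing prefix. A greedy-decoding draft-and-verify sketch with toy stand-in models (this is the generic technique, not the paper's specific long-sequence method):

```python
import torch

def speculative_step(target, draft, prefix, k=4):
    # One draft-and-verify round. `target` and `draft` map a token list to
    # per-position next-token logits. Greedy verification keeps the longest
    # prefix of draft tokens the target agrees with, plus one target token.
    seq, proposed = list(prefix), []
    for _ in range(k):                        # k cheap autoregressive steps
        proposed.append(int(draft(seq + proposed)[-1].argmax()))
    logits = target(seq + proposed)           # one expensive pass for the block
    n, accepted = len(seq), []
    for i, tok in enumerate(proposed):
        choice = int(logits[n + i - 1].argmax())
        accepted.append(choice)
        if choice != tok:                     # first disagreement: stop here
            break
    else:
        accepted.append(int(logits[-1].argmax()))  # bonus token, all accepted
    return seq + accepted

# Toy "models": embedding lookup times a projection; draft slightly perturbed.
torch.manual_seed(0)
emb, w_t = torch.randn(50, 16), torch.randn(16, 50)
w_d = w_t + 0.1 * torch.randn(16, 50)
target = lambda toks: emb[torch.tensor(toks)] @ w_t
draft = lambda toks: emb[torch.tensor(toks)] @ w_d
print(speculative_step(target, draft, [1, 2, 3]))
```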