TPU inference for vLLM, with unified JAX and PyTorch support.
☆307 · May 4, 2026 · Updated this week
Alternatives and similar repositories for tpu-inference
Users that are interested in tpu-inference are comparing it to the libraries listed below.
- ☆18 · Feb 18, 2026 · Updated 2 months ago
- ☆20 · Mar 11, 2026 · Updated last month
- ☆35 · Apr 27, 2026 · Updated last week
- LinearKAN: A very fast implementation of Kolmogorov-Arnold Networks ☆19 · Nov 12, 2025 · Updated 5 months ago
- Automatic differentiation for Triton Kernels ☆29 · Aug 12, 2025 · Updated 8 months ago
- A high-throughput and memory-efficient inference and serving engine for LLMs ☆28 · Updated this week
- ☆342 · Apr 30, 2026 · Updated last week
- ☆16 · May 11, 2025 · Updated 11 months ago
- JAX bindings for the flash-attention3 kernels ☆22 · Jan 2, 2026 · Updated 4 months ago
- Recipes for reproducing training and serving benchmarks for large machine learning models using GPUs on Google Cloud. ☆131 · Updated this week
- Large Language Model Text Generation Inference on Habana Gaudi ☆34 · Mar 20, 2025 · Updated last year
- JAX backend for SGL ☆268 · Updated this week
- JetStream is a throughput and memory optimized engine for LLM inference on XLA devices, starting with TPUs (and GPUs in future -- PRs wel… ☆432 · Jan 5, 2026 · Updated 4 months ago
- llm-d helm charts and deployment examples ☆55 · Updated this week
- This repository contains the results and code for the MLPerf™ Inference v4.0 benchmark. ☆11 · Jul 24, 2025 · Updated 9 months ago
- A simple, performant and scalable Jax LLM! ☆2,265 · Updated this week
- ☆10 · Feb 23, 2025 · Updated last year
- Tokamax: A GPU and TPU kernel library. ☆208 · Updated this week
- A practical way of learning Swizzle ☆38 · Feb 3, 2025 · Updated last year
- A high-throughput and memory-efficient inference and serving engine for LLMs ☆87 · Apr 28, 2026 · Updated last week
- (WIP) Parallel inference for black-forest-labs' FLUX model. ☆19 · Nov 18, 2024 · Updated last year
- CUDA Embedding Lookup Kernel Library ☆43 · Feb 9, 2026 · Updated 2 months ago
- Boosting 4-bit inference kernels with 2:4 Sparsity ☆96 · Sep 4, 2024 · Updated last year
- Google TPU optimizations for transformers models ☆138 · Jan 23, 2026 · Updated 3 months ago
- ☆13 · Updated this week
- 🚀 Collection of components for development, training, tuning, and inference of foundation models leveraging PyTorch native components. ☆222 · Apr 29, 2026 · Updated last week
- Accelerate and optimize performance with streamlined training and serving options with JAX. ☆359 · Apr 27, 2026 · Updated last week
- Simple demo showing how to use the Forge API by Nous Research ☆17 · Nov 12, 2024 · Updated last year
- 🎹 Instruct.KR 2025 Summer Meetup: Open-source LLMs to production with vLLM 🎹 ☆23 · Aug 2, 2025 · Updated 9 months ago
- Testing framework for Deep Learning models (TensorFlow and PyTorch) on Google Cloud hardware accelerators (TPU and GPU) ☆65 · Mar 11, 2026 · Updated last month
- ☆23 · Aug 21, 2025 · Updated 8 months ago
- Benchmark tests supporting the TiledCUDA library. ☆18 · Nov 19, 2024 · Updated last year
- Lightweight Python Wrapper for OpenVINO, enabling LLM inference on NPUs ☆29 · Dec 17, 2024 · Updated last year
- ☆196 · Updated this week
- A modern web interface for Ollama, built with Nuxt 3 and Vue. Features a clean UI with dark/light modes, model management (copy/rename/de… ☆19 · Feb 22, 2026 · Updated 2 months ago
- A Lightweight LLM Post-Training Library ☆2,249 · Updated this week
- Train speculative decoding models effortlessly and port them smoothly to SGLang serving. ☆816 · Apr 2, 2026 · Updated last month
- GenAI inference performance benchmarking tool ☆180 · Updated this week
- Multi-Turn RL Training System with AgentTrainer for Language Model Game Reinforcement Learning ☆63 · Dec 18, 2025 · Updated 4 months ago