TPU inference for vLLM, with unified JAX and PyTorch support.
☆266 · Updated Mar 24, 2026 (this week)
Alternatives and similar repositories for tpu-inference
Users interested in tpu-inference are comparing it to the libraries listed below.
- ☆18 (updated Mar 11, 2026)
- ☆33 (updated Feb 4, 2026)
- Minimal yet performant LLM examples in pure JAX (☆245, updated Jan 14, 2026)
- vLLM performance dashboard (☆43, updated Apr 26, 2024)
- ☆15 (updated May 11, 2025)
- Automatic differentiation for Triton kernels (☆29, updated Aug 12, 2025)
- Code for the vLLM CI & performance benchmark infrastructure (☆32, updated Mar 18, 2026)
- JAX bindings for the flash-attention3 kernels (☆21, updated Jan 2, 2026)
- Recipes for reproducing training and serving benchmarks for large machine learning models using GPUs on Google Cloud (☆123, updated this week)
- Large Language Model Text Generation Inference on Habana Gaudi (☆34, updated Mar 20, 2025)
- JAX backend for SGL (☆252, updated this week)
- JetStream is a throughput- and memory-optimized engine for LLM inference on XLA devices, starting with TPUs (and GPUs in the future -- PRs welcome) (☆417, updated Jan 5, 2026)
- Paper-reading notes for the Berkeley OS prelim exam (☆14, updated Aug 28, 2024)
- A simplified and automated orchestration workflow to perform ML end-to-end (E2E) model tests and benchmarking on Cloud VMs across differe… (☆61, updated this week)
- llm-d Helm charts and deployment examples (☆50, updated this week)
- Tokamax: a GPU and TPU kernel library (☆185, updated Mar 19, 2026)
- A high-throughput and memory-efficient inference and serving engine for LLMs (☆85, updated Mar 18, 2026)
- (WIP) Parallel inference for black-forest-labs' FLUX model (☆19, updated Nov 18, 2024)
- Boosting 4-bit inference kernels with 2:4 sparsity (☆94, updated Sep 4, 2024)
- Google TPU optimizations for transformers models (☆136, updated Jan 23, 2026)
- A simple, performant and scalable JAX LLM! (☆2,182, updated this week)
- 🚀 Collection of components for development, training, tuning, and inference of foundation models leveraging PyTorch native components (☆220, updated this week)
- Accelerate and optimize performance with streamlined training and serving options in JAX (☆342, updated Mar 20, 2026)
- EuroSys '24: "Trinity: A Fast Compressed Multi-attribute Data Store" (☆19, updated Mar 8, 2025)
- 🎹 Instruct.KR 2025 Summer Meetup: Open-source LLMs, from vLLM to production 🎹 (☆23, updated Aug 2, 2025)
- Testing framework for deep learning models (TensorFlow and PyTorch) on Google Cloud hardware accelerators (TPU and GPU) (☆65, updated Mar 11, 2026)
- Benchmark tests supporting the TiledCUDA library (☆18, updated Nov 19, 2024)
- ☆23 (updated Aug 21, 2025)
- Lightweight Python wrapper for OpenVINO, enabling LLM inference on NPUs (☆27, updated Dec 17, 2024)
- ☆195 (updated Mar 10, 2026)
- Train speculative decoding models effortlessly and port them smoothly to SGLang serving (☆736, updated this week)
- A lightweight LLM post-training library (☆2,196, updated this week)
- An efficient method for converting internal to Cartesian coordinates, built on the platform-agnostic JAX Python library (☆20, updated Jun 12, 2024)
- GenAI inference performance benchmarking tool (☆156, updated Mar 16, 2026)
- ☆79 (updated Mar 18, 2026)
- Multi-turn RL training system with AgentTrainer for language-model game reinforcement learning (☆60, updated Dec 18, 2025)
- Handwritten GEMM using Intel AMX (Advanced Matrix Extensions) (☆17, updated Jan 11, 2025)
- Debug print operator for cudagraph debugging (☆14, updated Aug 2, 2024)
- Quantized attention on GPU (☆44, updated Nov 22, 2024)