TPU inference for vLLM, with unified JAX and PyTorch support.
☆287 · Apr 14, 2026 · Updated this week
Alternatives and similar repositories for tpu-inference
Users that are interested in tpu-inference are comparing it to the libraries listed below.
- ☆17 · Feb 18, 2026 · Updated last month
- ☆20 · Mar 11, 2026 · Updated last month
- ☆35 · Feb 4, 2026 · Updated 2 months ago
- Minimal yet performant LLM examples in pure JAX ☆246 · Updated this week
- vLLM performance dashboard ☆44 · Apr 26, 2024 · Updated last year
- Automatic differentiation for Triton Kernels ☆29 · Aug 12, 2025 · Updated 8 months ago
- A high-throughput and memory-efficient inference and serving engine for LLMs ☆27 · Apr 9, 2026 · Updated last week
- ☆329 · Updated this week
- ☆16 · May 11, 2025 · Updated 11 months ago
- This repo hosts code for vLLM CI & Performance Benchmark infrastructure. ☆34 · Updated this week
- JAX bindings for the flash-attention3 kernels ☆22 · Jan 2, 2026 · Updated 3 months ago
- ☆137 · Updated this week
- Recipes for reproducing training and serving benchmarks for large machine learning models using GPUs on Google Cloud. ☆130 · Updated this week
- Large Language Model Text Generation Inference on Habana Gaudi ☆34 · Mar 20, 2025 · Updated last year
- JetStream is a throughput and memory optimized engine for LLM inference on XLA devices, starting with TPUs (and GPUs in future -- PRs wel… ☆424 · Jan 5, 2026 · Updated 3 months ago
- JAX backend for SGL ☆264 · Updated this week
- A simplified and automated orchestration workflow to perform ML end-to-end (E2E) model tests and benchmarking on Cloud VMs across differe… ☆61 · Updated this week
- llm-d helm charts and deployment examples ☆54 · Apr 2, 2026 · Updated 2 weeks ago
- Minimal, predictable, footgun-free config library. ☆41 · Apr 1, 2026 · Updated 2 weeks ago
- A simple, performant and scalable Jax LLM! ☆2,230 · Updated this week
- Tokamax: A GPU and TPU kernel library. ☆198 · Updated this week
- A practical way of learning Swizzle ☆37 · Feb 3, 2025 · Updated last year
- A high-throughput and memory-efficient inference and serving engine for LLMs ☆85 · Updated this week
- (WIP) Parallel inference for black-forest-labs' FLUX model. ☆19 · Nov 18, 2024 · Updated last year
- CUDA Embedding Lookup Kernel Library ☆43 · Feb 9, 2026 · Updated 2 months ago
- Boosting 4-bit inference kernels with 2:4 Sparsity ☆95 · Sep 4, 2024 · Updated last year
- Google TPU optimizations for transformers models ☆136 · Jan 23, 2026 · Updated 2 months ago
- Accelerate, Optimize performance with streamlined training and serving options with JAX. ☆353 · Apr 9, 2026 · Updated last week
- ☆13 · Updated this week
- 🚀 Collection of components for development, training, tuning, and inference of foundation models leveraging PyTorch native components. ☆222 · Updated this week
- EuroSys '24: "Trinity: A Fast Compressed Multi-attribute Data Store" ☆18 · Mar 8, 2025 · Updated last year
- Simple demo showing how to use the Forge API by Nous Research ☆16 · Nov 12, 2024 · Updated last year
- 🎹 Instruct.KR 2025 Summer Meetup: Open-Source LLMs, All the Way to Production with vLLM 🎹 ☆23 · Aug 2, 2025 · Updated 8 months ago
- Testing framework for Deep Learning models (Tensorflow and PyTorch) on Google Cloud hardware accelerators (TPU and GPU) ☆65 · Mar 11, 2026 · Updated last month
- ☆23 · Aug 21, 2025 · Updated 7 months ago
- Benchmark tests supporting the TiledCUDA library. ☆18 · Nov 19, 2024 · Updated last year
- Lightweight Python Wrapper for OpenVINO, enabling LLM inference on NPUs ☆27 · Dec 17, 2024 · Updated last year
- Jetson Nano control and vision with ROS2 RealSense2, RPlidar, BNO055, Python, Pygame and ModBus ☆12 · Feb 13, 2021 · Updated 5 years ago
- ☆196 · Mar 27, 2026 · Updated 2 weeks ago