TPU inference for vLLM, with unified JAX and PyTorch support.
☆349 · Jun 10, 2026 · Updated this week
Alternatives and similar repositories for tpu-inference
Users that are interested in tpu-inference are comparing it to the libraries listed below.
- ☆18 · Feb 18, 2026 · Updated 3 months ago
- ☆21 · Mar 11, 2026 · Updated 3 months ago
- ☆36 · Jun 6, 2026 · Updated last week
- LinearKAN: A very fast implementation of Kolmogorov-Arnold Networks · ☆19 · Nov 12, 2025 · Updated 7 months ago
- Automatic differentiation for Triton Kernels · ☆29 · Aug 12, 2025 · Updated 10 months ago
- A high-throughput and memory-efficient inference and serving engine for LLMs · ☆30 · Updated this week
- ☆358 · Updated this week
- This repo hosts code for vLLM CI & Performance Benchmark infrastructure. · ☆43 · Updated this week
- ☆15 · May 11, 2025 · Updated last year
- JAX bindings for the flash-attention3 kernels · ☆23 · Jan 2, 2026 · Updated 5 months ago
- ☆139 · Jun 8, 2026 · Updated last week
- Recipes for reproducing training and serving benchmarks for large machine learning models using GPUs on Google Cloud. · ☆133 · Jun 8, 2026 · Updated last week
- Large Language Model Text Generation Inference on Habana Gaudi · ☆34 · Mar 20, 2025 · Updated last year
- JAX backend for SGL · ☆280 · Jun 9, 2026 · Updated last week
- JetStream is a throughput and memory optimized engine for LLM inference on XLA devices, starting with TPUs (and GPUs in future -- PRs wel… · ☆445 · Jan 5, 2026 · Updated 5 months ago
- Paper-reading notes for Berkeley OS prelim exam. · ☆14 · Aug 28, 2024 · Updated last year
- A simplified and automated orchestration workflow to perform ML end-to-end (E2E) model tests and benchmarking on Cloud VMs across differe… · ☆64 · Updated this week
- This repository contains the results and code for the MLPerf™ Inference v4.0 benchmark. · ☆11 · Jul 24, 2025 · Updated 10 months ago
- llm-d helm charts and deployment examples · ☆58 · May 1, 2026 · Updated last month
- A simple, performant and scalable Jax LLM! · ☆2,322 · Updated this week
- A practical way of learning Swizzle · ☆41 · Feb 3, 2025 · Updated last year
- Tokamax: A GPU and TPU kernel library. · ☆227 · Updated this week
- A high-throughput and memory-efficient inference and serving engine for LLMs · ☆88 · Jun 5, 2026 · Updated last week
- (WIP) Parallel inference for black-forest-labs' FLUX model. · ☆19 · Nov 18, 2024 · Updated last year
- CUDA Embedding Lookup Kernel Library · ☆47 · Feb 9, 2026 · Updated 4 months ago
- Boosting 4-bit inference kernels with 2:4 Sparsity · ☆96 · Sep 4, 2024 · Updated last year
- Google TPU optimizations for transformers models · ☆137 · Jan 23, 2026 · Updated 4 months ago
- 🚀 Collection of components for development, training, tuning, and inference of foundation models leveraging PyTorch native components. · ☆224 · Updated this week
- ☆14 · Updated this week
- Accelerate, Optimize performance with streamlined training and serving options with JAX. · ☆365 · Updated this week
- EuroSys '24: "Trinity: A Fast Compressed Multi-attribute Data Store" · ☆18 · Mar 8, 2025 · Updated last year
- 🎹 Instruct.KR 2025 Summer Meetup: Open-source LLMs, all the way to production with vLLM 🎹 · ☆23 · Aug 2, 2025 · Updated 10 months ago
- Benchmark tests supporting the TiledCUDA library. · ☆19 · Nov 19, 2024 · Updated last year
- Lightweight Python Wrapper for OpenVINO, enabling LLM inference on NPUs · ☆29 · Dec 17, 2024 · Updated last year
- Jetson Nano control and vision with ROS2 RealSense2, RPlidar, BNO055, Python, Pygame and ModBus · ☆12 · Feb 13, 2021 · Updated 5 years ago
- Security-native LLM system for AI-generated application security. · ☆263 · Jun 4, 2026 · Updated last week
- Cluster Toolkit is an open-source software offered by Google Cloud which makes it easy for customers to deploy AI/ML and HPC environments… · ☆346 · Updated this week
- A Lightweight LLM Post-Training Library · ☆2,341 · Updated this week
- Train speculative decoding models effortlessly and port them smoothly to SGLang serving. · ☆888 · Updated this week