TPU inference for vLLM, with unified JAX and PyTorch support.
☆338 · May 26, 2026 · Updated this week
Alternatives and similar repositories for tpu-inference
Users that are interested in tpu-inference are comparing it to the libraries listed below.
- ☆18 · Feb 18, 2026 · Updated 3 months ago
- ☆20 · Mar 11, 2026 · Updated 2 months ago
- ☆35 · May 15, 2026 · Updated last week
- LinearKAN: A very fast implementation of Kolmogorov-Arnold Networks ☆19 · Nov 12, 2025 · Updated 6 months ago
- Minimal yet performant LLM examples in pure JAX ☆257 · Apr 10, 2026 · Updated last month
- Automatic differentiation for Triton Kernels ☆29 · Aug 12, 2025 · Updated 9 months ago
- A high-throughput and memory-efficient inference and serving engine for LLMs ☆29 · Updated this week
- ☆350 · Updated this week
- ☆15 · May 11, 2025 · Updated last year
- JAX bindings for the flash-attention3 kernels ☆22 · Jan 2, 2026 · Updated 4 months ago
- ☆138 · Updated this week
- Recipes for reproducing training and serving benchmarks for large machine learning models using GPUs on Google Cloud. ☆132 · May 20, 2026 · Updated last week
- Large Language Model Text Generation Inference on Habana Gaudi ☆34 · Mar 20, 2025 · Updated last year
- JAX backend for SGL ☆275 · Updated this week
- JetStream is a throughput and memory optimized engine for LLM inference on XLA devices, starting with TPUs (and GPUs in future -- PRs wel… ☆437 · Jan 5, 2026 · Updated 4 months ago
- Paper-reading notes for Berkeley OS prelim exam. ☆14 · Aug 28, 2024 · Updated last year
- A simplified and automated orchestration workflow to perform ML end-to-end (E2E) model tests and benchmarking on Cloud VMs across differe… ☆63 · May 19, 2026 · Updated last week
- llm-d helm charts and deployment examples ☆57 · May 1, 2026 · Updated 3 weeks ago
- A simple, performant and scalable Jax LLM! ☆2,295 · Updated this week
- Minimal, predictable, footgun-free config library. ☆42 · Apr 14, 2026 · Updated last month
- A high-throughput and memory-efficient inference and serving engine for LLMs ☆88 · Updated this week
- (WIP) Parallel inference for black-forest-labs' FLUX model. ☆19 · Nov 18, 2024 · Updated last year
- CUDA Embedding Lookup Kernel Library ☆45 · Feb 9, 2026 · Updated 3 months ago
- Boosting 4-bit inference kernels with 2:4 Sparsity ☆96 · Sep 4, 2024 · Updated last year
- Google TPU optimizations for transformers models ☆137 · Jan 23, 2026 · Updated 4 months ago
- 🚀 Collection of components for development, training, tuning, and inference of foundation models leveraging PyTorch native components. ☆222 · May 15, 2026 · Updated last week
- EuroSys '24: "Trinity: A Fast Compressed Multi-attribute Data Store" ☆18 · Mar 8, 2025 · Updated last year
- Testing framework for Deep Learning models (Tensorflow and PyTorch) on Google Cloud hardware accelerators (TPU and GPU) ☆64 · May 5, 2026 · Updated 3 weeks ago
- Benchmark tests supporting the TiledCUDA library. ☆19 · Nov 19, 2024 · Updated last year
- ☆23 · Aug 21, 2025 · Updated 9 months ago
- Lightweight Python Wrapper for OpenVINO, enabling LLM inference on NPUs ☆29 · Dec 17, 2024 · Updated last year
- Jetson Nano control and vision with ROS2 RealSense2, RPlidar, BNO055, Python, Pygame and ModBus ☆12 · Feb 13, 2021 · Updated 5 years ago
- ☆196 · May 4, 2026 · Updated 3 weeks ago
- Cluster Toolkit is an open-source software offered by Google Cloud which makes it easy for customers to deploy AI/ML and HPC environments… ☆340 · Updated this week
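- Code repository for the undergraduate computer graphics course at the School of Software, Shanghai Jiao Tong University ☆14 · Oct 3, 2025 · Updated 7 months ago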
- An efficient method for the conversion from internal to Cartesian coordinates that utilizes the platform-agnostic JAX Python library. ☆21 · Jun 12, 2024 · Updated last year
- A Lightweight LLM Post-Training Library ☆2,292 · Updated this week
- Train speculative decoding models effortlessly and port them smoothly to SGLang serving. ☆846 · Updated this week
- GenAI inference performance benchmarking tool ☆190 · Updated this week