vllm-project / tpu-inference
TPU inference for vLLM, with unified JAX and PyTorch support.
☆155Updated this week
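A minimal usage sketch, assuming tpu-inference registers itself as a backend plugin so vLLM's standard Python entry point works unchanged on TPU hosts; the model name below is only an illustrative example, not something prescribed by the project:

```python
# Hedged sketch: assumes the TPU backend is picked up automatically once
# tpu-inference is installed, so the ordinary vLLM API is used as-is.
from vllm import LLM, SamplingParams

# Example checkpoint; substitute whatever model your TPU slice can hold.
llm = LLM(model="Qwen/Qwen2.5-1.5B-Instruct")

params = SamplingParams(temperature=0.7, max_tokens=64)
outputs = llm.generate(["Summarize what a TPU is in one sentence."], params)
for out in outputs:
    print(out.outputs[0].text)
```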
Alternatives and similar repositories for tpu-inference
Users interested in tpu-inference are comparing it to the libraries listed below.
- How to ensure correctness and ship LLM-generated kernels in PyTorch☆114Updated last week
- Applied AI experiments and examples for PyTorch☆302Updated 2 months ago
- ☆93Updated last year
- extensible collectives library in triton☆90Updated 7 months ago
- torchcomms: a modern PyTorch communications API☆245Updated this week
- JAX backend for SGL☆146Updated this week
- ☆246Updated this week
- PyTorch/XLA integration with JetStream (https://github.com/google/JetStream) for LLM inference☆77Updated last month
- Tritonbench is a collection of PyTorch custom operators with example inputs to measure their performance.☆277Updated this week
- Fast low-bit matmul kernels in Triton☆392Updated 2 weeks ago
- PyTorch bindings for CUTLASS grouped GEMM.☆126Updated 5 months ago
- ring-attention experiments☆155Updated last year
- Cataloging released Triton kernels.☆265Updated 2 months ago
- Boosting 4-bit inference kernels with 2:4 Sparsity☆85Updated last year
- ☆147Updated 10 months ago
- 🚀 Collection of components for development, training, tuning, and inference of foundation models leveraging PyTorch native components.☆216Updated this week
- Collection of kernels written in Triton language☆161Updated 7 months ago
- QuTLASS: CUTLASS-Powered Quantized BLAS for Deep Learning☆128Updated this week
- GitHub mirror of the triton-lang/triton repo.☆98Updated this week
- TritonBench: Benchmarking Large Language Model Capabilities for Generating Triton Operators☆93Updated 4 months ago
- QUICK: Quantization-aware Interleaving and Conflict-free Kernel for efficient LLM inference☆118Updated last year
- Load compute kernels from the Hub☆326Updated this week
- ArcticInference: vLLM plugin for high-throughput, low-latency inference☆299Updated this week
- kernels, of the mega variety☆597Updated last month
- ☆65Updated 6 months ago
- Allow torch tensor memory to be released and resumed later☆164Updated last week
- A Quirky Assortment of CuTe Kernels☆651Updated 2 weeks ago
- ☆218Updated 9 months ago
- 🚀 Efficiently (pre)training foundation models with native PyTorch features, including FSDP for training and SDPA implementation of Flash…☆271Updated last week
- Perplexity GPU Kernels☆528Updated this week