A Triton-only attention backend for vLLM
☆25 · Mar 17, 2026 · Updated 2 months ago
Alternatives and similar repositories for vllm-triton-backend
Users that are interested in vllm-triton-backend are comparing it to the libraries listed below.
- ☆24 · May 26, 2026 · Updated 2 weeks ago
- A Triton JIT runtime and ffi provider in C++ · ☆35 · May 27, 2026 · Updated 2 weeks ago
- Automatic differentiation for Triton Kernels · ☆29 · Aug 12, 2025 · Updated 10 months ago
- Ship correct and fast LLM kernels to PyTorch · ☆150 · Jan 14, 2026 · Updated 5 months ago
- Wave: Python Domain-Specific Language for High Performance Machine Learning · ☆57 · Jun 8, 2026 · Updated last week
- Framework to reduce autotune overhead to zero for well known deployments. · ☆101 · Sep 19, 2025 · Updated 8 months ago
- A lightweight triton-based General Matrix Multiplication (GEMM) library. · ☆65 · Updated this week
- FlashTile is a CUDA Tile IR compiler that is compatible with NVIDIA's tileiras, targeting SM70 through SM121 NVIDIA GPUs. · ☆61 · Feb 6, 2026 · Updated 4 months ago
- FlyDSL is the Python front‑end of the project: Flexible LaYout DSL. · ☆201 · Updated this week
- WaferLLM: Large Language Model Inference at Wafer Scale · ☆108 · Updated this week
- ☆13 · Jan 7, 2025 · Updated last year
- Unofficial PyTorch reproduction of DeepSeek's Thinking with Visual Primitives. · ☆107 · Updated this week
- Boosting GPU utilization for LLM serving via dynamic spatial-temporal prefill & decode orchestration · ☆51 · Jan 8, 2026 · Updated 5 months ago
- My solution for the UC Berkeley AI Pacman projects · ☆11 · Jul 25, 2020 · Updated 5 years ago
- Cute layout visualization · ☆40 · Jan 18, 2026 · Updated 4 months ago
- incubator repo for CUDA-TileIR backend · ☆140 · Updated this week
- Manages vllm-nccl dependency · ☆18 · Jun 3, 2024 · Updated 2 years ago
- DeepSeek-V3.2-Exp DSA Warmup Lightning Indexer training operator based on tilelang · ☆44 · Nov 19, 2025 · Updated 6 months ago
- ☆17 · Mar 26, 2025 · Updated last year
- ☆15 · Apr 28, 2023 · Updated 3 years ago
- TritonParse: A Compiler Tracer, Visualizer, and Reproducer for Triton Kernels · ☆210 · Updated this week
- C-compatible enum for Julia · ☆15 · Dec 23, 2023 · Updated 2 years ago
- ☆47 · Sep 8, 2025 · Updated 9 months ago
- Fork of Enzyme to work on Reverse-Mode Differentiation at the MLIR-level. · ☆11 · Apr 23, 2023 · Updated 3 years ago
- Advent of Code 2023 (Mojo) · ☆12 · Sep 30, 2024 · Updated last year
- Hands-On Practical MLIR Tutorial · ☆59 · Aug 21, 2025 · Updated 9 months ago
- DeepXTrace is a lightweight tool for precisely diagnosing slow ranks in DeepEP-based environments. · ☆99 · Jan 16, 2026 · Updated 4 months ago
- We put all ready-to-go models here · ☆20 · Dec 30, 2022 · Updated 3 years ago
- PerFlow-AI is a programmable performance analysis, modeling, and prediction tool for AI systems. · ☆32 · May 12, 2026 · Updated last month
- ☆20 · May 24, 2025 · Updated last year
- Code and experiments for the NeurIPS 2023 paper Stabilized Neural Differential Equations for Learning Dynamics with Explicit Constraints · ☆12 · Mar 26, 2024 · Updated 2 years ago
- ☆33 · Apr 19, 2025 · Updated last year
- OpenAI Triton backend for Intel® GPUs · ☆255 · Updated this week
- Collection of kernels written in Triton language · ☆199 · Jan 27, 2026 · Updated 4 months ago
- Pure Triton kernels for Qwen3.5-27B inference on NVIDIA B200 · ☆115 · Feb 28, 2026 · Updated 3 months ago
- Supplemental materials for The ASPLOS 2025 / EuroSys 2025 Contest on Intra-Operator Parallelism for Distributed Deep Learning · ☆25 · May 12, 2025 · Updated last year
- Vstream: a video analytics pipeline with hardware-based acceleration (dev stage) · ☆10 · Feb 2, 2024 · Updated 2 years ago
- A collection of reproducible inference engine benchmarks · ☆38 · Apr 22, 2025 · Updated last year
- ☆157 · Mar 5, 2026 · Updated 3 months ago