A Triton-only attention backend for vLLM
☆25 Mar 17, 2026 Updated last month
Alternatives and similar repositories for vllm-triton-backend
Users that are interested in vllm-triton-backend are comparing it to the libraries listed below.
- Development containers for triton and triton-cpu ☆27 Updated this week
- ☆24 Apr 7, 2026 Updated 3 weeks ago
- Automatic differentiation for Triton Kernels ☆29 Aug 12, 2025 Updated 8 months ago
- Ship correct and fast LLM kernels to PyTorch ☆149 Jan 14, 2026 Updated 3 months ago
- Wave: Python Domain-Specific Language for High Performance Machine Learning ☆53 Apr 28, 2026 Updated last week
- Framework to reduce autotune overhead to zero for well known deployments. ☆99 Sep 19, 2025 Updated 7 months ago
- Open-source evaluation toolkit of large vision-language models (LVLMs), support ~100 VLMs, 30+ benchmarks ☆15 Feb 17, 2025 Updated last year
- FlyDSL is the Python front‑end of the project: Flexible LaYout DSL. ☆175 Updated this week
- Julia implementation of the Flash Attention algorithm ☆19 Sep 4, 2023 Updated 2 years ago
- WaferLLM: Large Language Model Inference at Wafer Scale ☆101 Apr 4, 2026 Updated last month
- ☆13 Jan 7, 2025 Updated last year
- Boosting GPU utilization for LLM serving via dynamic spatial-temporal prefill & decode orchestration ☆46 Jan 8, 2026 Updated 3 months ago
- Cute layout visualization ☆38 Jan 18, 2026 Updated 3 months ago
- my solution for UC Berkeley AI projects pacman ☆11 Jul 25, 2020 Updated 5 years ago
- Music large model based on InternLM2-chat. ☆23 Dec 21, 2024 Updated last year
- incubator repo for CUDA-TileIR backend ☆134 Apr 22, 2026 Updated 2 weeks ago
- ☆12 May 23, 2018 Updated 7 years ago
- Manages vllm-nccl dependency ☆18 Jun 3, 2024 Updated last year
- ☆17 Mar 26, 2025 Updated last year
- ☆15 Apr 28, 2023 Updated 3 years ago
- TritonParse: A Compiler Tracer, Visualizer, and Reproducer for Triton Kernels ☆204 Updated this week
- C-compatible enum for Julia ☆15 Dec 23, 2023 Updated 2 years ago
- Fork of Enzyme to work on Reverse-Mode Differentiation at the MLIR-level. ☆11 Apr 23, 2023 Updated 3 years ago
- Hands-On Practical MLIR Tutorial ☆54 Aug 21, 2025 Updated 8 months ago
- Advent of Code 2023 (Mojo) ☆12 Sep 30, 2024 Updated last year
- ☆53 Apr 13, 2026 Updated 3 weeks ago
- DeepXTrace is a lightweight tool for precisely diagnosing slow ranks in DeepEP-based environments. ☆97 Jan 16, 2026 Updated 3 months ago
- PerFlow-AI is a programmable performance analysis, modeling, prediction tool for AI system. ☆32 Apr 1, 2026 Updated last month
- ☆20 May 24, 2025 Updated 11 months ago
- Code and experiments for the NeurIPS 2023 paper Stabilized Neural Differential Equations for Learning Dynamics with Explicit Constraints ☆12 Mar 26, 2024 Updated 2 years ago
- ☆33 Apr 19, 2025 Updated last year
- GitHub mirror of triton-lang/triton repo. ☆162 Updated this week
- OpenAI Triton backend for Intel® GPUs ☆249 Updated this week
- Collection of kernels written in Triton language ☆191 Jan 27, 2026 Updated 3 months ago
- ☆140 Mar 5, 2026 Updated 2 months ago
- Pure Triton kernels for Qwen3.5-27B inference on NVIDIA B200 ☆108 Feb 28, 2026 Updated 2 months ago
- Supplemental materials for The ASPLOS 2025 / EuroSys 2025 Contest on Intra-Operator Parallelism for Distributed Deep Learning ☆25 May 12, 2025 Updated 11 months ago
- A collection of reproducible inference engine benchmarks ☆38 Apr 22, 2025 Updated last year
- Vstream - Video Analytics pipeline with Hardware based accelerations (dev - stage) ☆10 Feb 2, 2024 Updated 2 years ago