foundation-model-stack / vllm-triton-backend
A Triton-only attention backend for vLLM
☆23 · Updated this week
Alternatives and similar repositories for vllm-triton-backend
Users who are interested in vllm-triton-backend are comparing it to the libraries listed below.
- ☆23 · Jul 11, 2025 · Updated 7 months ago
- ☆13 · Jan 7, 2025 · Updated last year
- Development containers for triton and triton-cpu ☆23 · Feb 1, 2026 · Updated last week
- ☆42 · Jan 24, 2026 · Updated 3 weeks ago
- Wave: Python Domain-Specific Language for High Performance Machine Learning ☆44 · Updated this week
- A lightweight triton-based General Matrix Multiplication (GEMM) library. ☆43 · Updated this week
- A Triton JIT runtime and ffi provider in C++ ☆31 · Jan 26, 2026 · Updated 2 weeks ago
- WaferLLM: Large Language Model Inference at Wafer Scale ☆88 · Jan 7, 2026 · Updated last month
- Manages vllm-nccl dependency ☆17 · Jun 3, 2024 · Updated last year
- PerFlow-AI is a programmable performance analysis, modeling, prediction tool for AI system. ☆28 · Feb 3, 2026 · Updated last week
- Julia implementation of the Flash Attention algorithm ☆19 · Sep 4, 2023 · Updated 2 years ago
- Automatic differentiation for Triton Kernels ☆29 · Aug 12, 2025 · Updated 6 months ago
- FlashTile is a CUDA Tile IR compiler that is compatible with NVIDIA's tileiras, targeting SM70 through SM121 NVIDIA GPUs. ☆37 · Feb 6, 2026 · Updated last week
- Supplemental materials for The ASPLOS 2025 / EuroSys 2025 Contest on Intra-Operator Parallelism for Distributed Deep Learning ☆25 · May 12, 2025 · Updated 9 months ago
- Ship correct and fast LLM kernels to PyTorch ☆141 · Jan 14, 2026 · Updated 3 weeks ago
- Vstream - Video Analytics pipeline with Hardware based accelerations (dev - stage) ☆10 · Feb 2, 2024 · Updated 2 years ago
- ☆26 · Dec 3, 2025 · Updated 2 months ago
- [Archived] For the latest updates and community contribution, please visit: https://github.com/Ascend/TransferQueue or https://gitcode.co… ☆13 · Jan 16, 2026 · Updated 3 weeks ago
- ☆54 · May 5, 2025 · Updated 9 months ago
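- Domestic (Chinese) accelerator cards: Hygon DCU in practice (large-model training, fine-tuning, inference, etc.) ☆69 · Aug 10, 2025 · Updated 6 months ago
- word2vec source code with detailed bilingual annotations (well-annotated word2vec) ☆10 · Oct 3, 2021 · Updated 4 years ago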
- my solution for UC Berkeley AI projects pacman ☆11 · Jul 25, 2020 · Updated 5 years ago
- Collection of kernels written in Triton language ☆178 · Jan 27, 2026 · Updated 2 weeks ago
- A domain-specific language (DSL) based on Triton but providing higher-level abstractions. ☆41 · Feb 4, 2026 · Updated last week
- LLM Inference via Triton (Flexible & Modular): Focused on Kernel Optimization using CUBIN binaries, Starting from gpt-oss Model ☆64 · Oct 18, 2025 · Updated 3 months ago
- Framework to reduce autotune overhead to zero for well known deployments. ☆96 · Sep 19, 2025 · Updated 4 months ago
- ☆10 · Jun 28, 2025 · Updated 7 months ago
- netbeacon - monitoring your network capture, NIDS or network analysis process ☆19 · Oct 26, 2013 · Updated 12 years ago
- ☆13 · Jan 16, 2025 · Updated last year
- Entry for the 2020 First Hunan Province Artificial Intelligence Competition ☆11 · Feb 17, 2022 · Updated 3 years ago
- YOLO object detection algorithm ☆15 · Jul 27, 2025 · Updated 6 months ago
- Protocol buffers and other common resources. ☆13 · Jan 20, 2026 · Updated 3 weeks ago
- This project is based on the [LTX-Video](https://github.com/Lightricks/LTX-Video) algorithm of the diffusers and optimized and accelerate… ☆11 · Dec 31, 2024 · Updated last year
- [ICDCS 2023] Evaluation and Optimization of Gradient Compression for Distributed Deep Learning ☆10 · Apr 28, 2023 · Updated 2 years ago
- ☆20 · May 24, 2025 · Updated 8 months ago
- An implementation of MSSRM method ☆11 · Mar 23, 2023 · Updated 2 years ago
- Precision Knowledge Editing (PKE): A novel method to reduce toxicity in LLMs while preserving performance, with robust evaluations and ha… ☆11 · Nov 26, 2024 · Updated last year
- Code accompanying the NeurIPS 2019 paper AutoAssist: A Framework to Accelerate Training of Deep Neural Networks. ☆14 · Oct 3, 2022 · Updated 3 years ago
- ☆14 · May 1, 2023 · Updated 2 years ago