A Triton-only attention backend for vLLM
☆24Feb 11, 2026Updated 3 weeks ago
Alternatives and similar repositories for vllm-triton-backend
Users who are interested in vllm-triton-backend are comparing it to the libraries listed below.
- Development containers for triton and triton-cpu☆24Updated this week
- ☆23Jul 11, 2025Updated 7 months ago
- ☆13Jan 7, 2025Updated last year
- ☆111Updated this week
- Wave: Python Domain-Specific Language for High Performance Machine Learning☆45Updated this week
- A lightweight triton-based General Matrix Multiplication (GEMM) library.☆48Updated this week
- A Triton JIT runtime and ffi provider in C++☆32Updated this week
- Ship correct and fast LLM kernels to PyTorch☆144Jan 14, 2026Updated last month
- WaferLLM: Large Language Model Inference at Wafer Scale☆90Jan 7, 2026Updated last month
- Manages vllm-nccl dependency☆17Jun 3, 2024Updated last year
- Pure Triton kernels for Qwen3.5-27B inference on NVIDIA B200☆66Updated this week
- PerFlow-AI is a programmable performance analysis, modeling, and prediction tool for AI systems.☆29Feb 3, 2026Updated last month
- Automatic differentiation for Triton Kernels☆29Aug 12, 2025Updated 6 months ago
- Julia implementation of the Flash Attention algorithm☆19Sep 4, 2023Updated 2 years ago
- ☆44Updated this week
- Framework to reduce autotune overhead to zero for well known deployments.☆97Sep 19, 2025Updated 5 months ago
- Supplemental materials for The ASPLOS 2025 / EuroSys 2025 Contest on Intra-Operator Parallelism for Distributed Deep Learning☆25May 12, 2025Updated 9 months ago
- FlashTile is a CUDA Tile IR compiler that is compatible with NVIDIA's tileiras, targeting SM70 through SM121 NVIDIA GPUs.☆54Feb 6, 2026Updated last month
- A bunch of kernels that might make stuff slower 😉☆75Feb 18, 2026Updated 2 weeks ago
- ☆160Dec 27, 2024Updated last year
- ☆31Apr 19, 2025Updated 10 months ago
- TritonParse: A Compiler Tracer, Visualizer, and Reproducer for Triton Kernels☆196Updated this week
- Vstream - Video Analytics pipeline with Hardware based accelerations (dev - stage)☆10Feb 2, 2024Updated 2 years ago
- Multi-GPU communication profiler and visualizer☆38Jun 10, 2024Updated last year
- [Archived] For the latest updates and community contribution, please visit: https://github.com/Ascend/TransferQueue or https://gitcode.co…☆13Jan 16, 2026Updated last month
- A collection of reproducible inference engine benchmarks☆38Apr 22, 2025Updated 10 months ago
- ☆53Feb 24, 2026Updated last week
- Prefix-Aware Attention for LLM Decoding☆29Jan 23, 2026Updated last month
- ☆28Dec 3, 2025Updated 3 months ago
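- word2vec source code with detailed bilingual annotations (well-annotated word2vec)☆10Oct 3, 2021Updated 4 years ago
- Hands-on with the Chinese-made Hygon DCU accelerator card (large-model training, fine-tuning, inference, etc.)☆70Aug 10, 2025Updated 6 months ago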
- Collection of kernels written in Triton language☆178Jan 27, 2026Updated last month
- A domain-specific language (DSL) based on Triton but providing higher-level abstractions.☆41Feb 4, 2026Updated last month
- Practical exercises for HOW Series "Deep Dive", a Web-based training on parallel programming and performance optimization☆33Feb 1, 2019Updated 7 years ago
- TileFusion is an experimental C++ macro kernel template library that elevates the abstraction level in CUDA C for tile processing.☆106Jun 28, 2025Updated 8 months ago
- Precision Knowledge Editing (PKE): A novel method to reduce toxicity in LLMs while preserving performance, with robust evaluations and ha…☆11Nov 26, 2024Updated last year
- An implementation of MSSRM method☆11Mar 23, 2023Updated 2 years ago
- ☆20May 24, 2025Updated 9 months ago
- Entry for the first Hunan Province Artificial Intelligence Competition (2020)☆11Feb 17, 2022Updated 4 years ago