LLM Inference via Triton (Flexible & Modular): Focused on Kernel Optimization using CUBIN binaries, Starting from gpt-oss Model
☆75 Oct 18, 2025 Updated 4 months ago
Alternatives and similar repositories for TritonLLM
Users who are interested in TritonLLM are comparing it to the libraries listed below.
- ☆123 Updated this week
- triton for dsa ☆58 Updated this week
- FlagTree is a unified compiler supporting multiple AI chip backends for custom Deep Learning operations, which is forked from triton-lang… ☆214 Updated this week
- A Triton-only attention backend for vLLM ☆24 Feb 11, 2026 Updated 3 weeks ago
- JAX bindings for the flash-attention3 kernels ☆21 Jan 2, 2026 Updated 2 months ago
- ☆27 Jan 7, 2025 Updated last year
- Manages vllm-nccl dependency ☆17 Jun 3, 2024 Updated last year
- Getting Started with Triton: A Tutorial for Python Beginners ☆37 Oct 21, 2025 Updated 4 months ago
- ☆87 Jan 22, 2026 Updated last month
- ☆31 Feb 12, 2026 Updated 3 weeks ago
- A minimal implementation of vllm. ☆68 Jul 27, 2024 Updated last year
- PyTorch distributed training acceleration framework ☆54 Aug 13, 2025 Updated 6 months ago
- DLSlime: Flexible & Efficient Heterogeneous Transfer Toolkit ☆92 Jan 26, 2026 Updated last month
- ☆26 Aug 19, 2022 Updated 3 years ago
- Triton based sparse quantization attention kernel collection ☆43 Aug 29, 2025 Updated 6 months ago
- Optimize tensor program fast with Felix, a gradient descent autotuner. ☆32 Apr 27, 2024 Updated last year
- Tile-based language built for AI computation across all scales ☆138 Feb 27, 2026 Updated last week
- ☆34 Feb 3, 2025 Updated last year
- [Archived] For the latest updates and community contribution, please visit: https://github.com/Ascend/TransferQueue or https://gitcode.co… ☆13 Jan 16, 2026 Updated last month
- ☆88 Updated this week
- Distributed Compiler based on Triton for Parallel Systems ☆1,371 Feb 13, 2026 Updated 3 weeks ago
- Prefix-Aware Attention for LLM Decoding ☆29 Jan 23, 2026 Updated last month
- Distributed MoE in a Single Kernel [NeurIPS '25] ☆194 Feb 27, 2026 Updated last week
- Collection of kernels written in Triton language ☆178 Jan 27, 2026 Updated last month
- Codes For Sharing ☆40 Mar 30, 2021 Updated 4 years ago
- MLIR-based toolkit targeting intel heterogeneous hardware ☆50 Updated this week
- ☆152 Jan 9, 2025 Updated last year
- [ICDCS 2023] Evaluation and Optimization of Gradient Compression for Distributed Deep Learning ☆10 Apr 28, 2023 Updated 2 years ago
- Parallel computing implementation of the Word2Vec task ☆11 Sep 11, 2017 Updated 8 years ago
- Protocol buffers and other common resources. ☆13 Updated this week
- This project is based on the [LTX-Video](https://github.com/Lightricks/LTX-Video) algorithm of the diffusers and optimized and accelerate… ☆13 Dec 31, 2024 Updated last year
- AAAI2025 ☆11 Apr 18, 2025 Updated 10 months ago
- Docker cluster configuration for Hadoop ☆10 Jun 8, 2024 Updated last year
- Code accompanying the NeurIPS 2019 paper AutoAssist: A Framework to Accelerate Training of Deep Neural Networks. ☆14 Oct 3, 2022 Updated 3 years ago
- python port of arc90's readability bookmarklet, updated to match latest readability.js! ☆19 Sep 13, 2011 Updated 14 years ago
- Stateful LLM Serving ☆97 Mar 11, 2025 Updated 11 months ago
- ☆816 Feb 28, 2026 Updated last week
- ☆38 Jun 27, 2025 Updated 8 months ago
- Efficient Long-context Language Model Training by Core Attention Disaggregation ☆92 Updated this week