Optimize softmax in Triton for many cases
☆23 · Sep 6, 2024 · Updated last year
Alternatives and similar repositories for optimize_softmax
Users that are interested in optimize_softmax are comparing it to the libraries listed below
- CUDA 8-bit Tensor Core Matrix Multiplication based on m16n16k16 WMMA API ☆35 · Sep 15, 2023 · Updated 2 years ago
- FlagGems is an operator library for large language models implemented in the Triton Language. ☆904 · Updated this week
- ☆32 · Jul 2, 2025 · Updated 8 months ago
- ☆21 · Aug 14, 2024 · Updated last year
- A lightweight inference framework for large models