luliyucoordinate / flash-attention-minimal
Flash Attention in ~100 lines of CUDA (forward pass only)
☆10 · Updated 11 months ago
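Flash attention's forward pass avoids materializing the full N × N score matrix: each query row streams over the keys and values while maintaining a running max and a running softmax denominator. The CUDA sketch below illustrates that online-softmax core only; it is not the repo's actual kernel, and the dimensions `N` and `D`, the kernel name, and the launch configuration are assumptions chosen for the demo (the repo additionally tiles through shared memory, which this sketch omits).

```cuda
// Minimal sketch of a flash-attention-style forward pass (illustration only,
// not luliyucoordinate/flash-attention-minimal's kernel). One thread computes
// one query row's output via an online softmax, so the N x N score matrix is
// never stored.
#include <cstdio>
#include <cmath>
#include <cuda_runtime.h>

#define N 64   // sequence length (assumption for the demo)
#define D 16   // head dimension (assumption for the demo)

__global__ void attention_forward(const float* Q, const float* K,
                                  const float* V, float* O) {
    int row = blockIdx.x * blockDim.x + threadIdx.x;  // one query row per thread
    if (row >= N) return;

    float m = -INFINITY;      // running max of scores seen so far
    float l = 0.0f;           // running softmax denominator
    float acc[D] = {0.0f};    // running (unnormalized) output accumulator
    float scale = rsqrtf((float)D);

    for (int j = 0; j < N; ++j) {
        // score s = (q . k_j) / sqrt(D)
        float s = 0.0f;
        for (int k = 0; k < D; ++k)
            s += Q[row * D + k] * K[j * D + k];
        s *= scale;

        // Online softmax update: when the max grows, rescale the old terms.
        float m_new = fmaxf(m, s);
        float corr = expf(m - m_new);  // rescale factor for previous terms
        float p = expf(s - m_new);     // weight of the current key
        l = l * corr + p;
        for (int k = 0; k < D; ++k)
            acc[k] = acc[k] * corr + p * V[j * D + k];
        m = m_new;
    }
    for (int k = 0; k < D; ++k)
        O[row * D + k] = acc[k] / l;   // final normalization
}

int main() {
    size_t bytes = N * D * sizeof(float);
    float *Q, *K, *V, *O;
    cudaMallocManaged(&Q, bytes); cudaMallocManaged(&K, bytes);
    cudaMallocManaged(&V, bytes); cudaMallocManaged(&O, bytes);
    for (int i = 0; i < N * D; ++i) {   // arbitrary demo data
        Q[i] = 0.01f * (i % 7); K[i] = 0.02f * (i % 5); V[i] = 0.03f * (i % 3);
    }

    attention_forward<<<(N + 63) / 64, 64>>>(Q, K, V, O);
    cudaDeviceSynchronize();
    printf("O[0][0] = %f\n", O[0]);

    cudaFree(Q); cudaFree(K); cudaFree(V); cudaFree(O);
    return 0;
}
```

Compile with `nvcc -o attn attn.cu` and run. The result matches plain softmax attention; flash attention's actual speedup comes from the shared-memory tiling of Q, K, and V that this per-thread sketch leaves out.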
Alternatives and similar repositories for flash-attention-minimal
Users interested in flash-attention-minimal are comparing it to the libraries listed below.
- TensorRT-in-Action is a GitHub repository of code examples for using TensorRT, with accompanying Jupyter Notebooks. ☆16 · Updated 2 years ago
- Awesome code, projects, books, etc. related to CUDA ☆17 · Updated last month
- A llama model inference framework implemented in CUDA C++