Bruce-Lee-LY / flash_attention_inference
Benchmarks the performance of the C++ interfaces of flash attention and flash attention v2 in large language model (LLM) inference scenarios.
☆43 · Updated 9 months ago
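For orientation, the sketch below illustrates the kind of measurement such a benchmark performs. It is not code from this repository: it times a naive, single-head scaled dot-product attention on the CPU with `std::chrono`, and the shapes (`seq_len`, `head_dim`) are illustrative assumptions.

```cpp
// Minimal, self-contained sketch (not the repo's harness): measures the latency
// of a naive attention(Q, K, V) = softmax(Q K^T / sqrt(d)) V computation.
#include <algorithm>
#include <chrono>
#include <cmath>
#include <cstdio>
#include <vector>

void naive_attention(const std::vector<float>& Q, const std::vector<float>& K,
                     const std::vector<float>& V, std::vector<float>& O,
                     int seq_len, int head_dim) {
    const float scale = 1.0f / std::sqrt(static_cast<float>(head_dim));
    std::vector<float> scores(seq_len);
    for (int i = 0; i < seq_len; ++i) {
        // Scores of query row i against all keys, tracking the row max
        // for a numerically stable softmax.
        float max_s = -INFINITY;
        for (int j = 0; j < seq_len; ++j) {
            float s = 0.0f;
            for (int d = 0; d < head_dim; ++d)
                s += Q[i * head_dim + d] * K[j * head_dim + d];
            scores[j] = s * scale;
            max_s = std::max(max_s, scores[j]);
        }
        float sum = 0.0f;
        for (int j = 0; j < seq_len; ++j) {
            scores[j] = std::exp(scores[j] - max_s);
            sum += scores[j];
        }
        // Output row i is the softmax-weighted sum of value rows.
        for (int d = 0; d < head_dim; ++d) {
            float acc = 0.0f;
            for (int j = 0; j < seq_len; ++j)
                acc += scores[j] * V[j * head_dim + d];
            O[i * head_dim + d] = acc / sum;
        }
    }
}

int main() {
    const int seq_len = 512, head_dim = 64;  // illustrative inference-time shape
    std::vector<float> Q(seq_len * head_dim, 0.01f), K(Q), V(Q), O(Q.size());

    auto t0 = std::chrono::steady_clock::now();
    naive_attention(Q, K, V, O, seq_len, head_dim);
    auto t1 = std::chrono::steady_clock::now();

    std::chrono::duration<double, std::milli> ms = t1 - t0;
    std::printf("naive attention (%d x %d): %.3f ms\n", seq_len, head_dim, ms.count());
    return 0;
}
```

A real flash-attention benchmark would instead launch CUDA kernels and time them with CUDA events across warm-up and measurement iterations; the structure above only shows the baseline computation being measured.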
Alternatives and similar repositories for flash_attention_inference
Users interested in flash_attention_inference are comparing it to the libraries listed below.