ShaYeBuHui01 / flash_attention_inference

Benchmarks the performance of the C++ interfaces of Flash Attention and Flash Attention v2 in large language model (LLM) inference scenarios.
16 · Aug 31, 2023 · Updated 2 years ago
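For context only: the sketch below is a naive single-head scaled-dot-product attention written in plain C++. It is not taken from this repository; the function name, shapes, and layout (row-major `seq_len x head_dim`) are assumptions. A simple CPU reference like this is sometimes used to sanity-check a flash-attention kernel's output before timing it.

```cpp
// Illustrative reference implementation, not code from flash_attention_inference.
#include <algorithm>
#include <cmath>
#include <cstdio>
#include <vector>

// Q, K, V are row-major [seq_len x head_dim]; O receives the same shape.
void naive_attention(const std::vector<float>& Q,
                     const std::vector<float>& K,
                     const std::vector<float>& V,
                     std::vector<float>& O,
                     int seq_len, int head_dim) {
    const float scale = 1.0f / std::sqrt(static_cast<float>(head_dim));
    std::vector<float> scores(seq_len);
    for (int i = 0; i < seq_len; ++i) {
        // scores[j] = Q[i] . K[j] / sqrt(head_dim), tracking the row max for a stable softmax
        float max_score = -INFINITY;
        for (int j = 0; j < seq_len; ++j) {
            float dot = 0.0f;
            for (int d = 0; d < head_dim; ++d)
                dot += Q[i * head_dim + d] * K[j * head_dim + d];
            scores[j] = dot * scale;
            max_score = std::max(max_score, scores[j]);
        }
        // exponentiate and accumulate the softmax denominator
        float sum = 0.0f;
        for (int j = 0; j < seq_len; ++j) {
            scores[j] = std::exp(scores[j] - max_score);
            sum += scores[j];
        }
        // O[i] = softmax(scores) . V
        for (int d = 0; d < head_dim; ++d) {
            float acc = 0.0f;
            for (int j = 0; j < seq_len; ++j)
                acc += scores[j] * V[j * head_dim + d];
            O[i * head_dim + d] = acc / sum;
        }
    }
}

int main() {
    const int seq_len = 4, head_dim = 8;
    std::vector<float> Q(seq_len * head_dim, 0.1f);
    std::vector<float> K(seq_len * head_dim, 0.2f);
    std::vector<float> V(seq_len * head_dim, 0.3f);
    std::vector<float> O(seq_len * head_dim, 0.0f);
    naive_attention(Q, K, V, O, seq_len, head_dim);
    std::printf("O[0][0] = %f\n", O[0]);
    return 0;
}
```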

Alternatives and similar repositories for flash_attention_inference

Users interested in flash_attention_inference are comparing it to the libraries listed below.

