Bruce-Lee-LY / flash_attention_inference

Benchmarks the performance of the C++ interfaces of FlashAttention and FlashAttention-2 in large language model (LLM) inference scenarios.
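For context, the sketch below is not code from this repository; it is a minimal, hypothetical C++ baseline showing the kind of naive scaled dot-product attention that FlashAttention kernels are typically benchmarked against. All names (`naive_attention`, the shapes, the timing harness) are illustrative assumptions, not the repo's API.

```cpp
// Minimal sketch (assumption, not from the repo): a naive single-head
// scaled dot-product attention baseline, O = softmax(Q K^T / sqrt(d)) V,
// timed with std::chrono as a point of comparison for fused kernels.
#include <chrono>
#include <cmath>
#include <cstdio>
#include <vector>

// Q, K, V, O are [seq_len x head_dim] row-major buffers for one head.
void naive_attention(const std::vector<float>& Q, const std::vector<float>& K,
                     const std::vector<float>& V, std::vector<float>& O,
                     int seq_len, int head_dim) {
  const float scale = 1.0f / std::sqrt(static_cast<float>(head_dim));
  std::vector<float> scores(seq_len);
  for (int i = 0; i < seq_len; ++i) {
    // scores[j] = (Q_i . K_j) * scale, tracking the row max for stability.
    float max_s = -1e30f;
    for (int j = 0; j < seq_len; ++j) {
      float s = 0.0f;
      for (int d = 0; d < head_dim; ++d)
        s += Q[i * head_dim + d] * K[j * head_dim + d];
      scores[j] = s * scale;
      if (scores[j] > max_s) max_s = scores[j];
    }
    // Numerically stable softmax over the score row.
    float sum = 0.0f;
    for (int j = 0; j < seq_len; ++j) {
      scores[j] = std::exp(scores[j] - max_s);
      sum += scores[j];
    }
    // O_i = sum_j softmax(scores)_j * V_j.
    for (int d = 0; d < head_dim; ++d) {
      float acc = 0.0f;
      for (int j = 0; j < seq_len; ++j)
        acc += scores[j] * V[j * head_dim + d];
      O[i * head_dim + d] = acc / sum;
    }
  }
}

int main() {
  const int seq_len = 512, head_dim = 64;  // typical decoder-head sizes
  std::vector<float> Q(seq_len * head_dim, 0.01f), K(Q), V(Q),
      O(seq_len * head_dim);
  auto t0 = std::chrono::steady_clock::now();
  naive_attention(Q, K, V, O, seq_len, head_dim);
  auto t1 = std::chrono::steady_clock::now();
  std::printf("naive attention: %.3f ms\n",
              std::chrono::duration<double, std::milli>(t1 - t0).count());
}
```

Fused implementations such as FlashAttention avoid materializing the full seq_len x seq_len score matrix that this baseline computes, which is the main source of their memory and speed advantage at long sequence lengths.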

Related projects

Alternatives and complementary repositories for flash_attention_inference