FranxYao / Retrieval-Head-with-Flash-Attention

Efficient retrieval-head analysis using a Triton FlashAttention kernel that also supports returning top-K attention probabilities.
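FlashAttention never materialises the full attention matrix, so top-K attention probabilities have to be tracked while streaming over key blocks. They can only be normalised at the end, once the final running maximum and softmax denominator are known. The sketch below illustrates that idea for a single query vector in pure Python. It assumes standard scaled dot-product attention. The function name `blockwise_topk_attention` and its parameters are hypothetical and are not the repository's Triton API.

```python
import heapq
import math


def blockwise_topk_attention(q, keys, values, k=4, block_size=2):
    """Hypothetical helper for illustration; not the repository's Triton kernel.

    Computes single-query attention block by block (online softmax, as in
    FlashAttention) and returns (output, [(position, probability), ...]) for
    the k most-attended positions, without storing the full probability row.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    scale = 1.0 / math.sqrt(len(q))
    run_max = -math.inf               # running max of scores seen so far
    denom = 0.0                       # running sum of exp(score - run_max)
    acc = [0.0] * len(values[0])      # running sum of exp(score - run_max) * value
    heap = []                         # min-heap of (score, position), size <= k

    for start in range(0, len(keys), block_size):
        block = range(start, min(start + block_size, len(keys)))
        scores = [scale * sum(qi * ki for qi, ki in zip(q, keys[j])) for j in block]

        # Online softmax: rescale earlier partial sums to the new maximum.
        new_max = max(run_max, max(scores))
        correction = math.exp(run_max - new_max)  # 0.0 on the first block
        denom *= correction
        acc = [a * correction for a in acc]
        for j, s in zip(block, scores):
            w = math.exp(s - new_max)
            denom += w
            acc = [a + w * v for a, v in zip(acc, values[j])]
            # Softmax is monotone, so the k largest raw scores are also
            # the k largest probabilities; keep only those.
            if len(heap) < k:
                heapq.heappush(heap, (s, j))
            elif s > heap[0][0]:
                heapq.heapreplace(heap, (s, j))
        run_max = new_max

    out = [a / denom for a in acc]
    # Normalise the kept scores with the final max and denominator.
    topk = sorted(((j, math.exp(s - run_max) / denom) for s, j in heap),
                  key=lambda t: -t[1])
    return out, topk
```

Only k (score, position) pairs are kept per query, so memory stays O(k) instead of O(sequence length). Whether the Triton kernel in this repository uses exactly this per-tile bookkeeping is an assumption.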
