OpenPPL / ppl.nn.llm
☆141 · Updated last year
Alternatives and similar repositories for ppl.nn.llm
Users interested in ppl.nn.llm are comparing it to the repositories listed below.
- ☆152 · Updated last year
- ☆130 · Updated last year
- ☆60 · Updated last year
- Benchmarks of the C++ interfaces of FlashAttention and FlashAttention-2 in large language model (LLM) inference scenarios. ☆44 · Updated 11 months ago
- ☆105 · Updated last year
- ☆145 · Updated last year
- A standalone GEMM kernel for fp16 activations and quantized weights, extracted from FasterTransformer (see the first sketch after this list). ☆96 · Updated 4 months ago
- Transformer-related optimizations, including BERT and GPT. ☆59 · Updated 2 years ago
- FP8 flash attention implemented with the CUTLASS library on the Ada architecture. ☆78 · Updated last year
- ☆38 · Updated last year
- ☆118 · Updated 10 months ago
- A collection of memory-efficient attention operators implemented in the Triton language (see the second sketch after this list).
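
The FasterTransformer-derived entry above implements weight-only quantization: weights are stored as int8 and dequantized inside the GEMM, while activations stay in fp16. Below is a minimal NumPy sketch of that idea; the function names and the per-output-channel symmetric scaling are illustrative assumptions, not the repository's actual kernel or API.

```python
import numpy as np

def quantize_weight(w_fp16):
    """Per-output-channel symmetric int8 quantization of a [K, N] weight."""
    scale = np.abs(w_fp16).max(axis=0) / 127.0                  # one scale per column
    w_int8 = np.clip(np.round(w_fp16 / scale), -127, 127).astype(np.int8)
    return w_int8, scale

def weight_only_gemm(x_fp16, w_int8, scale):
    """y = x @ dequant(w): dequantize on the fly, accumulate in fp32."""
    w_deq = w_int8.astype(np.float32) * scale.astype(np.float32)
    return (x_fp16.astype(np.float32) @ w_deq).astype(np.float16)

# usage
rng = np.random.default_rng(0)
x = rng.standard_normal((4, 64)).astype(np.float16)             # activations [M, K]
w = rng.standard_normal((64, 32)).astype(np.float16)            # weights     [K, N]
w_q, s = quantize_weight(w)
print(weight_only_gemm(x, w_q, s).shape)                        # (4, 32)
```

Storing weights at 8 bits halves the memory traffic of an fp16 GEMM, which is why this pays off in the memory-bound decode phase of LLM inference.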
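
The Triton operator collection in the last entry relies on the online-softmax trick that also underlies the flash attention repositories above: attention is computed block-by-block over the key/value sequence while running row maxima and softmax denominators are rescaled, so the full [seq_q, seq_k] score matrix is never materialized. A minimal NumPy sketch of the recurrence itself (plain Python rather than an actual Triton kernel; names and block size are illustrative):

```python
import numpy as np

def attention_online_softmax(q, k, v, block=64):
    """Single-head attention over K/V blocks via online softmax:
    only [seq_q, block] score tiles are ever held in memory."""
    seq_q, d = q.shape
    scale = 1.0 / np.sqrt(d)
    out = np.zeros((seq_q, v.shape[1]))
    m = np.full(seq_q, -np.inf)                     # running row maxima
    l = np.zeros(seq_q)                             # running softmax denominators
    for start in range(0, k.shape[0], block):
        s = (q @ k[start:start + block].T) * scale  # score tile for this block
        m_new = np.maximum(m, s.max(axis=1))        # updated row maxima
        p = np.exp(s - m_new[:, None])              # tile probabilities
        alpha = np.exp(m - m_new)                   # rescale old accumulators
        l = l * alpha + p.sum(axis=1)
        out = out * alpha[:, None] + p @ v[start:start + block]
        m = m_new
    return out / l[:, None]

# usage: agrees with the naive reference up to rounding
rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((n, 64)) for n in (128, 256, 256))
s = q @ k.T / np.sqrt(64)
p = np.exp(s - s.max(axis=1, keepdims=True))
assert np.allclose(attention_online_softmax(q, k, v),
                   (p / p.sum(axis=1, keepdims=True)) @ v)
```

The Triton implementations fuse these steps into a single GPU kernel, but the rescaling recurrence is the same.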