PotatoSpudowski / fastLLaMa

fastLLaMa: An experimental high-performance framework for running Decoder-only LLMs with 4-bit quantization in Python using a C/C++ backend.
409 stars · Updated last year
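The description above centers on 4-bit quantization, which shrinks model weights so large decoder-only LLMs fit in modest RAM. As a rough illustration, here is a minimal sketch of block-wise symmetric 4-bit quantization (similar in spirit to the Q4-style formats used by llama.cpp-family backends). The function names are hypothetical and are not fastLLaMa's actual API.

```python
# Hypothetical sketch of block-wise 4-bit symmetric quantization.
# Not fastLLaMa's API; for illustration of the general technique only.

def quantize_q4(block):
    """Quantize a block of floats to 4-bit integer codes plus one float scale."""
    max_abs = max(abs(x) for x in block) or 1.0
    scale = max_abs / 7.0                              # map values into roughly [-7, 7]
    qs = [max(-8, min(7, round(x / scale))) for x in block]  # clamp to 4-bit range
    return scale, qs

def dequantize_q4(scale, qs):
    """Recover approximate floats from the 4-bit codes."""
    return [q * scale for q in qs]

weights = [0.12, -0.56, 0.98, -1.4, 0.0, 0.33]
scale, qs = quantize_q4(weights)
approx = dequantize_q4(scale, qs)
# Each code fits in half a byte, so a block of 32 weights needs
# only 16 bytes plus one stored scale, versus 128 bytes at fp32.
```

The per-block scale bounds the rounding error to about half the scale, which is why such schemes preserve most model quality while cutting memory roughly 8x versus fp32.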

Alternatives and similar repositories for fastLLaMa:

Users interested in fastLLaMa are comparing it to the libraries listed below.