RahulSChand / gpu_poor
Calculate token/s & GPU memory requirement for any LLM. Supports llama.cpp/ggml/bnb/QLoRA quantization
☆1,385 Updated last year
Alternatives and similar repositories for gpu_poor
Users who are interested in gpu_poor are comparing it to the libraries listed below.
- AutoAWQ implements the AWQ algorithm for 4-bit quantization with a 2x speedup during inference. Documentation: ☆2,301 Updated 8 months ago
- Doing simple retrieval from LLM models at various context lengths to measure accuracy ☆2,144 Updated last year