randaller / llama-cpu
Inference on CPU code for LLaMA models
☆137 · Updated last year
Alternatives and similar repositories for llama-cpu:
Users interested in llama-cpu are comparing it to the libraries listed below.
- Python bindings for llama.cpp ☆199 · Updated last year
- LLaMA Server combines the power of LLaMA C++ with the beauty of Chatbot UI. ☆117 · Updated last year
- GPTQ inference Triton kernel ☆295 · Updated last year
- Inference code for facebook LLaMA models with Wrapyfi support ☆130 · Updated last year
- Simple, hackable and fast implementation for training/finetuning medium-sized LLaMA-based models ☆163 · Updated last week
- SoTA Transformers with C-backend for fast inference on your CPU. ☆311 · Updated last year
- LLM-based code completion engine ☆178 · Updated 3 weeks ago
- Chat with Meta's LLaMA models at home made easy ☆834 · Updated last year
- Automated prompting and scoring framework to evaluate LLMs using updated human knowledge prompts ☆111 · Updated last year
- fastLLaMa: An experimental high-performance framework for running Decoder-only LLMs with 4-bit quantization in Python using a C/C++ backe… ☆408 · Updated last year
- SparseGPT + GPTQ Compression of LLMs like LLaMa, OPT, Pythia ☆41 · Updated last year
- Train LLaMA with LoRA on a single 4090 and merge the LoRA weights to work like Stanford Alpaca. ☆50 · Updated last year
- Landmark Attention: Random-Access Infinite Context Length for Transformers QLoRA ☆123 · Updated last year
- Host the GPTQ model using AutoGPTQ as an API that is compatible with the text generation UI API. ☆91 · Updated last year
- ☆117 · Updated 9 months ago
- 💬 Chatbot web app + HTTP and Websocket endpoints for LLM inference with the Petals client ☆308 · Updated 9 months ago
- Falcon LLM ggml framework with CPU and GPU support ☆246 · Updated last year
- C++ implementation for 💫StarCoder ☆450 · Updated last year
- Data preparation code for Amber 7B LLM ☆85 · Updated 9 months ago
- A general 2-8 bits quantization toolbox with GPTQ/AWQ/HQQ/VPTQ, and export to onnx/onnx-runtime easily. ☆158 · Updated this week
- Some simple scripts that I use day-to-day when working with LLMs and Huggingface Hub ☆157 · Updated last year
- ☆74 · Updated last year
- 4 bits quantization of SantaCoder using GPTQ ☆51 · Updated last year
- Python bindings for llama.cpp ☆65 · Updated 11 months ago
- Convenient wrapper for fine-tuning and inference of Large Language Models (LLMs) with several quantization techniques (GPTQ, bitsandbytes… ☆147 · Updated last year
- ☆456 · Updated last year
- An OpenAI Completions API compatible server for NLP transformers models ☆63 · Updated last year
- GPT-2 small trained on phi-like data ☆65 · Updated last year
- ☆539 · Updated 2 months ago
- Python bindings for ggml ☆136 · Updated 5 months ago