randaller / llama-cpuLinks

Inference on CPU code for LLaMA models

☆137

Alternatives and similar repositories for llama-cpu

Users that are interested in llama-cpu are comparing it to the libraries listed below

Sorting:

modular-ml / wrapyfi-examples_llama
Inference code for facebook LLaMA models with Wrapyfi support
☆129Updated 2 years ago
nuance1979 / llama-server
LLaMA Server combines the power of LLaMA C++ with the beauty of Chatbot UI.
☆128Updated 2 years ago
aigoopy / llm-jeopardy
Automated prompting and scoring framework to evaluate LLMs using updated human knowledge prompts
☆110Updated 2 years ago
thomasantony / llamacpp-python
Python bindings for llama.cpp
☆198Updated 2 years ago
Gryphe / BlockMerge_Gradient
Merge Transformers language models by use of gradient parameters.
☆206Updated last year
cmp-nct / ggllm.cpp
Falcon LLM ggml framework with CPU and GPU support
☆246Updated last year
oobabooga / GPTQ-for-LLaMa
4 bits quantization of LLaMa using GPTQ
☆130Updated 2 years ago
taprosoft / llm_finetuning
Convenient wrapper for fine-tuning and inference of Large Language Models (LLMs) with several quantization techniques (GTPQ, bitsandbytes…
☆146Updated last year
LambdaLabsML / llama
Inference code for LLaMA models
☆42Updated 2 years ago
skeskinen / llama-lite
Embeddings focused small version of Llama NLP model
☆103Updated 2 years ago
randaller / llama-chat
Chat with Meta's LLaMA models at home made easy
☆837Updated 2 years ago
petals-infra / chat.petals.dev
💬 Chatbot web app + HTTP and Websocket endpoints for LLM inference with the Petals client
☆314Updated last year
TheBlokeAI / AIScripts
Some simple scripts that I use day-to-day when working with LLMs and Huggingface Hub
☆162Updated last year
clcarwin / alpaca-weight
Train llama with lora on one 4090 and merge weight of lora to work as stanford alpaca.
☆51Updated 2 years ago
bigcode-project / starcoder.cpp
C++ implementation for 💫StarCoder
☆456Updated last year
jllllll / exllama
A more memory-efficient rewrite of the HF transformers implementation of Llama for use with quantized weights.
☆64Updated last year
togethercomputer / redpajama.cpp
Extend the original llama.cpp repo to support redpajama model.
☆118Updated 11 months ago
eugenepentland / landmark-attention-qlora
Landmark Attention: Random-Access Infinite Context Length for Transformers QLoRA
☆123Updated 2 years ago
mzbac / AutoGPTQ-API
Host the GPTQ model using AutoGPTQ as an API that is compatible with text generation UI API.
☆91Updated 2 years ago
Gryphe / MergeMonster
An unsupervised model merging algorithm for Transformers-based language models.
☆106Updated last year
aspctu / alpaca-lora
Instruct-tuning LLaMA on consumer hardware
☆66Updated 2 years ago
EmbeddedLLM / vllm
vLLM: A high-throughput and memory-efficient inference and serving engine for LLMs
☆87Updated last week
mayank31398 / GPTQ-for-SantaCoder
4 bits quantization of SantaCoder using GPTQ
☆51Updated 2 years ago
chu-tianxiang / llama-cpp-torch
llama.cpp to PyTorch Converter
☆34Updated last year
rafacelente / bllama
1.58-bit LLaMa model
☆81Updated last year
zphang / minimal-llama
☆458Updated last year
NVIDIA / trt-llm-as-openai-windows
This reference can be used with any existing OpenAI integrated apps to run with TRT-LLM inference locally on GeForce GPU on Windows inste…
☆126Updated last year
OpenAccess-AI-Collective / ggml-webui
Deploy your GGML models to HuggingFace Spaces with Docker and gradio
☆37Updated 2 years ago
johnsmith0031 / alpaca_lora_4bit
☆534Updated last year
AlpinDale / sparsegpt-for-LLaMA
Code for the paper "SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot" with LLaMA implementation.
☆71Updated 2 years ago