thomasantony / llamacpp-python

Python bindings for llama.cpp

☆197

Alternatives and similar repositories for llamacpp-python

Users who are interested in llamacpp-python are comparing it to the libraries listed below.

PotatoSpudowski / fastLLaMa
fastLLaMa: An experimental high-performance framework for running Decoder-only LLMs with 4-bit quantization in Python using a C/C++ backe…
☆410 Updated 2 years ago
NolanoOrg / cformers
SoTA Transformers with C-backend for fast inference on your CPU.
☆309 Updated last year
cmp-nct / ggllm.cpp
Falcon LLM ggml framework with CPU and GPU support
☆246 Updated last year
closedai-project / closedai
Drop in replacement for OpenAI, but with Open models.
☆152 Updated 2 years ago
abetlen / ggml-python
Python bindings for ggml
☆142 Updated 11 months ago
johnsmith0031 / alpaca_lora_4bit
☆534 Updated last year
skeskinen / llama-lite
Embeddings focused small version of Llama NLP model
☆103 Updated 2 years ago
harrisonvanderbyl / rwkvstic
Framework agnostic python runtime for RWKV models
☆146 Updated last year
harrisonvanderbyl / rwkv-cpp-accelerated
A torchless, c++ rwkv implementation using 8bit quantization, written in cuda/hip/vulkan for maximum compatibility and minimum dependenci…
☆312 Updated last year
eugenepentland / landmark-attention-qlora
Landmark Attention: Random-Access Infinite Context Length for Transformers QLoRA
☆123 Updated 2 years ago
skeskinen / bert.cpp
ggml implementation of BERT
☆495 Updated last year
aigoopy / llm-jeopardy
Automated prompting and scoring framework to evaluate LLMs using updated human knowledge prompts
☆110 Updated 2 years ago
PygmalionAI / training-code
The code we currently use to fine-tune models.
☆114 Updated last year
togethercomputer / redpajama.cpp
Extend the original llama.cpp repo to support redpajama model.
☆118 Updated 11 months ago
Birch-san / mpt-play
Command-line script for inferencing from models such as MPT-7B-Chat
☆100 Updated 2 years ago
TheBlokeAI / AIScripts
Some simple scripts that I use day-to-day when working with LLMs and Huggingface Hub
☆162 Updated last year
aspctu / alpaca-lora
Instruct-tuning LLaMA on consumer hardware
☆66 Updated 2 years ago
aarnphm / whispercpp
Pybind11 bindings for Whisper.cpp
☆334 Updated 7 months ago
NouamaneTazi / bloomz.cpp
C++ implementation for BLOOM
☆810 Updated 2 years ago
bigcode-project / starcoder.cpp
C++ implementation for 💫StarCoder
☆456 Updated last year
clcarwin / alpaca-weight
Train llama with lora on one 4090 and merge weight of lora to work as stanford alpaca.
☆51 Updated 2 years ago
chrisociepa / allamo
Simple, hackable and fast implementation for training/finetuning medium-sized LLaMA-based models
☆177 Updated this week
petals-infra / chat.petals.dev
💬 Chatbot web app + HTTP and Websocket endpoints for LLM inference with the Petals client
☆314 Updated last year
Blealtan / RWKV-LM-LoRA
RWKV is a RNN with transformer-level LLM performance. It can be directly trained like a GPT (parallelizable). So it's combining the best …
☆412 Updated 2 years ago
randaller / llama-cpu
Inference on CPU code for LLaMA models
☆137 Updated 2 years ago
absadiki / pyllamacpp
Python bindings for llama.cpp
☆65 Updated last year
PABannier / biogpt.cpp
Port of Microsoft's BioGPT in C/C++ using ggml
☆87 Updated last year
zphang / minimal-llama
☆458 Updated last year
jllllll / exllama
A more memory-efficient rewrite of the HF transformers implementation of Llama for use with quantized weights.
☆64 Updated last year
ggml-org / p1
LLM-based code completion engine
☆193 Updated 6 months ago