cmp-nct / ggllm.cpp

Falcon LLM ggml framework with CPU and GPU support

☆244

Related projects ⓘ

Alternatives and complementary repositories for ggllm.cpp

eugenepentland / landmark-attention-qlora
Landmark Attention: Random-Access Infinite Context Length for Transformers QLoRA
☆124Updated last year
bigcode-project / starcoder.cpp
C++ implementation for 💫StarCoder
☆445Updated last year
aigoopy / llm-jeopardy
Automated prompting and scoring framework to evaluate LLMs using updated human knowledge prompts
☆111Updated last year
johnsmith0031 / alpaca_lora_4bit
☆534Updated 11 months ago
epolewski / EricLLM
A fast batching API to serve LLM models
☆172Updated 6 months ago
TheBlokeAI / dockerLLM
TheBloke's Dockerfiles
☆299Updated 8 months ago
togethercomputer / redpajama.cpp
Extend the original llama.cpp repo to support redpajama model.
☆117Updated 2 months ago
rmihaylov / falcontune
Tune any FALCON in 4-bit
☆468Updated last year
TheBlokeAI / AIScripts
Some simple scripts that I use day-to-day when working with LLMs and Huggingface Hub
☆155Updated last year
turboderp / exui
Web UI for ExLlamaV2
☆438Updated last month
mzbac / qlora-fine-tune
☆168Updated last year
cognitivecomputations / laserRMT
This is our own implementation of 'Layer Selective Rank Reduction'
☆231Updated 5 months ago
NolanoOrg / cformers
SoTA Transformers with C-backend for fast inference on your CPU.
☆312Updated 11 months ago
jllllll / exllama
A more memory-efficient rewrite of the HF transformers implementation of Llama for use with quantized weights.
☆66Updated last year
taprosoft / llm_finetuning
Convenient wrapper for fine-tuning and inference of Large Language Models (LLMs) with several quantization techniques (GTPQ, bitsandbytes…
☆143Updated last year
VatsaDev / NanoPhi-alpha
GPT-2 small trained on phi-like data
☆65Updated 8 months ago
Birch-san / falcon-play
Command-line script for inferencing from models such as falcon-7b-instruct
☆75Updated last year
skeskinen / bert.cpp
ggml implementation of BERT
☆464Updated 8 months ago
ggml-org / p1
LLM-based code completion engine
☆173Updated last year
kaiokendev / superbig
A prompt/context management system
☆165Updated last year
PygmalionAI / training-code
The code we currently use to fine-tune models.
☆108Updated 6 months ago
Gryphe / MergeMonster
An unsupervised model merging algorithm for Transformers-based language models.
☆99Updated 6 months ago
chrisociepa / allamo
Simple, hackable and fast implementation for training/finetuning medium-sized LLaMA-based models
☆152Updated this week
abetlen / ggml-python
Python bindings for ggml
☆132Updated 2 months ago
Gryphe / BlockMerge_Gradient
Merge Transformers language models by use of gradient parameters.
☆202Updated 3 months ago
runpod-workers / worker-vllm
The RunPod worker template for serving our large language model endpoints. Powered by vLLM.
☆242Updated last week
lastmile-ai / llama-retrieval-plugin
LLaMa retrieval plugin script using OpenAI's retrieval plugin
☆324Updated last year
jondurbin / qlora
QLoRA: Efficient Finetuning of Quantized LLMs
☆77Updated 6 months ago
epfml / landmark-attention
Landmark Attention: Random-Access Infinite Context Length for Transformers
☆415Updated 10 months ago