PotatoSpudowski / fastLLaMaLinks

fastLLaMa: An experimental high-performance framework for running Decoder-only LLMs with 4-bit quantization in Python using a C/C++ backend.

☆412

Alternatives and similar repositories for fastLLaMa

Users that are interested in fastLLaMa are comparing it to the libraries listed below

Sorting:

NolanoOrg / cformers
SoTA Transformers with C-backend for fast inference on your CPU.
☆308Updated last year
johnsmith0031 / alpaca_lora_4bit
☆534Updated last year
thomasantony / llamacpp-python
Python bindings for llama.cpp
☆198Updated 2 years ago
NouamaneTazi / bloomz.cpp
C++ implementation for BLOOM
☆806Updated 2 years ago
rmihaylov / falcontune
Tune any FALCON in 4-bit
☆464Updated 2 years ago
eugenepentland / landmark-attention-qlora
Landmark Attention: Random-Access Infinite Context Length for Transformers QLoRA
☆123Updated 2 years ago
zphang / minimal-llama
☆457Updated 2 years ago
pointnetwork / point-alpaca
☆403Updated 2 years ago
cmp-nct / ggllm.cpp
Falcon LLM ggml framework with CPU and GPU support
☆247Updated last year
SkunkworksAI / hydra-moe
☆415Updated last year
lastmile-ai / llama-retrieval-plugin
LLaMa retrieval plugin script using OpenAI's retrieval plugin
☆323Updated 2 years ago
bigcode-project / starcoder.cpp
C++ implementation for 💫StarCoder
☆455Updated 2 years ago
modular-ml / wrapyfi-examples_llama
Inference code for facebook LLaMA models with Wrapyfi support
☆129Updated 2 years ago
togethercomputer / redpajama.cpp
Extend the original llama.cpp repo to support redpajama model.
☆118Updated last year
harrisonvanderbyl / rwkv-cpp-accelerated
A torchless, c++ rwkv implementation using 8bit quantization, written in cuda/hip/vulkan for maximum compatibility and minimum dependenci…
☆313Updated last year
melodysdreamj / WizardVicunaLM
LLM that combines the principles of wizardLM and vicunaLM
☆716Updated 2 years ago
aigoopy / llm-jeopardy
Automated prompting and scoring framework to evaluate LLMs using updated human knowledge prompts
☆108Updated 2 years ago
epfml / landmark-attention
Landmark Attention: Random-Access Infinite Context Length for Transformers
☆426Updated last year
aspctu / alpaca-lora
Instruct-tuning LLaMA on consumer hardware
☆65Updated 2 years ago
alasdairforsythe / tokenmonster
Ungreedy subword tokenizer and vocabulary trainer for Python, Go & Javascript
☆603Updated last year
skeskinen / bert.cpp
ggml implementation of BERT
☆492Updated last year
jondurbin / airoboros
Customizable implementation of the self-instruct paper.
☆1,050Updated last year
harrisonvanderbyl / rwkvstic
Framework agnostic python runtime for RWKV models
☆145Updated 2 years ago
closedai-project / closedai
Drop in replacement for OpenAI, but with Open models.
☆153Updated 2 years ago
mzbac / qlora-fine-tune
☆166Updated 2 years ago
Nuggt-dev / Nuggt
An Autonomous LLM Agent that runs on Wizcoder-15B
☆333Updated last year
chrisociepa / allamo
Simple, hackable and fast implementation for training/finetuning medium-sized LLaMA-based models
☆182Updated last month
keldenl / gpt-llama.cpp
A llama.cpp drop-in replacement for OpenAI's GPT endpoints, allowing GPT-powered apps to run off local llama.cpp models instead of OpenAI…
☆598Updated 2 years ago
declare-lab / flan-alpaca
This repository contains code for extending the Stanford Alpaca synthetic instruction tuning to existing instruction-tuned models such as…
☆356Updated 2 years ago
mbzuai-nlp / LaMini-LM
LaMini-LM: A Diverse Herd of Distilled Models from Large-Scale Instructions
☆821Updated 2 years ago