modular-ml / wrapyfi-examples_llama

Inference code for facebook LLaMA models with Wrapyfi support

☆129

Alternatives and similar repositories for wrapyfi-examples_llama

Users who are interested in wrapyfi-examples_llama compare it to the repositories listed below.

johnsmith0031 / alpaca_lora_4bit
☆534 Updated last year
zphang / minimal-llama
☆457 Updated 2 years ago
PotatoSpudowski / fastLLaMa
fastLLaMa: An experimental high-performance framework for running Decoder-only LLMs with 4-bit quantization in Python using a C/C++ backe…
☆411 Updated 2 years ago
bjoernpl / llama_gradio_interface
Inference code for LLaMA models with Gradio Interface and rolling generation like ChatGPT
☆48 Updated 2 years ago
harrisonvanderbyl / rwkvstic
Framework agnostic python runtime for RWKV models
☆146 Updated 2 years ago
aspctu / alpaca-lora
Instruct-tuning LLaMA on consumer hardware
☆65 Updated 2 years ago
epfml / landmark-attention
Landmark Attention: Random-Access Infinite Context Length for Transformers
☆426 Updated last year
clcarwin / alpaca-weight
Train LLaMA with LoRA on one 4090 and merge the LoRA weights to work as Stanford Alpaca.
☆52 Updated 2 years ago
rmihaylov / falcontune
Tune any FALCON in 4-bit
☆464 Updated 2 years ago
radi-cho / botbots
A dataset featuring diverse dialogues between two ChatGPT (gpt-3.5-turbo) instances with system messages written by GPT-4. Covering vario…
☆163 Updated 2 years ago
mayank31398 / GPTQ-for-SantaCoder
4-bit quantization of SantaCoder using GPTQ
☆51 Updated 2 years ago
declare-lab / flan-alpaca
This repository contains code for extending the Stanford Alpaca synthetic instruction tuning to existing instruction-tuned models such as…
☆356 Updated 2 years ago
randaller / llama-cpu
CPU inference code for LLaMA models
☆137 Updated 2 years ago
eugenepentland / landmark-attention-qlora
Landmark Attention: Random-Access Infinite Context Length for Transformers QLoRA
☆123 Updated 2 years ago
pointnetwork / point-alpaca
☆403 Updated 2 years ago
NolanoOrg / cformers
SoTA Transformers with C-backend for fast inference on your CPU.
☆308 Updated last year
galatolofederico / vanilla-llama
Plain PyTorch implementation of LLaMA
☆188 Updated 2 years ago
kuleshov-group / llmtools
Finetuning Large Language Models on One Consumer GPU in 2 Bits
☆732 Updated last year
yxuansu / OpenAlpaca
OpenAlpaca: A Fully Open-Source Instruction-Following Model Based On OpenLLaMA
☆301 Updated 2 years ago
thomasantony / llamacpp-python
Python bindings for llama.cpp
☆198 Updated 2 years ago
skeskinen / llama-lite
Embeddings-focused small version of the LLaMA NLP model
☆106 Updated 2 years ago
lastmile-ai / llama-retrieval-plugin
LLaMa retrieval plugin script using OpenAI's retrieval plugin
☆323 Updated 2 years ago
aigoopy / llm-jeopardy
Automated prompting and scoring framework to evaluate LLMs using updated human knowledge prompts
☆108 Updated 2 years ago
rmihaylov / mpttune
Tune MPTs
☆84 Updated 2 years ago
chrisociepa / allamo
Simple, hackable and fast implementation for training/finetuning medium-sized LLaMA-based models
☆182 Updated 2 months ago
iwalton3 / mpt-lora-patch
Patch for MPT-7B which allows using and training a LoRA
☆58 Updated 2 years ago
mbzuai-nlp / LaMini-LM
LaMini-LM: A Diverse Herd of Distilled Models from Large-Scale Instructions
☆822 Updated 2 years ago
serp-ai / LLaMA-8bit-LoRA
Repository for Chat LLaMA - training a LoRA for the LLaMA (1 or 2) models on HuggingFace with 8-bit or 4-bit quantization. Research only.
☆149 Updated 2 years ago
SkunkworksAI / hydra-moe
☆414 Updated 2 years ago
harrisonvanderbyl / rwkv-cpp-accelerated
A torchless, c++ rwkv implementation using 8bit quantization, written in cuda/hip/vulkan for maximum compatibility and minimum dependenci…
☆313 Updated last year