marella / ctransformersLinks

Python bindings for the Transformer models implemented in C/C++ using GGML library.

☆1,871

Alternatives and similar repositories for ctransformers

Users that are interested in ctransformers are comparing it to the libraries listed below

Sorting:

turboderp / exllama
A more memory-efficient rewrite of the HF transformers implementation of Llama for use with quantized weights.
☆2,887Updated last year
turboderp-org / exllamav2
A fast inference library for running LLMs locally on modern consumer-class GPUs
☆4,239Updated last week
jondurbin / airoboros
Customizable implementation of the self-instruct paper.
☆1,046Updated last year
AutoGPTQ / AutoGPTQ
An easy-to-use LLMs quantization package with user-friendly apis, based on GPTQ algorithm.
☆4,897Updated 3 months ago
rmihaylov / falcontune
Tune any FALCON in 4-bit
☆467Updated last year
casper-hansen / AutoAWQ
AutoAWQ implements the AWQ algorithm for 4-bit quantization with a 2x speedup during inference. Documentation:
☆2,210Updated 2 months ago
qwopqwop200 / GPTQ-for-LLaMa
4 bits quantization of LLaMA using GPTQ
☆3,059Updated last year
young-geng / EasyLM
Large language models (LLMs) made easy, EasyLM is a one stop solution for pre-training, finetuning, evaluating and serving LLMs in JAX/Fl…
☆2,484Updated 11 months ago
aphrodite-engine / aphrodite-engine
Large-scale LLM inference engine
☆1,481Updated this week
intel / intel-extension-for-transformers
⚡ Build your chatbot within minutes on your favorite device; offer SOTA compression techniques for LLMs; run LLMs efficiently on Intel Pl…
☆2,169Updated 9 months ago
kuleshov-group / llmtools
Finetuning Large Language Models on One Consumer GPU in 2 Bits
☆727Updated last year
deepspeedai / DeepSpeed-MII
MII makes low-latency and high-throughput inference possible, powered by DeepSpeed.
☆2,038Updated 3 weeks ago
gururise / AlpacaDataCleaned
Alpaca dataset from Stanford, cleaned and curated
☆1,560Updated 2 years ago
ray-project / ray-llm
RayLLM - LLMs on Ray (Archived). Read README for more info.
☆1,259Updated 4 months ago
johnsmith0031 / alpaca_lora_4bit
☆535Updated last year
xlang-ai / instructor-embedding
[ACL 2023] One Embedder, Any Task: Instruction-Finetuned Text Embeddings
☆1,988Updated 6 months ago
IST-DASLab / gptq
Code for the ICLR 2023 paper "GPTQ: Accurate Post-training Quantization of Generative Pretrained Transformers".
☆2,148Updated last year
S-LoRA / S-LoRA
S-LoRA: Serving Thousands of Concurrent LoRA Adapters
☆1,843Updated last year
jquesnelle / yarn
YaRN: Efficient Context Window Extension of Large Language Models
☆1,521Updated last year
abacaj / fine-tune-mistral
Fine-tune mistral-7B on 3090s, a100s, h100s
☆715Updated last year
RWKV / rwkv.cpp
INT4/INT5/INT8 and FP16 inference on CPU for RWKV language model
☆1,533Updated 4 months ago
tomaarsen / attention_sinks
Extend existing LLMs way beyond the original training length with constant memory usage, without retraining
☆702Updated last year
MeetKai / functionary
Chat language model that can use tools and interpret the results
☆1,572Updated this week
noamgat / lm-format-enforcer
Enforce the output format (JSON Schema, Regex etc) of a language model
☆1,848Updated 4 months ago
huggingface / llm-vscode
LLM powered development for VSCode
☆1,307Updated last year
NouamaneTazi / bloomz.cpp
C++ implementation for BLOOM
☆810Updated 2 years ago
PotatoSpudowski / fastLLaMa
fastLLaMa: An experimental high-performance framework for running Decoder-only LLMs with 4-bit quantization in Python using a C/C++ backe…
☆410Updated 2 years ago
mbzuai-nlp / LaMini-LM
LaMini-LM: A Diverse Herd of Distilled Models from Large-Scale Instructions
☆820Updated 2 years ago
teknium1 / GPTeacher
A collection of modular datasets generated by GPT-4, General-Instruct - Roleplay-Instruct - Code-Instruct - and Toolformer
☆1,637Updated last year
lxe / simple-llm-finetuner
Simple UI for LLM Model Finetuning
☆2,060Updated last year