marella / ctransformers
Python bindings for Transformer models implemented in C/C++ using the GGML library.
☆1,853 · Updated last year
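As a quick orientation before the comparison list, here is a minimal usage sketch of the ctransformers Python API described above. The `AutoModelForCausalLM.from_pretrained` entry point and `model_type` argument are part of the library's public interface; the model repo name used here is an illustrative assumption, and the loading is wrapped in a function so nothing is downloaded at import time.

```python
# Minimal sketch of generating text with ctransformers (GGML models on CPU).
# The model repo "marella/gpt-2-ggml" is an assumption chosen for illustration.
def generate(prompt: str, max_new_tokens: int = 32) -> str:
    """Load a quantized GGML model and return a continuation of `prompt`."""
    from ctransformers import AutoModelForCausalLM

    # from_pretrained loads the quantized GGML weights; model_type selects
    # which C/C++ architecture implementation to use (llama, gpt2, falcon, ...).
    llm = AutoModelForCausalLM.from_pretrained(
        "marella/gpt-2-ggml", model_type="gpt2"
    )
    # Calling the model object directly runs generation and returns a string.
    return llm(prompt, max_new_tokens=max_new_tokens)
```

Because inference runs through GGML in C/C++, this works on CPU without a GPU, which is the main draw compared with the GPU-focused libraries listed below.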
Alternatives and similar repositories for ctransformers:
Users interested in ctransformers are comparing it to the libraries listed below.
- A more memory-efficient rewrite of the HF transformers implementation of Llama for use with quantized weights. ☆2,842 · Updated last year
- An easy-to-use LLM quantization package with user-friendly APIs, based on the GPTQ algorithm. ☆4,766 · Updated this week
- A fast inference library for running LLMs locally on modern consumer-class GPUs. ☆4,053 · Updated last week
- 4-bit quantization of LLaMA using GPTQ. ☆3,045 · Updated 8 months ago
- AutoAWQ implements the AWQ algorithm for 4-bit quantization with a 2x speedup during inference. ☆2,021 · Updated 2 weeks ago
- Large language models (LLMs) made easy; EasyLM is a one-stop solution for pre-training, fine-tuning, evaluating, and serving LLMs in JAX/Fl… ☆2,463 · Updated 7 months ago
- Customizable implementation of the self-instruct paper. ☆1,040 · Updated last year
- Accessible large language models via k-bit quantization for PyTorch. ☆6,818 · Updated this week
- Tune any FALCON in 4-bit. ☆466 · Updated last year
- Code for the ICLR 2023 paper "GPTQ: Accurate Post-training Quantization of Generative Pretrained Transformers". ☆2,063 · Updated 11 months ago
- ⚡ Build your chatbot within minutes on your favorite device; offers SOTA compression techniques for LLMs; runs LLMs efficiently on Intel Pl… ☆2,163 · Updated 5 months ago
- Python bindings for llama.cpp. ☆8,846 · Updated this week
- [ACL 2023] One Embedder, Any Task: Instruction-Finetuned Text Embeddings. ☆1,921 · Updated 2 months ago
- The hub for EleutherAI's work on interpretability and learning dynamics. ☆2,423 · Updated last week
- The RedPajama-Data repository contains code for preparing large datasets for training large language models. ☆4,684 · Updated 3 months ago
- Large-scale LLM inference engine. ☆1,355 · Updated this week
- Finetuning Large Language Models on One Consumer GPU in 2 Bits. ☆719 · Updated 9 months ago
- MII makes low-latency and high-throughput inference possible, powered by DeepSpeed. ☆1,990 · Updated this week
- Inference Llama 2 in one file of pure 🔥. ☆2,113 · Updated 10 months ago
- Simple UI for LLM Model Finetuning. ☆2,059 · Updated last year
- C++ implementation for BLOOM. ☆809 · Updated last year
- S-LoRA: Serving Thousands of Concurrent LoRA Adapters. ☆1,801 · Updated last year
- MiniLLM is a minimal system for running modern LLMs on consumer-grade GPUs. ☆899 · Updated last year
- 🚀 Accelerate inference and training of 🤗 Transformers, Diffusers, TIMM and Sentence Transformers with easy to use hardware optimization… ☆2,812 · Updated 2 weeks ago
- Chat language model that can use tools and interpret the results. ☆1,532 · Updated this week
- Fine-tune mistral-7B on 3090s, a100s, h100s. ☆709 · Updated last year
- Large Language Model Text Generation Inference. ☆9,905 · Updated this week
- INT4/INT5/INT8 and FP16 inference on CPU for RWKV language model. ☆1,487 · Updated last month
- OpenLLaMA, a permissively licensed open source reproduction of Meta AI's LLaMA 7B trained on the RedPajama dataset. ☆7,462 · Updated last year
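The closest comparison in the list above is the "Python bindings for llama.cpp" entry (llama-cpp-python), which exposes a similar call-the-model interface but returns OpenAI-style completion dicts rather than plain strings. A hedged sketch of that difference, with the model path as a placeholder assumption and the loading again deferred into a function:

```python
# Sketch of a single completion via llama-cpp-python; contrast with
# ctransformers, which returns the generated text directly as a string.
# The model path is a placeholder assumption, not a real file.
def complete(model_path: str, prompt: str) -> str:
    """Run one completion against a local GGUF model file and return the text."""
    from llama_cpp import Llama

    # n_ctx sets the context window; kept small here for a lightweight load.
    llm = Llama(model_path=model_path, n_ctx=512)
    result = llm(prompt, max_tokens=32)
    # llama-cpp-python returns an OpenAI-style dict with a "choices" list.
    return result["choices"][0]["text"]
```

Which binding fits better mostly comes down to whether you want the llama.cpp feature surface (GGUF, grammars, the OpenAI-compatible server) or the smaller, string-in/string-out API that ctransformers offers.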