ggerganov / ggml

Tensor library for machine learning

☆11,541

Alternatives and similar repositories for ggml:

Users who are interested in ggml compare it to the libraries listed below:

abetlen / llama-cpp-python
Python bindings for llama.cpp
☆8,420 Updated last week
ggerganov / llama.cpp
LLM inference in C/C++
☆70,826 Updated this week
karpathy / llama2.c
Inference Llama 2 in one file of pure C
☆17,858 Updated 5 months ago
huggingface / text-generation-inference
Large Language Model Text Generation Inference
☆9,592 Updated this week
vllm-project / vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
☆33,809 Updated this week
artidoro / qlora
QLoRA: Efficient Finetuning of Quantized LLMs
☆10,168 Updated 7 months ago
bigscience-workshop / petals
🌸 Run LLMs at home, BitTorrent-style. Fine-tuning and inference up to 10x faster than offloading
☆9,349 Updated 4 months ago
mlc-ai / mlc-llm
Universal LLM Deployment Engine with ML Compilation
☆19,630 Updated this week
BlinkDL / RWKV-LM
RWKV (pronounced RwaKuv) is an RNN with great LLM performance, which can also be directly trained like a GPT transformer (parallelizable)…
☆13,005 Updated last week
openlm-research / open_llama
OpenLLaMA, a permissively licensed open source reproduction of Meta AI’s LLaMA 7B trained on the RedPajama dataset
☆7,417 Updated last year
nlpxucan / WizardLM
LLMs built upon Evol-Instruct: WizardLM, WizardCoder, WizardMath
☆9,312 Updated 5 months ago
bitsandbytes-foundation / bitsandbytes
Accessible large language models via k-bit quantization for PyTorch.
☆6,522 Updated this week
Lightning-AI / lit-llama
Implementation of the LLaMA language model based on nanoGPT. Supports flash attention, Int8 and GPTQ 4bit quantization, LoRA and LLaMA-Ad…
☆6,021 Updated 4 months ago
antimatter15 / alpaca.cpp
Locally run an Instruction-Tuned Chat-Style LLM
☆10,240 Updated last year
FMInference / FlexLLMGen
Running large language models on a single GPU for throughput-oriented scenarios.
☆9,254 Updated 2 months ago
mit-han-lab / streaming-llm
[ICLR 2024] Efficient Streaming Language Models with Attention Sinks
☆6,749 Updated 6 months ago
tloen / alpaca-lora
Instruct-tune LLaMA on consumer hardware
☆18,758 Updated 5 months ago
sgl-project / sglang
SGLang is a fast serving framework for large language models and vision language models.
☆7,353 Updated this week
huggingface / peft
🤗 PEFT: State-of-the-art Parameter-Efficient Fine-Tuning.
☆16,978 Updated this week
openai / tiktoken
tiktoken is a fast BPE tokeniser for use with OpenAI's models.
☆13,023 Updated 3 months ago
dottxt-ai / outlines
Structured Text Generation
☆10,350 Updated this week
jzhang38 / TinyLlama
The TinyLlama project is an open endeavor to pretrain a 1.1B Llama model on 3 trillion tokens.
☆8,113 Updated 8 months ago
AutoGPTQ / AutoGPTQ
An easy-to-use LLM quantization package with user-friendly APIs, based on the GPTQ algorithm.
☆4,620 Updated this week
NVIDIA / TensorRT-LLM
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain…
☆9,147 Updated this week
mistralai / mistral-inference
Official inference library for Mistral models
☆9,857 Updated 2 months ago
rustformers / llm
[Unmaintained, see README] An ecosystem of Rust libraries for working with large language models
☆6,088 Updated 6 months ago
skypilot-org / skypilot
SkyPilot: Run AI and batch jobs on any infra (Kubernetes or 12+ clouds). Get unified execution, cost savings, and high GPU availability v…
☆7,048 Updated this week
Dao-AILab / flash-attention
Fast and memory-efficient exact attention
☆15,064 Updated this week
turboderp-org / exllamav2
A fast inference library for running LLMs locally on modern consumer-class GPUs
☆3,845 Updated last week
chroma-core / chroma
the AI-native open-source embedding database
☆17,023 Updated this week