abetlen / ggml-python

Python bindings for ggml

☆146

Alternatives and similar repositories for ggml-python

Users who are interested in ggml-python are comparing it to the repositories listed below.

kroggen / mamba.c
Inference of Mamba models in pure C
☆192 Updated last year
BlinkDL / nanoRWKV
RWKV in nanoGPT style
☆193 Updated last year
astramind-ai / BitMat
An efficient implementation of the method proposed in "The Era of 1-bit LLMs"
☆154 Updated last year
ggml-org / p1
LLM-based code completion engine
☆190 Updated 9 months ago
IST-DASLab / qmoe
Code for the paper "QMoE: Practical Sub-1-Bit Compression of Trillion-Parameter Models".
☆277 Updated last year
thomasantony / llamacpp-python
Python bindings for llama.cpp
☆198 Updated 2 years ago
monatis / clip.cpp
CLIP inference in plain C/C++ with no extra dependencies
☆523 Updated 4 months ago
NolanoOrg / cformers
SoTA Transformers with C-backend for fast inference on your CPU.
☆308 Updated last year
trzy / llava-cpp-server
LLaVA server (llama.cpp).
☆183 Updated 2 years ago
staghado / vit.cpp
Inference Vision Transformer (ViT) in plain C/C++ with ggml
☆295 Updated last year
rmihaylov / mpttune
Tune MPTs
☆84 Updated 2 years ago
sdan / selfextend
an implementation of Self-Extend, to expand the context window via grouped attention
☆118 Updated last year
neuralmagic / nm-vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
☆266 Updated last year
catid / bitnet_cpu
Experiments with BitNet inference on CPU
☆54 Updated last year
huggingface / optimum-tpu
Google TPU optimizations for transformers models
☆120 Updated 9 months ago
GreenBitAI / low_bit_llama
Advanced Ultra-Low Bitrate Compression Techniques for the LLaMA Family of LLMs
☆110 Updated last year
euclaise / SlimTrainer
Full finetuning of large language models without large memory requirements
☆93 Updated last month
chu-tianxiang / llama-cpp-torch
llama.cpp to PyTorch Converter
☆34 Updated last year
harrisonvanderbyl / rwkv-cpp-accelerated
A torchless, c++ rwkv implementation using 8bit quantization, written in cuda/hip/vulkan for maximum compatibility and minimum dependenci…
☆313 Updated last year
wozeparrot / tinyrwkv
tinygrad port of the RWKV large language model.
☆44 Updated 7 months ago
euclaise / supertrainer2000
☆50 Updated last year
facebookresearch / fastgen
Simple high-throughput inference library
☆147 Updated 5 months ago
philipturner / metal-flash-attention
FlashAttention (Metal Port)
☆545 Updated last year
dust-tt / llama-ssp
Experiments on speculative sampling with Llama models
☆125 Updated 2 years ago
BlinkDL / fast.c
Prepare for DeepSeek R1 inference: Benchmark CPU, DRAM, SSD, iGPU, GPU, ... with efficient code.
☆73 Updated 8 months ago
PABannier / biogpt.cpp
Port of Microsoft's BioGPT in C/C++ using ggml
☆85 Updated last year
rafacelente / bllama
1.58-bit LLaMa model
☆83 Updated last year
Cornell-RelaxML / quip-sharp
☆561 Updated 11 months ago
pranavjad / tinyllama-bitnet
Train your own small bitnet model
☆75 Updated last year
99991 / pygguf
GGUF parser in Python
☆28 Updated last year