ngxson / ggml-easy

Thin wrapper around GGML to make life easier

☆40

Alternatives and similar repositories for ggml-easy

Users who are interested in ggml-easy are comparing it to the libraries listed below.


abetlen / ggml-python
Python bindings for ggml
☆146 Updated last year
justinchuby / onnx-safetensors
Use safetensors with ONNX 🤗
☆76 Updated 2 months ago
iamlemec / bert.cpp
GGML implementation of BERT model with Python bindings and quantization.
☆58 Updated last year
catid / bitnet_cpu
Experiments with BitNet inference on CPU
☆54 Updated last year
mmwillet / TTS.cpp
TTS support with GGML
☆197 Updated 2 months ago
adriancable / qwen3.c
Local Qwen3 LLM inference. One easy-to-understand file of C source with no dependencies.
☆148 Updated 5 months ago
IST-DASLab / gptq-gguf-toolkit
Efficient non-uniform quantization with GPTQ for GGUF
☆53 Updated 2 months ago
nivibilla / build-nanogpt
Video+code lecture on building nanoGPT from scratch
☆68 Updated last year
google / minja
A minimalistic C++ Jinja templating engine for LLM chat templates
☆200 Updated 2 months ago
huggingface / optimum-onnx
🤗 Optimum ONNX: Export your model to ONNX and run inference with ONNX Runtime
☆95 Updated last week
reka-ai / rekaquant
☆62 Updated 4 months ago
facebookresearch / fastgen
Simple high-throughput inference library
☆150 Updated 6 months ago
kroggen / mamba.c
Inference of Mamba models in pure C
☆194 Updated last year
BlinkDL / fast.c
Prepare for DeepSeek R1 inference: Benchmark CPU, DRAM, SSD, iGPU, GPU, ... with efficient code.
☆73 Updated 10 months ago
fishiatee / yawullm
Yet Another (LLM) Web UI, made with Gemini
☆12 Updated 11 months ago
fal-ai / flashpack
High-throughput tensor loading for PyTorch
☆209 Updated this week
lukasVierling / FaceRWKV
Course Project for COMP4471 on RWKV
☆17 Updated last year
cahya-wirawan / rwkv-tokenizer
A fast RWKV Tokenizer written in Rust
☆54 Updated 3 months ago
deepgrove-ai / Bonsai
☆34 Updated 8 months ago
Codys12 / airllm
AirLLM 70B inference with single 4GB GPU
☆14 Updated 5 months ago
janhq / cortex.llamacpp
cortex.llamacpp is a high-efficiency C++ inference engine for edge computing. It is a dynamic library that can be loaded by any server a…
☆41 Updated 5 months ago
balisujohn / tortoise.cpp
A ggml (C++) re-implementation of tortoise-tts
☆191 Updated last year
ml-explore / mlx-c
C API for MLX
☆154 Updated last week
jukofyork / transplant-vocab
Transplants vocabulary between language models, enabling the creation of draft models for speculative decoding WITHOUT retraining.
☆47 Updated last month
nath1295 / MLX-Textgen
A Python package for serving LLMs on OpenAI-compatible API endpoints with prompt caching using MLX.
☆99 Updated 5 months ago
huggingface / kernel-builder
👷 Build compute kernels
☆192 Updated this week
xenova / model-explorer
Browse, search, and visualize ONNX models.
☆34 Updated 7 months ago
Cornell-RelaxML / yaqa-quantization
☆64 Updated 5 months ago
kyutai-labs / dactory
☆43 Updated last month
RobinQu / instinct.cpp
instinct.cpp provides ready to use alternatives to OpenAI Assistant API and built-in utilities for developing AI Agent applications (RAG,…
☆54 Updated last year