GGUF implementation in C as a library and a tools CLI program
☆310 · Aug 28, 2025 · Updated 6 months ago
Alternatives and similar repositories for gguf-tools
Users who are interested in gguf-tools are comparing it to the libraries listed below
- ggml implementation of BERT ☆498 · Feb 23, 2024 · Updated 2 years ago
- A small utility library for parsing GGUF file info ☆28 · Jan 27, 2025 · Updated last year
- First token cutoff sampling inference example ☆30 · Jan 15, 2024 · Updated 2 years ago
- Port of Meta's Encodec in C/C++ ☆228 · Dec 4, 2024 · Updated last year
- CLIP inference in plain C/C++ with no extra dependencies ☆552 · Jun 19, 2025 · Updated 8 months ago
- Inference Vision Transformer (ViT) in plain C/C++ with ggml ☆308 · Apr 11, 2024 · Updated last year
- Suno AI's Bark model in C/C++ for fast text-to-speech generation ☆857 · Nov 16, 2024 · Updated last year
- Tensor library for machine learning ☆14,152 · Updated this week
- Local ML voice chat using high-end models. ☆184 · Dec 13, 2025 · Updated 2 months ago
- INT4/INT5/INT8 and FP16 inference on CPU for RWKV language model ☆1,562 · Mar 23, 2025 · Updated 11 months ago
- ☆130 · Nov 9, 2024 · Updated last year
- GGML implementation of BERT model with Python bindings and quantization. ☆57 · Feb 19, 2024 · Updated 2 years ago
- Diffusion model(SD,Flux,Wan,Qwen Image,Z-Image,...) inference in pure C/C++ ☆5,490 · Updated this week
- Yet Another (LLM) Web UI, made with Gemini ☆12 · Dec 25, 2024 · Updated last year
- GGUF parser for Go ☆14 · Dec 28, 2023 · Updated 2 years ago
- A chat UI for Llama.cpp ☆15 · Dec 2, 2025 · Updated 3 months ago
- lightweight, standalone C++ inference engine for Google's Gemma models. ☆6,742 · Updated this week
- ☆129 · Jan 22, 2024 · Updated 2 years ago
- WebAssembly binding for llama.cpp - Enabling on-browser LLM inference ☆1,003 · Dec 17, 2025 · Updated 2 months ago
- Simple, Fast, Parallel Huggingface GGML model downloader written in python ☆24 · Jul 26, 2023 · Updated 2 years ago
- iterate quickly with llama.cpp hot reloading. use the llama.cpp bindings with bun.sh ☆50 · Oct 30, 2023 · Updated 2 years ago
- Inference Llama 2 in one file of pure C ☆19,213 · Aug 6, 2024 · Updated last year
- Pure C++ implementation of several models for real-time chatting on your computer (CPU & GPU) ☆821 · Feb 23, 2026 · Updated last week
- A Javascript library (with Typescript types) to parse metadata of GGML based GGUF files. ☆51 · Jul 30, 2024 · Updated last year
- Load and run Llama from safetensors files in C ☆15 · Oct 24, 2024 · Updated last year
- Implementation of ModernBERT in MLX ☆20 · Jan 7, 2026 · Updated last month
- Stable Diffusion in pure C/C++ ☆14 · May 29, 2024 · Updated last year
- C++ implementation for BLOOM ☆809 · May 13, 2023 · Updated 2 years ago
- Python bindings for the Transformer models implemented in C/C++ using GGML library. ☆1,882 · Jan 28, 2024 · Updated 2 years ago
- Port of Andrej Karpathy's nanoGPT to Apple MLX framework. ☆117 · Feb 12, 2024 · Updated 2 years ago
- Perf monitoring CLI tool for Apple Silicon ☆16 · Jan 1, 2024 · Updated 2 years ago
- HC-256 Stream cipher in x86 assembly ☆19 · Nov 14, 2017 · Updated 8 years ago
- Using modal.com to process FineWeb-edu data ☆20 · Apr 5, 2025 · Updated 11 months ago
- Easily use and train state of the art late-interaction retrieval methods (ColBERT) in any RAG pipeline. Designed for modularity and ease-… ☆3,859 · May 17, 2025 · Updated 9 months ago
- Distribute and run LLMs with a single file. ☆23,755 · Updated this week
- GGUF parser in Python ☆28 · Aug 15, 2024 · Updated last year
- Natural language control for Python CLI tools using locally-trained SLMs (CPU inference) ☆30 · Feb 21, 2026 · Updated last week
- Open-source LLM load balancer and serving platform for self-hosting LLMs at scale 🏓🦙 Alternative to projects like llm-d, Docker Model R… ☆1,467 · Feb 25, 2026 · Updated last week
- A framework for orchestrating AI agents using a mermaid graph ☆76 · May 16, 2024 · Updated last year