antirez / gguf-tools
GGUF implementation in C as a library and a tools CLI program
☆251 · Updated 3 weeks ago
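For readers new to the format: a GGUF file opens with a small fixed header (the magic bytes "GGUF", a uint32 version, then uint64 tensor and metadata-KV counts, all little-endian), which any GGUF reader like gguf-tools must parse before walking the metadata and tensor data. Below is a minimal sketch in C that dumps that header; it relies only on the publicly documented GGUF layout and is not the gguf-tools library API.

```c
#include <stdio.h>
#include <inttypes.h>
#include <string.h>

/* Minimal GGUF header dump. Field layout follows the publicly
 * documented GGUF container format (magic "GGUF", uint32 version,
 * uint64 tensor count, uint64 metadata KV count, little-endian);
 * this is a sketch, not the gguf-tools library API. */
int main(int argc, char **argv) {
    if (argc != 2) {
        fprintf(stderr, "usage: %s file.gguf\n", argv[0]);
        return 1;
    }
    FILE *fp = fopen(argv[1], "rb");
    if (!fp) { perror("fopen"); return 1; }

    char magic[4];
    uint32_t version;
    uint64_t n_tensors, n_kv;
    if (fread(magic, 1, sizeof(magic), fp) != sizeof(magic) ||
        memcmp(magic, "GGUF", 4) != 0) {
        fprintf(stderr, "not a GGUF file\n");
        fclose(fp);
        return 1;
    }
    /* Assumes a little-endian host, matching the on-disk byte order. */
    if (fread(&version, sizeof(version), 1, fp) != 1 ||
        fread(&n_tensors, sizeof(n_tensors), 1, fp) != 1 ||
        fread(&n_kv, sizeof(n_kv), 1, fp) != 1) {
        fprintf(stderr, "truncated header\n");
        fclose(fp);
        return 1;
    }
    printf("GGUF v%" PRIu32 ": %" PRIu64 " tensors, %" PRIu64 " metadata keys\n",
           version, n_tensors, n_kv);
    fclose(fp);
    return 0;
}
```

Compile with `cc -o gguf-head gguf-head.c` and point it at any .gguf checkpoint to sanity-check the file before loading it with a full parser.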
Alternatives and similar repositories for gguf-tools:
Users interested in gguf-tools are comparing it to the libraries listed below
- Inference of Mamba models in pure C ☆183 · Updated 11 months ago
- An implementation of bucketMul LLM inference ☆215 · Updated 6 months ago
- Stateful load balancer custom-tailored for llama.cpp 🏓🦙 ☆676 · Updated last week
- throwaway GPT inference ☆140 · Updated 7 months ago
- Port of MiniGPT4 in C++ (4bit, 5bit, 6bit, 8bit, 16bit CPU inference with GGML) ☆562 · Updated last year
- LLaVA server (llama.cpp). ☆176 · Updated last year
- Python bindings for ggml ☆136 · Updated 4 months ago
- llama.cpp fork with additional SOTA quants and improved performance ☆133 · Updated this week
- WebGPU LLM inference tuned by hand ☆148 · Updated last year
- A minimalistic C++ Jinja templating engine for LLM chat templates ☆104 · Updated this week
- LLM-based code completion engine ☆178 · Updated last week
- GGML implementation of BERT model with Python bindings and quantization. ☆53 · Updated 11 months ago
- Extend the original llama.cpp repo to support redpajama model. ☆117 · Updated 4 months ago
- ggml implementation of BERT ☆476 · Updated 11 months ago
- CLIP inference in plain C/C++ with no extra dependencies ☆475 · Updated 5 months ago
- FlashAttention (Metal Port) ☆430 · Updated 4 months ago
- an implementation of Self-Extend, to expand the context window via grouped attention ☆118 · Updated last year
- Run GGML models with Kubernetes. ☆173 · Updated last year
- Mistral7B playing DOOM ☆127 · Updated 6 months ago
- Fast parallel LLM inference for MLX ☆153 · Updated 6 months ago
- Tiny Dream - An embedded, Header Only, Stable Diffusion C++ implementation ☆257 · Updated last year
- A faithful clone of Karpathy's llama2.c (one file inference, zero dependency) but fully functional with LLaMA 3 8B base and instruct mode… ☆113 · Updated 6 months ago
- ☆192 · Updated last week
- TypeScript generator for llama.cpp Grammar directly from TypeScript interfaces ☆131 · Updated 6 months ago
- C API for MLX ☆91 · Updated this week
- LLM-powered lossless compression tool ☆263 · Updated 5 months ago
- SoTA Transformers with C-backend for fast inference on your CPU. ☆312 · Updated last year
- An efficient implementation of the method proposed in "The Era of 1-bit LLMs" ☆154 · Updated 3 months ago
- Full finetuning of large language models without large memory requirements ☆93 · Updated last year
- inference code for mixtral-8x7b-32kseqlen ☆99 · Updated last year