ngxson / ggml-easy
Thin wrapper around GGML to make life easier
☆37 · Updated last month
Alternatives and similar repositories for ggml-easy
Users interested in ggml-easy are comparing it to the libraries listed below.
- A minimalistic C++ Jinja templating engine for LLM chat templates ☆163 · Updated 3 weeks ago
- Python bindings for ggml ☆142 · Updated 11 months ago
- Inference of Mamba models in pure C ☆189 · Updated last year
- Simple high-throughput inference library ☆125 · Updated 2 months ago
- Course Project for COMP4471 on RWKV ☆17 · Updated last year
- Video+code lecture on building nanoGPT from scratch ☆69 · Updated last year
- GGML implementation of BERT model with Python bindings and quantization ☆56 · Updated last year
- Use safetensors with ONNX 🤗 ☆69 · Updated last month
- Experiments with BitNet inference on CPU ☆54 · Updated last year
- Inference of Large Multimodal Models in C/C++. LLaVA and others ☆47 · Updated last year
- Lightweight C inference for Qwen3 GGUF with the smallest (0.6B) at the fullest (FP32) ☆15 · Updated last week
- Prepare for DeepSeek R1 inference: benchmark CPU, DRAM, SSD, iGPU, GPU, ... with efficient code ☆72 · Updated 6 months ago
- Transplants vocabulary between language models, enabling the creation of draft models for speculative decoding WITHOUT retraining ☆39 · Updated 3 weeks ago
- ☆57 · Updated 3 weeks ago
- C API for MLX ☆121 · Updated 3 weeks ago
- Train your own small bitnet model ☆75 · Updated 9 months ago
- Browse, search, and visualize ONNX models ☆33 · Updated 3 months ago
- The simplest, fastest repository for training/finetuning medium-sized xLSTMs ☆41 · Updated last year
- Yet Another (LLM) Web UI, made with Gemini ☆12 · Updated 7 months ago
- Lightweight toolkit package to train and fine-tune 1.58bit Language models ☆82 · Updated 2 months ago
- Inference RWKV v7 in pure C ☆37 · Updated 2 weeks ago
- Run multiple resource-heavy Large Models (LM) on the same machine with limited amount of VRAM/other resources by exposing them on differe… ☆67 · Updated last month
- cortex.llamacpp is a high-efficiency C++ inference engine for edge computing. It is a dynamic library that can be loaded by any server a… ☆42 · Updated last month
- fast state-of-the-art speech models and a runtime that runs anywhere 💥 ☆55 · Updated last month
- RWKV-7: Surpassing GPT ☆94 · Updated 8 months ago
- asynchronous/distributed speculative evaluation for llama3 ☆39 · Updated last year
- llama.cpp to PyTorch Converter ☆34 · Updated last year
- Benchmark your GPU with ease ☆22 · Updated 2 months ago
- AirLLM 70B inference with single 4GB GPU ☆14 · Updated last month
- ☆17 · Updated 8 months ago