gruai / koifish
A C++ framework for efficient training and fine-tuning of LLMs
☆25 · Updated last week
Alternatives and similar repositories for koifish
Users interested in koifish are comparing it to the libraries listed below.
- Load and run Llama from safetensors files in C ☆14 · Updated last year
- Lightweight C inference for Qwen3 GGUF. Multi-turn prefix caching & batch processing. ☆19 · Updated 3 months ago
- Transplants vocabulary between language models, enabling the creation of draft models for speculative decoding WITHOUT retraining. ☆47 · Updated last month
- A little (lil) Language Model (LM). A tiny reproduction of LLaMA 3's model architecture. ☆53 · Updated 7 months ago
- ☆88 · Updated 2 weeks ago
- Train your own small BitNet model ☆76 · Updated last year
- Thin wrapper around GGML to make life easier ☆40 · Updated last month
- REAP: Router-weighted Expert Activation Pruning for SMoE compression ☆151 · Updated 2 weeks ago
- Experiments with BitNet inference on CPU ☆54 · Updated last year
- Local Qwen3 LLM inference. One easy-to-understand file of C source with no dependencies. ☆150 · Updated 5 months ago
- ☆62 · Updated 5 months ago
- Course project for COMP4471 on RWKV ☆17 · Updated last year
- Inference of RWKV v7 in pure C. ☆42 · Updated 2 months ago
- Generate a llama-quantize command to copy the quantization parameters of any GGUF ☆28 · Updated 4 months ago
- ☆38 · Updated 2 months ago
- cortex.llamacpp is a high-efficiency C++ inference engine for edge computing. It is a dynamic library that can be loaded by any server a… ☆41 · Updated 5 months ago
- 1.58-bit LLaMA model ☆83 · Updated last year
- Centralised RWKV docs for the community ☆29 · Updated 4 months ago
- A fast RWKV tokenizer written in Rust ☆54 · Updated 4 months ago
- ☆16 · Updated 5 months ago
- TTS support with GGML ☆201 · Updated 2 months ago
- Run multiple resource-heavy large models (LMs) on the same machine with a limited amount of VRAM/other resources by exposing them on differe… ☆85 · Updated last week
- Stable Diffusion and Flux in pure C/C++ ☆24 · Updated this week
- Prepare for DeepSeek R1 inference: benchmark CPU, DRAM, SSD, iGPU, GPU, ... with efficient code. ☆73 · Updated 10 months ago
- An unsupervised model-merging algorithm for Transformer-based language models. ☆108 · Updated last year
- Inference of Mamba models in pure C ☆195 · Updated last year
- Inference of Llama/Llama2/Llama3 models in NumPy ☆21 · Updated 2 years ago
- Video + code lecture on building nanoGPT from scratch ☆68 · Updated last year
- A GGML (C++) re-implementation of tortoise-tts ☆193 · Updated last year
- Yet Another (LLM) Web UI, made with Gemini ☆12 · Updated 11 months ago