AtomicBot-ai/atomic-llama-cpp-turboquant

llama.cpp fork with TurboQuant WHT-rotated KV cache & weight compression + Gemma 4 MTP and Qwen 3.6 NextN speculative decoding (+30-50% throughput).

☆307

Alternatives and similar repositories for atomic-llama-cpp-turboquant

Users who are interested in atomic-llama-cpp-turboquant are comparing it to the repositories listed below.

TheTom / llama-cpp-turboquant
LLM inference in C/C++
☆2,160 · Updated this week
Anbeeld / beellama.cpp
KVarN, KV cache precision tail, low-bit quants in llama.cpp for longer context of better precision in the same VRAM
☆794 · Updated this week
Indras-Mirror / llama.cpp-turboq-mtp
Fused TBQ4 Flash Attention + MTP + Shared Tensors for llama.cpp — 82+ tok/s with lossless 4.25 bpv KV cache at 200K context on RTX 4090
☆90 · May 17, 2026 · Updated 2 months ago
turbo-tan / llama.cpp-tq3
llama.cpp fork with TQ3_1S/4S CUDA kernels — 3.5-bit WHT quantization achieving Q4s quality at 10% smaller size. Based on RaBitQ-inspired…
☆221 · Jul 6, 2026 · Updated 2 weeks ago
spiritbuun / buun-llama-cpp
LLAMA Turboquant implementation with CUDA support
☆704 · Updated this week
ikawrakow / ik_llama.cpp
llama.cpp fork with additional SOTA quants and improved performance
☆2,943 · Updated this week
AtomicBot-ai / Atomic-Chat
Local AI app and inference engine for agents. Run open-weight LLMs locally — private, 100% offline on your computer.
☆1,128 · Updated this week
Luce-Org / lucebox
Fast LLM speculative inference server for consumer hardware.
☆2,668 · Updated this week
AtomicBot-ai / clawhub-layer-api
🐾 Complete REST API for ClawHub skills marketplace data
☆15 · Apr 6, 2026 · Updated 3 months ago
BoFan-tunning / llama.cpp-MTP-TurboQuant
☆142 · Jun 13, 2026 · Updated last month
johndpope / llama-cpp-turboquant
LLM inference in C/C++
☆64 · May 7, 2026 · Updated 2 months ago
AtomicBot-ai / atomicbot
The Fastest Way to Run OpenClaw 🦞
☆325 · May 26, 2026 · Updated last month
scrya-com / rotorquant
KV cache compression via block-diagonal rotation. Beats TurboQuant: better PPL (6.91 vs 7.07), 28% faster decode, 5.3x faster prefill, 44…
☆1,037 · Apr 23, 2026 · Updated 2 months ago
test1111111111111112 / llama-cpp-turboquant-gemma4
TurboQuant llama.cpp fork with optimized turbo4 kernels for Gemma 4 D=256/512 heads — lazy K/V, batch decode, warp-cooperative write. 120…
☆35 · Apr 5, 2026 · Updated 3 months ago
Ai-Swat / sigma-eclipse-llm
☆22 · Apr 3, 2026 · Updated 3 months ago
localai-org / apex-quant
Adaptive Precision for EXpert Models: MoE-aware mixed-precision quantization
☆397 · May 29, 2026 · Updated last month
PrismML-Eng / llama.cpp
LLM inference in C/C++
☆387 · Updated this week
am17an / llama.cpp
LLM inference in C/C++
☆56 · Updated this week
z-lab / dflash
DFlash: Block Diffusion for Flash Speculative Decoding
☆5,504 · May 10, 2026 · Updated 2 months ago
steveseguin / b70-optimization-lab
☆22 · Jul 12, 2026 · Updated last week
Luce-Org / lucebox-ggml
VENDORIZED in lucebox-hub. Fork of llama.cpp, ggml graph for lucebox inference engine
☆31 · Jul 8, 2026 · Updated last week
mostlygeek / llama-swap
Reliable model swapping for any local OpenAI/Anthropic compatible server - llama.cpp, vllm, etc
☆5,087 · Updated this week
BrutchsamaJeanLouis / llm-sampling-tuner
Automated parameter sweep pipeline for finding optimal sampling settings for any local LLM on quantized weights
☆18 · Feb 21, 2026 · Updated 5 months ago
QuinsZouls / llama-cpp-turboquant
Experimental LLM inference in C/C++
☆40 · May 15, 2026 · Updated 2 months ago
Thireus / GGUF-Tool-Suite
Produce your own Dynamic 3.0 Quants and achieve optimum accuracy & SOTA quantization performance! Input a target size and the toolchain w…
☆146 · Updated this week
CYBERLOOM-INC / ComfyUI-nodes-hnmr
ComfyUI custom nodes - merge, grid (aka xyz-plot) and others
☆11 · May 22, 2024 · Updated 2 years ago
lilly1987 / ComfyUI-workflow
☆14 · Apr 14, 2023 · Updated 3 years ago
RAZZULLIX / fast_topk_batched
High-performance batched Top-K selection for CPU inference. Up to 80x faster than PyTorch, optimized for LLM sampling with AVX2 SIMD.
☆17 · Mar 20, 2026 · Updated 4 months ago
Kaden-Schutt / hipfire
RDNA-native LLM inference engine in Rust.
☆486 · Updated this week
Doorman11991 / MarrowScript
MarrowScript compiler. Welcome to deterministic typed LLM orchestration as a compile-time concern
☆32 · May 21, 2026 · Updated 2 months ago
ggml-org / llama.vscode
VS Code extension for LLM-assisted code/text completion
☆1,451 · Updated this week
xizhilang-lab / My_3D_Nodes
☆15 · Jan 11, 2026 · Updated 6 months ago
Osmantic / MMBT-Messy-Model-Bench-Tests
Messy repo filled with messy tests about hardware and LLMs. Built for me, public for you.
☆41 · Jun 1, 2026 · Updated last month
charlie12345 / ROCmFPX
ROCmFPX Family for AMD Hardware and Processors. More quants and special agent quants
☆131 · Updated this week
Shohruh72 / FastVLM
🚀 LLaVA-FastVLM: One-Click Visual Language API
☆15 · Jun 14, 2025 · Updated last year
TheTom / turboquant_plus
☆6,997 · Updated this week
wiserautomation / SupraWall
The open-source security layer for AI agents. Deterministic guardrails, PII redaction, and EU AI Act compliance in one line of code.
☆23 · May 12, 2026 · Updated 2 months ago
Madreag / turbo3-cuda
LLM inference in C/C++
☆36 · Apr 12, 2026 · Updated 3 months ago
andthattoo / structured-cot
Structured Chain-of-Thought
☆219 · May 16, 2026 · Updated 2 months ago