xaskasdf/ntransformer

High-efficiency LLM inference engine in C++/CUDA. Run Llama 70B on RTX 3090.

☆464

Alternatives and similar repositories for ntransformer

Users that are interested in ntransformer are comparing it to the libraries listed below.

TrevorS / voxtral-mini-realtime-rs
Voxtral ASR & TTS running natively and in the browser. A Rust implementation of Mistral's Voxtral mini realtime ASR / TTS using the Burn …
☆811 · Apr 2, 2026 · Updated 3 months ago
kressler / fast-containers
Performance focused header-only container library. Currently primarily contains a fast B+Tree implementation.
☆74 · Jan 6, 2026 · Updated 6 months ago
robertcprice / nCPU
nCPU: model-native and tensor-optimized CPU research runtimes with organized workloads, tools, and docs
☆655 · Jul 11, 2026 · Updated 2 weeks ago
nakagami / grdpwasm
A web-based RDP client
☆314 · Jul 4, 2026 · Updated 3 weeks ago
tech4bot / rk3562deb
☆391 · Jun 21, 2026 · Updated last month
c0deJedi / nbd-vram
Use your NVIDIA GPU's VRAM as swap space on Linux. Built for laptops with soldered memory and no upgrade path. If you have an RTX card si…
☆513 · Jul 17, 2026 · Updated last week
cenconq25 / delta-compress-llm
Proof of concept: Exploiting temporal coherence in LLM inference-- delta encoding for KV cache compression and weight-skip prediction. …
☆50 · Apr 10, 2026 · Updated 3 months ago
Zaneham / Booth
Open-source CUDA, Triton and HIP compiler targeting multiple GPU and CPU architectures.
☆1,721 · Updated this week
swellweb / reame
CPU-first LLM inference server on llama.cpp. Runs useful models on free-tier ARM boxes; rewriting the input made it ~6x faster and more a…
☆102 · Updated this week
kossisoroyce / timber
Ollama for classical ML models. AOT compiler that turns XGBoost, LightGBM, scikit-learn, CatBoost & ONNX models into native C99 inference…
☆688 · Apr 16, 2026 · Updated 3 months ago
Alanma23 / tinytinyTPU-co
☆173 · Jan 4, 2026 · Updated 6 months ago
localgpt-app / localgpt
Local AI assistant, dreaming explorable worlds.
☆1,108 · Jun 21, 2026 · Updated last month
sueszli / autograd.c
tiny torch, but close to metal
☆130 · Jun 25, 2026 · Updated last month
matthewrennie / go-llama.cpp
Go bindings for LLama.cpp
☆14 · Apr 11, 2023 · Updated 3 years ago
blacksky-algorithms / atproto
Blacksky fork of bluesky-social/atproto with AppView performance optimizations, caching, and community features
☆94 · Updated this week
salmanmohammadi / nanocode
The best Claude Code that $200 can buy
☆269 · Apr 6, 2026 · Updated 3 months ago
jrswab / axe
A lightweight cli for running single-purpose AI agents. Define focused agents in TOML, trigger them from anywhere; pipes, git hooks, cron…
☆832 · Jul 3, 2026 · Updated 3 weeks ago
alternbits / awesome-cuda-books
A curated list of best cuda programming books
☆941 · May 19, 2026 · Updated 2 months ago
danveloper / flash-moe
Running a big model on a small laptop
☆3,994 · Mar 19, 2026 · Updated 4 months ago
bwasti / gt
[experimental] multiplexed distributed tensor framework
☆22 · Nov 17, 2025 · Updated 8 months ago
halfwhey / claudraband
Claude Code for the Power User
☆282 · Apr 18, 2026 · Updated 3 months ago
openpcc / openpcc
An open-source framework for verifiably private AI inference
☆944 · Jan 8, 2026 · Updated 6 months ago
ashtonsix / perf-portfolio
HPC research and demonstrations
☆115 · Dec 17, 2025 · Updated 7 months ago
SwaggasDeCatas / emuThreeDS
World's first Nintendo 3DS emulator for Apple devices based on Citra.
☆18 · Apr 7, 2023 · Updated 3 years ago
thatipamula-jashwanth / smart-knn
smartKNN - A feature-weighted KNN algorithm with automatic preprocessing, normalization, and learned feature importance.
☆33 · Mar 11, 2026 · Updated 4 months ago
tnm / zclaw
Your personal AI assistant at all-in 888KiB (~35KB in app code). Running on an ESP32. GPIO, cron, custom tools, memory, and more.
☆2,199 · May 17, 2026 · Updated 2 months ago
samim23 / vortexnet
VortexNet: Neural Computing through Fluid Dynamics
☆50 · Jan 19, 2025 · Updated last year
antirez / iris.c
Flux 2 image generation model pure C inference
☆1,967 · Feb 13, 2026 · Updated 5 months ago
alainnothere / llm-circuit-finder
I replicated Ng's RYS method and found that duplicating 3 specific layers in Qwen2.5-32B boosts reasoning by 17% and duplicating layers 1…
☆242 · Mar 20, 2026 · Updated 4 months ago
samuel-vitorino / sopro
A lightweight text-to-speech model with zero-shot voice cloning
☆877 · Feb 6, 2026 · Updated 5 months ago
DebarghaG / proofofthought
Proof of thought : LLM-based reasoning using Z3 theorem proving with multiple backend support (SMT2 and JSON DSL)
☆375 · Jul 7, 2026 · Updated 3 weeks ago
tdortman / Cuckoo-GPU
High-Performance GPU Cuckoo Filter
☆39 · Jul 21, 2026 · Updated last week
Frikallo / parakeet.cpp
Ultra fast and portable Parakeet implementation for on-device inference in C++ using Axiom with MPS+Unified Memory
☆300 · May 4, 2026 · Updated 2 months ago
RedGridTactical / RedGridLink
Offline MGRS navigation + BLE proximity team sync for 2-8 people. No cell service needed. V1.0 through V4.0 roadmap in README.
☆96 · Jun 12, 2026 · Updated last month
obround / mytorch
Automatic differentiation implemented in python, inspired by Pytorch (easily extensible)
☆87 · Feb 21, 2023 · Updated 3 years ago
ad-si / Woxi
Wolfram Language / Mathematica reimplementation in Rust (Wolfram oxidized)
☆662 · Updated this week
HarryR / z80ai
Z80-μLM is a 2-bit quantized language model small enough to run on an 8-bit Z80 processor. Train conversational models in Python, export …
☆1,117 · Apr 29, 2026 · Updated 3 months ago
mohebifar / tooscut
Professional video editing, right in your browser. Made with Rust, WebGPU, WASM, and Tanstack Start.
☆693 · Updated this week
martianlantern / ThinkMesh
This is a framework that implements various parallel reasoning strategies from the literature
☆275 · Dec 18, 2025 · Updated 7 months ago