menloresearch / cortex.llamacpp

cortex.llamacpp is a high-efficiency C++ inference engine for edge computing. It is a dynamic library that can be loaded by any server at runtime.

☆42

Alternatives and similar repositories for cortex.llamacpp

Users who are interested in cortex.llamacpp are comparing it to the libraries listed below.

menloresearch / cortex.tensorrt-llm
Cortex.Tensorrt-LLM is a C++ inference library that can be loaded by any server at runtime. It submodules NVIDIA’s TensorRT-LLM for GPU a…
☆42 Updated last year
RobinQu / instinct.cpp
instinct.cpp provides ready to use alternatives to OpenAI Assistant API and built-in utilities for developing AI Agent applications (RAG,…
☆52 Updated last year
mmwillet / TTS.cpp
TTS support with GGML
☆180 Updated this week
FishiaTee / yawullm
Yet Another (LLM) Web UI, made with Gemini
☆12 Updated 9 months ago
grctest / Electron-BitNet
Running Microsoft's BitNet via Electron, React & Astro
☆44 Updated 2 weeks ago
gigit0000 / qwen3.c
Lightweight C inference for Qwen3 GGUF. Multiturn prefix caching & batch processing.
☆18 Updated last month
nomic-ai / llama.cpp
llama.cpp fork used by GPT4All
☆57 Updated 7 months ago
chigkim / Ollama-MMLU-Pro
☆102 Updated last month
Rivridis / LLM-Assistant
Locally running LLM with internet access
☆97 Updated 3 months ago
balisujohn / tortoise.cpp
A ggml (C++) re-implementation of tortoise-tts
☆189 Updated last year
taylorchu / kokoro-onnx
☆24 Updated 8 months ago
MaggotHATE / Llama_chat
A chat UI for Llama.cpp
☆15 Updated last month
pranavjad / tinyllama-bitnet
Train your own small bitnet model
☆75 Updated 11 months ago
lukasVierling / FaceRWKV
Course Project for COMP4471 on RWKV
☆17 Updated last year
catid / bitnet_cpu
Experiments with BitNet inference on CPU
☆54 Updated last year
Codys12 / airllm
AirLLM 70B inference with single 4GB GPU
☆14 Updated 3 months ago
monatis / lmm.cpp
Inference of Large Multimodal Models in C/C++. LLaVA and others
☆48 Updated 2 years ago
qrv0 / llm-ripper
LLM Ripper is a framework for component extraction (embeddings, attention heads, FFNs), activation capture, functional analysis, and adap…
☆45 Updated this week
ngxson / ggml-easy
Thin wrapper around GGML to make life easier
☆39 Updated 3 months ago
ggerganov / bark.cpp
Port of Suno AI's Bark in C/C++ for fast inference
☆52 Updated last year
jukofyork / transplant-vocab
Transplants vocabulary between language models, enabling the creation of draft models for speculative decoding WITHOUT retraining.
☆42 Updated last month
electroglyph / quant_clone
Generate a llama-quantize command to copy the quantization parameters of any GGUF
☆24 Updated 2 months ago
perk11 / large-model-proxy
Run multiple resource-heavy Large Models (LM) on the same machine with limited amount of VRAM/other resources by exposing them on differe…
☆80 Updated last week
the-crypt-keeper / ggml-downloader
Simple, Fast, Parallel Huggingface GGML model downloader written in Python
☆24 Updated 2 years ago
cp3249 / splaa
SPLAA is an AI assistant framework that utilizes voice recognition, text-to-speech, and tool-calling capabilities to provide a conversati…
☆29 Updated 5 months ago
huseinzol05 / transformers-continuous-batching
Lightweight continuous batching with OpenAI compatibility using HuggingFace Transformers, including T5 and Whisper.
☆28 Updated 6 months ago
beratcmn / local-intelligence
Something similar to Apple Intelligence?
☆61 Updated last year
exo-explore / evML
Resources regarding evML (edge verified machine learning)
☆20 Updated 9 months ago
FishiaTee / Tumera
Yet another frontend for LLM, written using .NET and WinUI 3
☆10 Updated 3 weeks ago
not-nullptr / Spotllama
Spotlight-like client for Ollama on Windows.
☆28 Updated last year