menloresearch / cortex.llamacpp
cortex.llamacpp is a high-efficiency C++ inference engine for edge computing. It is a dynamic library that can be loaded by any server at runtime.
☆42 · Updated last month
Alternatives and similar repositories for cortex.llamacpp
Users interested in cortex.llamacpp are comparing it to the libraries listed below:
- Cortex.Tensorrt-LLM is a C++ inference library that can be loaded by any server at runtime. It submodules NVIDIA's TensorRT-LLM for GPU a… (☆43, updated 10 months ago)
- (☆22, updated 6 months ago)
- AirLLM: 70B inference with a single 4 GB GPU (☆14, updated last month)
- Thin wrapper around GGML to make life easier (☆40, updated last month)
- Lightweight C inference for Qwen3 GGUF, with the smallest model (0.6B) at full precision (FP32) (☆15, updated last week)
- TTS support with GGML (☆139, updated 2 weeks ago)
- Running Microsoft's BitNet via Electron, React & Astro (☆43, updated 2 months ago)
- Course project for COMP4471 on RWKV (☆17, updated last year)
- Port of Suno AI's Bark in C/C++ for fast inference (☆52, updated last year)
- GGML implementation of embedding models, including SentenceTransformer and BGE (☆58, updated last year)
- A chat UI for Llama.cpp (☆15, updated 3 weeks ago)
- Experiments with BitNet inference on CPU (☆54, updated last year)
- GGML implementation of the BERT model, with Python bindings and quantization (☆56, updated last year)
- Yet Another (LLM) Web UI, made with Gemini (☆12, updated 7 months ago)
- LLM inference in C/C++ (☆98, updated last week)
- Run multiple resource-heavy Large Models (LMs) on the same machine with a limited amount of VRAM/other resources by exposing them on differe… (☆67, updated last month)
- llama.cpp fork used by GPT4All (☆56, updated 5 months ago)
- Simple, fast, parallel Hugging Face GGML model downloader written in Python (☆24, updated 2 years ago)
- A ggml (C++) re-implementation of tortoise-tts (☆188, updated 11 months ago)
- BUD-E (Buddy) is an open-source voice assistant framework that facilitates seamless interaction with AI models and APIs, enabling the cre… (☆36, updated last year)
- The heart of the Pulsar App: fast, secure, and shared inference with a modern UI (☆55, updated 8 months ago)
- instinct.cpp provides ready-to-use alternatives to the OpenAI Assistant API, plus built-in utilities for developing AI agent applications (RAG,… (☆53, updated last year)
- Phi4 Multimodal Instruct: OpenAI endpoint and Docker image for self-hosting (☆39, updated 5 months ago)
- Smart proxy for LLM APIs that enables model-specific parameter control, automatic mode switching (like Qwen3's /think and /no_think), and… (☆49, updated 2 months ago)
- Lightweight continuous batching with OpenAI compatibility, using Hugging Face Transformers, including T5 and Whisper (☆26, updated 4 months ago)
- General-purpose GPU compute framework built on Vulkan to support thousands of cross-vendor graphics cards (AMD, Qualcomm, NVIDIA & friends). … (☆51, updated 5 months ago)
- A fast RWKV tokenizer written in Rust (☆47, updated 3 weeks ago)
- Tcurtsni: Reverse Instruction Chat. Ever wonder what your LLM wants to ask you? (☆22, updated last year)
- Locally running LLM with internet access (☆96, updated last month)
- Controllable Language Model Interactions in TypeScript (☆9, updated last year)