menloresearch / cortex.llamacpp
cortex.llamacpp is a high-efficiency C++ inference engine for edge computing. It is a dynamic library that can be loaded by any server at runtime.
☆41 Updated 3 months ago
Alternatives and similar repositories for cortex.llamacpp
Users interested in cortex.llamacpp are comparing it to the libraries listed below.
- Lightweight C inference for Qwen3 GGUF. Multiturn prefix caching & batch processing. ☆17 Updated 2 months ago
- Cortex.Tensorrt-LLM is a C++ inference library that can be loaded by any server at runtime. It submodules NVIDIA’s TensorRT-LLM for GPU a… ☆42 Updated last year
- ☆24 Updated 9 months ago
- Yet Another (LLM) Web UI, made with Gemini ☆12 Updated 10 months ago
- Running Microsoft's BitNet via Electron, React & Astro ☆45 Updated last month
- Thin wrapper around GGML to make life easier ☆40 Updated 4 months ago
- AirLLM 70B inference with single 4GB GPU ☆14 Updated 4 months ago
- Course Project for COMP4471 on RWKV ☆17 Updated last year
- Port of Suno AI's Bark in C/C++ for fast inference ☆52 Updated last year
- instinct.cpp provides ready to use alternatives to OpenAI Assistant API and built-in utilities for developing AI Agent applications (RAG,… ☆53 Updated last year
- TTS support with GGML ☆184 Updated 3 weeks ago
- Simple, Fast, Parallel Huggingface GGML model downloader written in python ☆24 Updated 2 years ago
- A chat UI for Llama.cpp ☆15 Updated last week
- SPLAA is an AI assistant framework that utilizes voice recognition, text-to-speech, and tool-calling capabilities to provide a conversati… ☆28 Updated 5 months ago
- Tcurtsni: Reverse Instruction Chat, ever wonder what your LLM wants to ask you? ☆23 Updated last year
- Spotlight-like client for Ollama on Windows. ☆28 Updated last year
- Service for testing out the new Qwen2.5 omni model ☆61 Updated 6 months ago
- Run multiple resource-heavy Large Models (LM) on the same machine with limited amount of VRAM/other resources by exposing them on differe… ☆81 Updated this week
- ☆104 Updated 2 months ago
- A live multiplayer trivia game where users can bid for the subject of the next question ☆28 Updated 6 months ago
- Experiments with BitNet inference on CPU ☆54 Updated last year
- Lightweight continuous batching OpenAI compatibility using HuggingFace Transformers include T5 and Whisper. ☆29 Updated 7 months ago
- The hearth of The Pulsar App, fast, secure and shared inference with modern UI ☆58 Updated 11 months ago
- A ggml (C++) re-implementation of tortoise-tts ☆190 Updated last year
- A random walk voice style cloning application for Kokoro text to speech ☆158 Updated 4 months ago
- LLM Ripper is a framework for component extraction (embeddings, attention heads, FFNs), activation capture, functional analysis, and adap… ☆45 Updated this week
- ☆23 Updated last year
- A Windows tool to query various LLM AIs. Supports branched conversations, history and summaries among others. ☆32 Updated last week
- Transplants vocabulary between language models, enabling the creation of draft models for speculative decoding WITHOUT retraining. ☆43 Updated this week
- Port of Facebook's LLaMA model in C/C++ ☆21 Updated last year