janhq / cortex.llamacpp
cortex.llamacpp is a high-efficiency C++ inference engine for edge computing. It is a dynamic library that can be loaded by any server at runtime.
☆42 Updated 6 months ago
Alternatives and similar repositories for cortex.llamacpp
Users who are interested in cortex.llamacpp are comparing it to the libraries listed below.
- Cortex.Tensorrt-LLM is a C++ inference library that can be loaded by any server at runtime. It submodules NVIDIA’s TensorRT-LLM for GPU a… ☆42 Updated last year
- Lightweight C inference for Qwen3 GGUF. Multiturn prefix caching & batch processing. ☆21 Updated 5 months ago
- instinct.cpp provides ready to use alternatives to OpenAI Assistant API and built-in utilities for developing AI Agent applications (RAG,… ☆57 Updated last year
- ☆27 Updated last year
- AirLLM 70B inference with single 4GB GPU ☆17 Updated 7 months ago
- Port of Suno AI's Bark in C/C++ for fast inference ☆54 Updated last year
- Yet Another (LLM) Web UI, made with Gemini ☆12 Updated last year
- TTS support with GGML ☆218 Updated 3 months ago
- Running Microsoft's BitNet via Electron, React & Astro ☆51 Updated 4 months ago
- Thin wrapper around GGML to make life easier ☆42 Updated 2 months ago
- Tool to download models from Huggingface Hub and convert them to GGML/GGUF for llama.cpp ☆170 Updated 9 months ago
- Locally running LLM with internet access ☆97 Updated 7 months ago
- ☆109 Updated 5 months ago
- The heart of The Pulsar App, fast, secure and shared inference with modern UI ☆59 Updated last year
- llama.cpp fork used by GPT4All ☆55 Updated 11 months ago
- Experiments with BitNet inference on CPU ☆55 Updated last year
- SPLAA is an AI assistant framework that utilizes voice recognition, text-to-speech, and tool-calling capabilities to provide a conversati… ☆28 Updated 8 months ago
- Transplants vocabulary between language models, enabling the creation of draft models for speculative decoding WITHOUT retraining. ☆48 Updated 3 months ago
- A ggml (C++) re-implementation of tortoise-tts ☆193 Updated last year
- A chat UI for Llama.cpp ☆15 Updated last month
- On-device streaming text-to-speech engine powered by deep learning ☆127 Updated last week
- A set of tools to create synthetically-generated data from documents ☆39 Updated 5 months ago
- Use safetensors with ONNX 🤗 ☆83 Updated 2 weeks ago
- Something similar to Apple Intelligence? ☆59 Updated last year
- A memory framework for Large Language Models and Agents. ☆181 Updated last year
- Who needs o1 anyways. Add CoT to any OpenAI compatible endpoint. ☆44 Updated last year
- ☆29 Updated 9 months ago
- General purpose GPU compute framework built on Vulkan to support 1000s of cross vendor graphics cards (AMD, Qualcomm, NVIDIA & friends). … ☆51 Updated 11 months ago
- ggml implementation of embedding models including SentenceTransformer and BGE ☆63 Updated 2 years ago
- Course Project for COMP4471 on RWKV ☆17 Updated last year