menloresearch / cortex.llamacpp
cortex.llamacpp is a high-efficiency C++ inference engine for edge computing. It is a dynamic library that can be loaded by any server at runtime.
☆40 · Updated this week
Alternatives and similar repositories for cortex.llamacpp
Users interested in cortex.llamacpp are comparing it with the libraries listed below.
- Cortex.Tensorrt-LLM is a C++ inference library that can be loaded by any server at runtime. It submodules NVIDIA’s TensorRT-LLM for GPU a… ☆43 · Updated 7 months ago
- Yet Another (LLM) Web UI, made with Gemini ☆12 · Updated 4 months ago
- ☆31 · Updated last year
- Experiments with BitNet inference on CPU ☆54 · Updated last year
- The heart of The Pulsar App: fast, secure, shared inference with a modern UI ☆56 · Updated 5 months ago
- AirLLM 70B inference with single 4GB GPU ☆12 · Updated 9 months ago
- A Python library to orchestrate LLMs in a neural network-inspired structure ☆47 · Updated 7 months ago
- ☆19 · Updated 3 months ago
- TTS support with GGML ☆35 · Updated this week
- A chat UI for Llama.cpp ☆13 · Updated last week
- Something similar to Apple Intelligence? ☆60 · Updated 10 months ago
- Locally running LLM with internet access ☆94 · Updated last month
- Super simple Python connectors for llama.cpp, including vision models (Gemma 3, Qwen2-VL). Compile llama.cpp and run! ☆25 · Updated last week
- instinct.cpp provides ready-to-use alternatives to the OpenAI Assistant API and built-in utilities for developing AI agent applications (RAG,… ☆48 · Updated 10 months ago
- Course Project for COMP4471 on RWKV ☆17 · Updated last year
- llama.cpp fork used by GPT4All ☆55 · Updated 2 months ago
- Thin wrapper around GGML to make life easier ☆29 · Updated this week
- Run multiple resource-heavy Large Models (LM) on the same machine with limited amount of VRAM/other resources by exposing them on differe… ☆61 · Updated this week
- A sleek, customizable interface for managing LLMs with responsive design and easy agent personalization. ☆15 · Updated 8 months ago
- ☆66 · Updated 11 months ago
- LLM inference in C/C++ ☆77 · Updated last week
- Running Microsoft's BitNet via Electron, React & Astro ☆38 · Updated 3 weeks ago
- ☆24 · Updated 3 months ago
- Serving LLMs in the HF-Transformers format via a PyFlask API ☆71 · Updated 8 months ago
- LLM inference in C/C++ ☆21 · Updated last month
- ☆14 · Updated 8 months ago
- Local LLM inference & management server with built-in OpenAI API ☆31 · Updated last year
- An API for VoiceCraft. ☆25 · Updated 10 months ago
- fast state-of-the-art speech models and a runtime that runs anywhere 💥 ☆55 · Updated 3 months ago
- ☆19 · Updated last month