janhq / cortex.llamacpp
cortex.llamacpp is a high-efficiency C++ inference engine for edge computing. It is a dynamic library that can be loaded by any server at runtime.
☆41 · Updated 4 months ago
Alternatives and similar repositories for cortex.llamacpp
Users interested in cortex.llamacpp are comparing it to the libraries listed below.
- ☆24 · Updated 9 months ago
- TTS support with GGML ☆188 · Updated last month
- instinct.cpp provides ready-to-use alternatives to the OpenAI Assistant API and built-in utilities for developing AI Agent applications (RAG,… ☆53 · Updated last year
- Lightweight C inference for Qwen3 GGUF. Multi-turn prefix caching & batch processing. ☆17 · Updated 2 months ago
- Cortex.Tensorrt-LLM is a C++ inference library that can be loaded by any server at runtime. It submodules NVIDIA’s TensorRT-LLM for GPU a… ☆42 · Updated last year
- Yet Another (LLM) Web UI, made with Gemini ☆12 · Updated 10 months ago
- Experiments with BitNet inference on CPU ☆54 · Updated last year
- Running Microsoft's BitNet via Electron, React & Astro ☆46 · Updated last month
- A ggml (C++) re-implementation of tortoise-tts ☆191 · Updated last year
- On-device streaming text-to-speech engine powered by deep learning ☆122 · Updated 2 months ago
- Course Project for COMP4471 on RWKV ☆17 · Updated last year
- Simple, fast, parallel Hugging Face GGML model downloader written in Python ☆24 · Updated 2 years ago
- The heart of The Pulsar App: fast, secure, shared inference with a modern UI ☆58 · Updated 11 months ago
- A chat UI for Llama.cpp ☆15 · Updated 2 weeks ago
- Locally running LLM with internet access ☆97 · Updated 4 months ago
- GGML implementation of the BERT model with Python bindings and quantization ☆56 · Updated last year
- llama.cpp fork used by GPT4All ☆57 · Updated 8 months ago
- Thin wrapper around GGML to make life easier ☆40 · Updated 4 months ago
- ☆105 · Updated 2 months ago
- AirLLM 70B inference with a single 4GB GPU ☆14 · Updated 4 months ago
- A sleek, customizable interface for managing LLMs with responsive design and easy agent personalization ☆17 · Updated last year
- Train your own small BitNet model ☆75 · Updated last year
- Inference of Large Multimodal Models in C/C++. LLaVA and others ☆48 · Updated 2 years ago
- Spotlight-like client for Ollama on Windows ☆28 · Updated last year
- ☆20 · Updated last year
- Port of Suno AI's Bark in C/C++ for fast inference ☆52 · Updated last year
- Light WebUI for lm.rs ☆24 · Updated last year
- General-purpose GPU compute framework built on Vulkan to support 1000s of cross-vendor graphics cards (AMD, Qualcomm, NVIDIA & friends). … ☆52 · Updated 8 months ago
- ggml implementation of embedding models including SentenceTransformer and BGE ☆62 · Updated last year
- LLM inference in C/C++ ☆103 · Updated last week