janhq / cortex.llamacpp
cortex.llamacpp is a high-efficiency C++ inference engine for edge computing. It is a dynamic library that can be loaded by any server at runtime.
☆23 · Updated this week
Related projects
Alternatives and complementary repositories for cortex.llamacpp
- Minimalist Stable Diffusion desktop application with a single executable, written in Go (no Python). ☆18 · Updated last month
- In-browser LLM website generator. ☆29 · Updated this week
- Port of Suno AI's Bark in C/C++ for fast inference. ☆54 · Updated 7 months ago
- Cortex.Tensorrt-LLM is a C++ inference library that can be loaded by any server at runtime. It submodules NVIDIA’s TensorRT-LLM for GPU a… ☆40 · Updated last month
- Course project for COMP4471 on RWKV. ☆16 · Updated 9 months ago
- llama.cpp fork with additional SOTA quants and improved performance. ☆93 · Updated this week
- BUD-E (Buddy) is an open-source voice assistant framework that facilitates seamless interaction with AI models and APIs, enabling the cre… ☆28 · Updated 4 months ago
- Gradio client in Rust. ☆23 · Updated last month
- After my server UI improvements were successfully merged, consider this repo a playground for experimenting, tinkering and hacking around… ☆56 · Updated 3 months ago
- A simple Gradio UI to run Qwen2-VL 72B AWQ in a venv, with both image and video inference working. ☆21 · Updated last month
- A ggml (C++) re-implementation of tortoise-tts. ☆159 · Updated 3 months ago
- Experiments with BitNet inference on CPU. ☆50 · Updated 7 months ago
- Video and code lecture on building nanoGPT from scratch. ☆64 · Updated 5 months ago
- LLM inference in C/C++. ☆11 · Updated 3 months ago
- Tcurtsni: Reverse Instruction Chat, ever wonder what your LLM wants to ask you? ☆21 · Updated 4 months ago
- Accepts a Hugging Face model URL, then automatically downloads and quantizes the model using bitsandbytes. ☆38 · Updated 8 months ago
- HTTP proxy for on-demand model loading with llama.cpp (or other OpenAI-compatible backends). ☆41 · Updated this week
- GGML implementation of the BERT model with Python bindings and quantization. ☆51 · Updated 9 months ago
- Train your own small BitNet model. ☆56 · Updated last month
- All the world is a play, we are but actors in it. ☆47 · Updated 4 months ago
- GRDN.AI app for garden optimization. ☆69 · Updated 9 months ago
- Large Model Proxy is designed to make it easy to run multiple resource-heavy Large Models (LM) on the same machine with a limited amount of… ☆47 · Updated last month
- A Python package for serving LLMs on OpenAI-compatible API endpoints with prompt caching, using MLX. ☆55 · Updated last week
- Implements harmful/harmless refusal removal using pure HF Transformers. ☆25 · Updated 5 months ago
- A Windows tool to query various LLM AIs. Supports branched conversations, history and summaries, among other features. ☆28 · Updated 2 weeks ago
- Automatically quantize GGUF models. ☆140 · Updated this week
- ☆18 · Updated 3 weeks ago
- A collection of notebooks for the Hugging Face blog series (https://huggingface.co/blog). ☆43 · Updated 3 months ago
- ☆33 · Updated last week
- Gradio-based tool to run open-source LLM models directly from Hugging Face. ☆87 · Updated 4 months ago