janhq / cortex.llamacpp

cortex.llamacpp is a high-efficiency C++ inference engine for edge computing. It is a dynamic library that can be loaded by any server at runtime.

☆36

Alternatives and similar repositories for cortex.llamacpp:

Users that are interested in cortex.llamacpp are comparing it to the libraries listed below

FishiaT / yawullm
Yet Another (LLM) Web UI, made with Gemini
☆11Updated 2 months ago
catid / bitnet_cpu
Experiments with BitNet inference on CPU
☆53Updated 11 months ago
MaggotHATE / Llama_chat
A chat UI for Llama.cpp
☆12Updated 3 weeks ago
Maximilian-Winter / llama_cpp_function_calling
☆31Updated last year
iamlemec / bert.cpp
GGML implementation of BERT model with Python bindings and quantization.
☆54Updated last year
janhq / cortex.tensorrt-llm
Cortex.Tensorrt-LLM is a C++ inference library that can be loaded by any server at runtime. It submodules NVIDIA’s TensorRT-LLM for GPU a…
☆42Updated 5 months ago
ggerganov / bark.cpp
Port of Suno AI's Bark in C/C++ for fast inference
☆53Updated 10 months ago
Codys12 / airllm
AirLLM 70B inference with single 4GB GPU
☆12Updated 7 months ago
mounta11n / plusplus-camall
After my server ui improvements were successfully merged, consider this repo a playground for experimenting, tinkering and hacking around…
☆56Updated 6 months ago
mmwillet / TTS.cpp
TTS support with GGML
☆19Updated 2 weeks ago
pranavjad / tinyllama-bitnet
Train your own small bitnet model
☆65Updated 4 months ago
astramind-ai / Pulsar
The hearth of The Pulsar App, fast, secure and shared inference with modern UI
☆56Updated 3 months ago
ahoylabs / gguf.js
A Javascript library (with Typescript types) to parse metadata of GGML based GGUF files.
☆46Updated 7 months ago
LAION-AI / Desktop_BUD-E
BUD-E (Buddy) is an open-source voice assistant framework that facilitates seamless interaction with AI models and APIs, enabling the cre…
☆31Updated 7 months ago
google / minja
A minimalistic C++ Jinja templating engine for LLM chat templates
☆123Updated this week
nomic-ai / llama.cpp
llama.cpp fork used by GPT4All
☆52Updated 2 weeks ago
monk1337 / auto-ollama
run ollama & gguf easily with a single command
☆49Updated 9 months ago
MichaelMcCulloch / WikiDex
☆30Updated 9 months ago
CharlesMod / quantizeHFmodel
Accepts a Hugging Face model URL, automatically downloads and quantizes it using Bits and Bytes.
☆38Updated 11 months ago
fairydreaming / farel-bench
Testing LLM reasoning abilities with family relationship quizzes.
☆60Updated last month
FFengIll / embedding.cpp
ggml implementation of BERT Embedding
☆25Updated last year
lukasVierling / FaceRWKV
Course Project for COMP4471 on RWKV
☆17Updated last year
brittlewis12 / autogguf
Easily convert HuggingFace models to GGUF-format for llama.cpp
☆21Updated 7 months ago
silphendio / sliced_llama
Simple LLM inference server
☆20Updated 8 months ago
perk11 / large-model-proxy
Run multiple resource-heavy Large Models (LM) on the same machine with limited amount of VRAM/other resources by exposing them on differe…
☆53Updated 2 weeks ago
EdwardDali / EntropixLab
entropix style sampling + GUI
☆25Updated 4 months ago
mateogon / pdf-narrator
Convert your PDFs into audiobooks effortlessly. Features intelligent text extraction, customizable text-to-speech settings, and efficient…
☆49Updated this week