RobinQu / instinct.cppLinks

instinct.cpp provides ready to use alternatives to OpenAI Assistant API and built-in utilities for developing AI Agent applications (RAG, Chatbot, Code interpreter) powered by language models. Call it langchain.cpp if you like.

☆57

Alternatives and similar repositories for instinct.cpp

Users that are interested in instinct.cpp are comparing it to the libraries listed below

Sorting:

iamlemec / bert.cpp
GGML implementation of BERT model with Python bindings and quantization.
☆58Updated last year
xyzhang626 / embeddings.cpp
ggml implementation of embedding models including SentenceTransformer and BGE
☆63Updated 2 years ago
nomic-ai / kompute
General purpose GPU compute framework built on Vulkan to support 1000s of cross vendor graphics cards (AMD, Qualcomm, NVIDIA & friends). …
☆51Updated 11 months ago
uukuguy / speechless
LLM based agents with proactive interactions, long-term memory, external tool integration, and local deployment capabilities.
☆107Updated 6 months ago
janhq / cortex.llamacpp
cortex.llamacpp is a high-efficiency C++ inference engine for edge computing. It is a dynamic library that can be loaded by any server a…
☆42Updated 7 months ago
kroggen / mamba.c
Inference of Mamba and Mamba2 models in pure C
☆196Updated 2 weeks ago
FFengIll / embedding.cpp
ggml implementation of BERT Embedding
☆26Updated 2 years ago
monatis / lmm.cpp
Inference of Large Multimodal Models in C/C++. LLaVA and others
☆48Updated 2 years ago
gigit0000 / qwen3.c
Lightweight C inference for Qwen3 GGUF. Multiturn prefix caching & batch processing.
☆21Updated 5 months ago
wangkuiyi / huggingface-tokenizer-in-cxx
☆70Updated 2 years ago
unslothai / llama.cpp
LLM inference in C/C++
☆104Updated last week
abetlen / ggml-python
Python bindings for ggml
☆147Updated last year
nyunAI / PruneGPT
☆51Updated last year
shivamsanju / ragswift
🚀 Scale your RAG pipeline using Ragswift: A scalable centralized embeddings management platform
☆38Updated 2 years ago
unit-mesh / edge-infer
EdgeInfer enables efficient edge intelligence by running small AI models, including embeddings and OnnxModels, on resource-constrained de…
☆50Updated last year
bentoml / BentoVLLM
Self-host LLMs with vLLM and BentoML
☆168Updated 2 weeks ago
Codys12 / airllm
AirLLM 70B inference with single 4GB GPU
☆17Updated 7 months ago
Cerebras / DocChat
GPT-4 Level Conversational QA Trained In a Few Hours
☆65Updated last year
ngxson / ggml-easy
Thin wrapper around GGML to make life easier
☆42Updated 3 months ago
conanhujinming / text_dedup
High-Performance Text Deduplication Toolkit
☆61Updated 5 months ago
google / minja
A minimalistic C++ Jinja templating engine for LLM chat templates
☆202Updated 4 months ago
leoguillaume / vllm-embedding
Deployment a light and full OpenAI API for production with vLLM to support /v1/embeddings with all embeddings models.
☆44Updated last year
rag-wtf / open-text-embeddings
Open Source Text Embedding Models with OpenAI Compatible API
☆167Updated last year
trzy / llava-cpp-server
LLaVA server (llama.cpp).
☆183Updated 2 years ago
MaggotHATE / Llama_chat
A chat UI for Llama.cpp
☆15Updated 2 months ago
chisasaw / redcache-ai
A memory framework for Large Language Models and Agents.
☆181Updated last year
mani-kantap / llm-inference-solutions
A collection of all available inference solutions for the LLMs
☆94Updated 11 months ago
janhq / cortex.tensorrt-llm
Cortex.Tensorrt-LLM is a C++ inference library that can be loaded by any server at runtime. It submodules NVIDIA’s TensorRT-LLM for GPU a…
☆42Updated last year
ptsochantaris / emeltal
Local ML voice chat using high-end models.
☆182Updated last month
PABannier / biogpt.cpp
Port of Microsoft's BioGPT in C/C++ using ggml
☆85Updated last year