gpustack / llama-boxLinks

LM inference server implementation based on *.cpp.

☆248

Alternatives and similar repositories for llama-box

Users that are interested in llama-box are comparing it to the libraries listed below

Sorting:

gpustack / gguf-parser-go
Review/Check GGUF files and estimate the memory usage and maximum tokens per second.
☆189Updated 2 weeks ago
gpustack / vox-box
A text-to-speech and speech-to-text server compatible with the OpenAI API, supporting Whisper, FunASR, Bark, and CosyVoice backends.
☆146Updated 3 weeks ago
ubergarm / r1-ktransformers-guide
run DeepSeek-R1 GGUFs on KTransformers
☆246Updated 5 months ago
horus-ai-labs / DistillFlow
Library for model distillation
☆148Updated 5 months ago
foldl / chatllm.cpp
Pure C++ implementation of several models for real-time chatting on your computer (CPU & GPU)
☆668Updated this week
and270 / thinking_effort_processor
☆91Updated last month
xorbitsai / xllamacpp
xllamacpp - a Python wrapper of llama.cpp
☆48Updated last week
leafspark / AutoGGUF
automatically quant GGUF models
☆190Updated this week
thad0ctor / llama-server-launcher
☆104Updated this week
unslothai / llama.cpp
LLM inference in C/C++
☆98Updated last week
rag-wtf / open-text-embeddings
Open Source Text Embedding Models with OpenAI Compatible API
☆157Updated last year
KylinMountain / markify
Convert files into markdown to help RAG or LLM understand, based on markitdown and MinerU, which could provide high quality pdf parser.
☆121Updated 4 months ago
HimariO / llama.cpp.qwen2.5vl
Port of Facebook's LLaMA model in C/C++
☆54Updated 3 months ago
Yoosu-L / llmapibenchmark
The LLM API Benchmark Tool is a flexible Go-based utility designed to measure and analyze the performance of OpenAI-compatible API endpoi…
☆36Updated 5 months ago
ParisNeo / ollama_proxy_server
A proxy server for multiple ollama instances with Key security
☆470Updated last week
anastasiosyal / phi4-multimodal-instruct-server
Phi4 Multimodal Instruct - OpenAI endpoint and Docker Image for self-hosting
☆39Updated 5 months ago
wade1010 / graphrag-ui
The latest graphrag interface is used, using the local ollama to provide the LLM interface.Support for using the pip installation
☆153Updated 10 months ago
shell-nlp / gpt_server
gpt_server是一个用于生产级部署LLMs、Embedding、Reranker、ASR和TTS的开源框架。
☆204Updated 2 weeks ago
andrewkchan / deepseek.cpp
CPU inference for the DeepSeek family of large language models in C++
☆308Updated 2 months ago
zilliztech / mcp-server-milvus
Model Context Protocol Servers for Milvus
☆162Updated 2 months ago
vtuber-plan / olah
Self-hosted huggingface mirror service. 自建huggingface镜像服务。
☆184Updated 3 weeks ago
sasha0552 / pascal-pkgs-ci
The main repository for building Pascal-compatible versions of ML applications and libraries.
☆108Updated 2 months ago
matatonic / openedai-vision
An OpenAI API compatible API for chat with image input and questions about the images. aka Multimodal.
☆260Updated 5 months ago
asprenger / ray_vllm_inference
A simple service that integrates vLLM with Ray Serve for fast and scalable LLM serving.
☆69Updated last year
gpustack / gpustack-ui
☆43Updated last week
ModelCloud / GPTQModel
Production ready LLM model compression/quantization toolkit with hw accelerated inference support for both cpu/gpu via HF, vLLM, and SGLa…
☆713Updated last week
iohub / collama
VSCode AI coding assistant powered by self-hosted llama.cpp endpoint.
☆183Updated 6 months ago
neka-nat / mineru-api
MinerU API server
☆65Updated 7 months ago
modelscope / modelscope-studio
A third-party component library based on Gradio.
☆110Updated last week
chu-tianxiang / vllm-gptq
A high-throughput and memory-efficient inference and serving engine for LLMs
☆131Updated last year