gpustack / llama-box
LM inference server implementation based on *.cpp.
☆165 · Updated this week
Alternatives and similar repositories for llama-box:
Users interested in llama-box are comparing it to the libraries listed below.
- A text-to-speech and speech-to-text server compatible with the OpenAI API, supporting Whisper, FunASR, Bark, and CosyVoice backends. ☆99 · Updated this week
- Review/check GGUF files and estimate memory usage and maximum tokens per second. ☆145 · Updated this week
- Run DeepSeek-R1 GGUFs on KTransformers. ☆219 · Updated last month
- ☆85 · Updated last month
- ☆141 · Updated 2 months ago
- Port of Facebook's LLaMA model in C/C++. ☆41 · Updated this week
- Uses the latest GraphRAG interface, with a local Ollama instance providing the LLM backend; supports installation via pip. ☆146 · Updated 6 months ago
- xllamacpp - a Python wrapper for llama.cpp. ☆32 · Updated this week
- gpt_server is an open-source framework for production-grade deployment of LLMs or embedding models. ☆165 · Updated this week
- Get up and running with Llama 3, Mistral, Gemma, and other large language models. ☆26 · Updated this week
- GLM series edge models. ☆134 · Updated last month
- Open-source text embedding models with an OpenAI-compatible API. ☆151 · Updated 9 months ago
- Automatically quantize GGUF models. ☆167 · Updated this week
- CPU inference for the DeepSeek family of large language models in pure C++. ☆288 · Updated this week
- DashInfer is a native LLM inference engine aiming to deliver industry-leading performance atop various hardware architectures, including … ☆244 · Updated last week
- Converts files into Markdown to help RAG pipelines or LLMs understand them; based on markitdown and MinerU, which provide high-quality PDF parsing. ☆82 · Updated 3 weeks ago
- A high-throughput and memory-efficient inference and serving engine for LLMs. ☆131 · Updated 9 months ago
- Pure C++ implementation of several models for real-time chatting on your computer (CPU & GPU). ☆570 · Updated this week
- prima.cpp: Speeding up 70B-scale LLM inference on low-resource everyday home clusters. ☆260 · Updated this week
- Research on accelerating production deployment of the GOT-OCR project, not limited to any language. ☆60 · Updated 5 months ago
- The main repository for building Pascal-compatible versions of ML applications and libraries. ☆71 · Updated last week
- ☆53 · Updated 4 months ago
- The Level-Navi Agent, a framework that requires no training and utilizes large language models for deep query understanding and precise s… ☆78 · Updated 3 months ago
- 📚 This is an adapted version of Jina AI's Reader for local deployment using Docker. Convert any URL to an LLM-friendly input with a simp… ☆185 · Updated 6 months ago
- llama.cpp fork with additional SOTA quants and improved performance. ☆276 · Updated this week
- OpenAI-compatible API for the TensorRT-LLM Triton backend. ☆205 · Updated 8 months ago
- Comparison of language model inference engines. ☆212 · Updated 4 months ago
- Unsloth fine-tuning notebooks for Google Colab, Kaggle, Hugging Face, and more. ☆123 · Updated last week
- Port of Facebook's LLaMA model in C/C++. ☆92 · Updated this week
- A pure Rust LLM inference engine (supporting LLM-based MLLMs such as Spark-TTS), powered by the Candle framework. ☆93 · Updated 3 weeks ago