gpustack / llama-box
LM inference server implementation based on *.cpp.
☆97 · Updated this week
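A minimal sketch of how an inference server like this is typically queried, assuming llama-box exposes an OpenAI-compatible `/v1/chat/completions` endpoint (as the llama.cpp server it builds on does); the host, port, and model name below are placeholders, not values documented by llama-box.

```python
# Query a locally running OpenAI-compatible inference server (e.g. llama-box).
# BASE_URL and the model name are assumptions; adjust them to your deployment.
import json
import urllib.request

BASE_URL = "http://localhost:8080"  # assumed address of the running server

payload = {
    "model": "local-model",  # placeholder model identifier
    "messages": [{"role": "user", "content": "Say hello in one sentence."}],
    "temperature": 0.7,
}

req = urllib.request.Request(
    f"{BASE_URL}/v1/chat/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)

with urllib.request.urlopen(req) as resp:
    body = json.load(resp)
    print(body["choices"][0]["message"]["content"])
```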
Alternatives and similar repositories for llama-box:
Users interested in llama-box are comparing it to the libraries listed below.
- A text-to-speech and speech-to-text server compatible with the OpenAI API, supporting Whisper, FunASR, Bark, and CosyVoice backends.☆42 · Updated last month
- Review/Check GGUF files and estimate the memory usage and maximum tokens per second.☆102 · Updated this week
- Automatically quantize GGUF models☆154 · Updated this week
- Open Source Text Embedding Models with OpenAI Compatible API☆145 · Updated 7 months ago
- Self-hosted huggingface mirror service.☆121 · Updated last week
- ☆132 · Updated this week
- An OpenAI-API-compatible API for chat with image input and questions about the images (aka multimodal).☆225 · Updated 2 months ago
- A high-throughput and memory-efficient inference and serving engine for LLMs☆130 · Updated 7 months ago
- A third-party component library based on Gradio.☆81 · Updated this week
- A simple service that integrates vLLM with Ray Serve for fast and scalable LLM serving.☆62 · Updated 10 months ago
- Delta-CoMe achieves near-lossless 1-bit compression; accepted at NeurIPS 2024.☆53 · Updated 3 months ago
- Uses the latest GraphRAG interface, with a local Ollama instance providing the LLM backend. Supports installation via pip.☆139 · Updated 4 months ago
- gpt_server is an open-source framework for production-grade deployment of LLMs or embedding models.☆152 · Updated last week
- ☆60 · Updated 10 months ago
- Mixture-of-Experts (MoE) Language Model☆184 · Updated 5 months ago
- GPU Power and Performance Manager☆55 · Updated 4 months ago
- This is the Mixture-of-Agents (MoA) concept, adapted from the original work by TogetherAI. My version is tailored for local model usage a…☆110 · Updated 7 months ago
- ☆192 · Updated 3 weeks ago
- ☆53 · Updated 8 months ago
- A pipeline-parallel training script for LLMs.☆124 · Updated 3 weeks ago
- Run multiple resource-heavy Large Models (LM) on the same machine with limited amount of VRAM/other resources by exposing them on differe…☆53 · Updated this week
- bisheng-unstructured library☆41 · Updated 3 months ago
- LLM-based agents with proactive interactions, long-term memory, external tool integration, and local deployment capabilities.☆97 · Updated this week
- ☆53 · Updated 2 months ago
- ☆24 · Updated 3 weeks ago
- llm-inference is a platform for publishing and managing llm inference, providing a wide range of out-of-the-box features for model deploy…☆79 · Updated 9 months ago
- The heart of The Pulsar App: fast, secure, and shared inference with a modern UI☆55 · Updated 2 months ago
- Clone of https://r.jina.ai which is deployable locally☆38 · Updated 5 months ago