vtuber-plan / olah
Self-hosted huggingface mirror service.
☆151 · Updated last month
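olah acts as a caching mirror that sits between clients and huggingface.co, so the usual way to use it is to point the Hugging Face tooling at your own endpoint via `HF_ENDPOINT`. A minimal sketch, assuming a mirror already listening at `http://localhost:8090`; the address and the model ID below are illustrative, not taken from this listing:

```python
# Minimal sketch: routing huggingface_hub downloads through a self-hosted
# olah mirror. The endpoint URL and model ID are assumptions for illustration.
import os

# HF_ENDPOINT must be set before importing huggingface_hub, which reads it
# once at import time.
os.environ["HF_ENDPOINT"] = "http://localhost:8090"

from huggingface_hub import snapshot_download

# The download is served from the mirror's cache when possible and proxied
# to the upstream hub otherwise.
local_path = snapshot_download(repo_id="Qwen/Qwen2-0.5B-Instruct")
print(local_path)
```

Because `transformers` and `huggingface-cli` download through huggingface_hub as well, the same environment variable redirects their traffic too, with no further code changes.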
Alternatives and similar repositories for olah:
Users interested in olah are comparing it to the libraries listed below.
- Autoscale LLM (vLLM, SGLang, LMDeploy) inference on Kubernetes (and others) ☆263 · Updated last year
- LM inference server implementation based on *.cpp. ☆165 · Updated this week
- A simple service that integrates vLLM with Ray Serve for fast and scalable LLM serving. ☆65 · Updated last year
- A text-to-speech and speech-to-text server compatible with the OpenAI API, supporting Whisper, FunASR, Bark, and CosyVoice backends. ☆99 · Updated this week
- Review/Check GGUF files and estimate the memory usage and maximum tokens per second. ☆145 · Updated this week
- parallel fetch ☆123 · Updated this week
- ☆191 · Updated 2 weeks ago
- Self-host LLMs with vLLM and BentoML ☆102 · Updated this week
- Module, Model, and Tensor Serialization/Deserialization ☆223 · Updated last month
- 🚢 Yet another operator for running large language models on Kubernetes with ease. Powered by Ollama! 🐫 ☆184 · Updated this week
- A benchmarking tool for comparing different LLM API providers' DeepSeek model deployments. ☆29 · Updated 3 weeks ago
- Unlock Unlimited Potential! Share Your GPU Power Across Your Local Network! ☆52 · Updated 9 months ago
- xet client tech, used in huggingface_hub ☆80 · Updated this week
- OpenAI compatible API for TensorRT LLM triton backend (see the client sketch after this list) ☆205 · Updated 8 months ago
- ☆48 · Updated 2 weeks ago
- An OpenAI Completions API compatible server for NLP transformers models ☆65 · Updated last year
- Benchmark suite for LLMs from Fireworks.ai ☆70 · Updated 2 months ago
- Comparison of Language Model Inference Engines ☆212 · Updated 4 months ago
- Open Source Text Embedding Models with OpenAI Compatible API ☆151 · Updated 9 months ago
- ☆17 · Updated 2 years ago
- ☆49 · Updated 4 months ago
- ☆450 · Updated this week
- The main repository for building Pascal-compatible versions of ML applications and libraries. ☆68 · Updated last week
- An endpoint server for efficiently serving quantized open-source LLMs for code. ☆54 · Updated last year
- Getting Started with the CoreWeave Kubernetes GPU Cloud ☆70 · Updated last month
- MCP for Proxmox integration in Cline ☆77 · Updated last month
- vLLM adapter for a TGIS-compatible gRPC server. ☆26 · Updated this week
- vLLM Router ☆26 · Updated last year
- ☆30 · Updated this week
- ☆241 · Updated this week
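Several entries above advertise an OpenAI-compatible API (the TensorRT-LLM Triton frontend, the transformers Completions server, the embedding and speech servers). In practice that means a stock OpenAI client works against them once it is pointed at the server's base URL. A minimal sketch, where the URL, API key, and model name are placeholders for whatever the chosen server actually exposes:

```python
# Minimal sketch of calling an OpenAI-compatible inference server.
# base_url, api_key, and model are placeholders -- substitute the values
# your chosen server exposes; they are not taken from this listing.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",  # the server's OpenAI-compatible endpoint
    api_key="EMPTY",                      # many local servers ignore the key
)

response = client.chat.completions.create(
    model="served-model-name",
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
)
print(response.choices[0].message.content)
```

Switching between the servers in this list is then mostly a matter of changing `base_url` and the model name.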