vtuber-plan / olah
Self-hosted Hugging Face mirror service.
☆201 · Updated 3 months ago
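As a rough illustration of how such a mirror is typically consumed, the sketch below points the huggingface_hub client at a self-hosted endpoint via the HF_ENDPOINT environment variable; the localhost:8090 address and the example repo_id are placeholder assumptions, not values taken from this listing.

```python
# Minimal sketch: route huggingface_hub downloads through a self-hosted mirror.
# The mirror URL below (http://localhost:8090) is an assumed placeholder.
import os

# HF_ENDPOINT must be set before huggingface_hub is imported,
# because the client reads it once at import time.
os.environ["HF_ENDPOINT"] = "http://localhost:8090"

from huggingface_hub import snapshot_download

# All files for the repo are fetched via the mirror instead of huggingface.co.
local_path = snapshot_download(repo_id="bert-base-uncased")
print(local_path)
```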
Alternatives and similar repositories for olah
Users interested in olah are comparing it to the repositories listed below:
- Autoscale LLM (vLLM, SGLang, LMDeploy) inferences on Kubernetes (and others) ☆275 · Updated 2 years ago
- xet client tech, used in huggingface_hub ☆308 · Updated last week
- A shim driver that allows in-docker nvidia-smi to show the correct process list without modifying anything ☆96 · Updated 4 months ago
- LM inference server implementation based on *.cpp. ☆286 · Updated 2 months ago
- Review/check GGUF files and estimate the memory usage and maximum tokens per second. ☆212 · Updated 2 months ago
- ☆517 · Updated 3 weeks ago
- A benchmarking tool for comparing different LLM API providers' DeepSeek model deployments. ☆30 · Updated 7 months ago
- Unlock unlimited potential! Share your GPU power across your local network! ☆66 · Updated 5 months ago
- ☆262 · Updated 2 weeks ago
- Open-source text embedding models with an OpenAI-compatible API ☆160 · Updated last year
- A simple service that integrates vLLM with Ray Serve for fast and scalable LLM serving. ☆75 · Updated last year
- A huggingface mirror site. ☆310 · Updated last year
- GPUd automates monitoring, diagnostics, and issue identification for GPUs ☆441 · Updated last week
- OpenAI-compatible API for the TensorRT-LLM Triton backend ☆216 · Updated last year
- A text-to-speech and speech-to-text server compatible with the OpenAI API, supporting Whisper, FunASR, Bark, and CosyVoice backends. ☆169 · Updated 3 months ago
- NVIDIA vGPU Device Manager manages NVIDIA vGPU devices on top of Kubernetes ☆144 · Updated last week
- ⚡️ 80x faster fastText language detection out of the box | Split text by language ☆253 · Updated last month
- Inference server benchmarking tool ☆121 · Updated last month
- Module, model, and tensor serialization/deserialization ☆272 · Updated 2 months ago
- Documentation repository for SGLang, auto-generated from https://github.com/sgl-project/sglang/tree/main/docs. ☆86 · Updated this week
- Parallel fetch ☆138 · Updated 2 weeks ago
- 🚢 Yet another operator for running large language models on Kubernetes with ease. Powered by Ollama! 🐫 ☆220 · Updated this week
- OpenAI-compatible API for LLMs and embeddings (LLaMA, Vicuna, ChatGLM, and many others) ☆275 · Updated 2 years ago
- 🪶 Lightweight OpenAI drop-in replacement for Kubernetes ☆146 · Updated last year
- ☸️ Easy, advanced inference platform for large language models on Kubernetes. 🌟 Star to support our work! ☆263 · Updated this week
- The LLM API Benchmark Tool is a flexible Go-based utility designed to measure and analyze the performance of OpenAI-compatible API endpoi… ☆49 · Updated 3 weeks ago
- ☆64 · Updated 7 months ago
- Comparison of language model inference engines ☆233 · Updated 10 months ago
- FRP fork ☆175 · Updated 6 months ago
- ☆54 · Updated last week