vtuber-plan / olah
Self-hosted huggingface mirror service.
☆195 · Updated 2 months ago
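As a minimal sketch of how a client can be pointed at a self-hosted mirror like this: huggingface_hub honors the HF_ENDPOINT environment variable, so downloads can be routed through the mirror instead of huggingface.co. The URL and port below are assumptions, not olah's documented defaults; substitute the address your own instance listens on.

```python
# Minimal sketch: route huggingface_hub traffic through a self-hosted mirror.
# The URL below is an assumption -- replace it with your mirror's actual address.
import os

os.environ["HF_ENDPOINT"] = "http://localhost:8090"  # assumed mirror endpoint

# Import after setting HF_ENDPOINT so the library picks up the custom endpoint.
from huggingface_hub import snapshot_download

# The mirror proxies (and can cache) the upstream repository contents.
local_dir = snapshot_download(repo_id="bert-base-uncased")
print(local_dir)
```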
Alternatives and similar repositories for olah
Users interested in olah are comparing it to the repositories listed below.
- LM inference server implementation based on *.cpp. ☆279 · Updated last month
- Autoscale LLM (vLLM, SGLang, LMDeploy) inference on Kubernetes (and others) ☆275 · Updated last year
- xet client tech, used in huggingface_hub ☆297 · Updated this week
- ☆511 · Updated this week
- ☆255 · Updated last week
- A simple service that integrates vLLM with Ray Serve for fast and scalable LLM serving. ☆72 · Updated last year
- A shim driver that allows in-docker nvidia-smi to show the correct process list without modifying anything ☆94 · Updated 3 months ago
- Module, Model, and Tensor Serialization/Deserialization ☆267 · Updated last month
- GPUd automates monitoring, diagnostics, and issue identification for GPUs ☆438 · Updated this week
- Review/Check GGUF files and estimate the memory usage and maximum tokens per second. ☆208 · Updated last month
- OpenAI compatible API for TensorRT LLM triton backend ☆215 · Updated last year
- ClearML Fractional GPU - Run multiple containers on the same GPU with driver level memory limitation ✨ and compute time-slicing ☆80 · Updated last year
- A benchmarking tool for comparing different LLM API providers' DeepSeek model deployments. ☆29 · Updated 6 months ago
- A text-to-speech and speech-to-text server compatible with the OpenAI API, supporting Whisper, FunASR, Bark, and CosyVoice backends. ☆165 · Updated 2 months ago
- Unlock Unlimited Potential! Share Your GPU Power Across Your Local Network! ☆64 · Updated 4 months ago
- This is the documentation repository for SGLang. It is auto-generated from https://github.com/sgl-project/sglang/tree/main/docs. ☆80 · Updated this week
- Open Source Text Embedding Models with OpenAI Compatible API ☆160 · Updated last year
- 🚢 Yet another operator for running large language models on Kubernetes with ease. Powered by Ollama! 🐫 ☆219 · Updated this week
- A huggingface mirror site. ☆305 · Updated last year
- Comparison of Language Model Inference Engines ☆230 · Updated 9 months ago
- NVIDIA vGPU Device Manager manages NVIDIA vGPU devices on top of Kubernetes ☆142 · Updated this week
- Getting Started with the CoreWeave Kubernetes GPU Cloud ☆75 · Updated 4 months ago
- ☆64 · Updated 6 months ago
- MaK(Mac+Kubernetes)llama - Running and orchestrating large language models (LLMs) on Kubernetes with macOS nodes. ☆42 · Updated last year
- An Envoy inspired, ultimate LLM-first gateway for LLM serving and downstream application developers and enterprises ☆24 · Updated 5 months ago
- Pretrain, finetune and serve LLMs on Intel platforms with Ray ☆132 · Updated 3 weeks ago
- Self-host LLMs with vLLM and BentoML ☆150 · Updated 2 weeks ago
- GPU environment and cluster management with LLM support ☆642 · Updated last year
- Practical GPU Sharing Without Memory Size Constraints ☆287 · Updated 6 months ago
- The LLM API Benchmark Tool is a flexible Go-based utility designed to measure and analyze the performance of OpenAI-compatible API endpoi… ☆47 · Updated this week