vtuber-plan / olah
Self-hosted Hugging Face mirror service.
☆165 · Updated last week
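Once an olah instance is running, Hugging Face tooling can be pointed at it through the `HF_ENDPOINT` environment variable. A minimal sketch, assuming a local instance on olah's default port (8090) — adjust the host, port, and example model repo to your own deployment:

```shell
# Start the mirror server (installed e.g. via `pip install olah`);
# assumed to bind to localhost:8090 by default.
python -m olah.server

# Redirect Hugging Face clients (huggingface_hub, huggingface-cli, transformers)
# to the mirror instead of huggingface.co.
export HF_ENDPOINT=http://localhost:8090

# Subsequent downloads are proxied and cached by the mirror;
# the model repo here is only an illustrative example.
huggingface-cli download Qwen/Qwen2-0.5B
```

Because the mirror speaks the same HTTP API as the upstream hub, no client-side code changes are needed beyond the environment variable.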
Alternatives and similar repositories for olah
Users interested in olah are also comparing it to the repositories listed below.
- ☆207 · Updated last week
- LM inference server implementation based on *.cpp. ☆185 · Updated last week
- A simple service that integrates vLLM with Ray Serve for fast and scalable LLM serving. ☆65 · Updated last year
- A benchmarking tool for comparing different LLM API providers' DeepSeek model deployments. ☆29 · Updated last month
- A text-to-speech and speech-to-text server compatible with the OpenAI API, supporting Whisper, FunASR, Bark, and CosyVoice backends. ☆110 · Updated this week
- Self-host LLMs with vLLM and BentoML. ☆109 · Updated last week
- ClearML Fractional GPU - Run multiple containers on the same GPU with driver-level memory limitation ✨ and compute time-slicing. ☆78 · Updated 9 months ago
- Autoscale LLM (vLLM, SGLang, LMDeploy) inference on Kubernetes (and others). ☆265 · Updated last year
- GPUd automates monitoring, diagnostics, and issue identification for GPUs. ☆352 · Updated this week
- Review/check GGUF files and estimate their memory usage and maximum tokens per second. ☆161 · Updated this week
- OpenAI-compatible API for the TensorRT LLM Triton backend. ☆205 · Updated 9 months ago
- xet client tech, used in huggingface_hub. ☆95 · Updated this week
- The NVIDIA GPU driver container allows provisioning the NVIDIA driver via containers. ☆111 · Updated last week
- ☆17 · Updated 2 years ago
- Open-source text embedding models with an OpenAI-compatible API. ☆153 · Updated 10 months ago
- Evaluate and enhance your LLM deployments for real-world inference needs. ☆291 · Updated this week
- ☆50 · Updated 5 months ago
- Unlock unlimited potential! Share your GPU power across your local network! ☆56 · Updated 10 months ago
- Module, model, and tensor serialization/deserialization. ☆227 · Updated this week
- ☆59 · Updated last month
- NVIDIA vGPU Device Manager manages NVIDIA vGPU devices on top of Kubernetes. ☆130 · Updated last week
- This project simulates GPU information, making it easier to test scenarios where a GPU is not available. ☆44 · Updated 2 months ago
- ☆110 · Updated last week
- ☆31 · Updated this week
- 🚢 Yet another operator for running large language models on Kubernetes with ease. Powered by Ollama! 🐫 ☆187 · Updated this week
- A shim driver that lets in-Docker nvidia-smi show the correct process list without modifying anything. ☆84 · Updated 2 months ago
- Sentence Transformers API: an OpenAI-compatible embedding API server. ☆58 · Updated 8 months ago
- Comparison of language model inference engines. ☆217 · Updated 4 months ago
- ☆11 · Updated 2 months ago
- Benchmark suite for LLMs from Fireworks.ai. ☆72 · Updated this week