vtuber-plan / olah
Self-hosted huggingface mirror service.
☆211 Updated 5 months ago
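olah acts as a drop-in mirror for the Hugging Face Hub, so existing clients only need to be pointed at it. A minimal sketch, assuming an olah instance is already running locally; the host, port, and model name below are illustrative placeholders, not values taken from this listing:

```python
import os

# huggingface_hub reads HF_ENDPOINT at import time, so set it before importing
# to route hub traffic (model/dataset downloads) through the local mirror.
os.environ["HF_ENDPOINT"] = "http://localhost:8090"  # assumed olah address

from huggingface_hub import snapshot_download

# Files are fetched via the mirror and cached locally as usual.
path = snapshot_download(repo_id="bert-base-uncased")
print(path)
```

Alternatively, exporting `HF_ENDPOINT` in the shell before launching any Hugging Face tooling achieves the same redirection without code changes.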
Alternatives and similar repositories for olah
Users interested in olah are comparing it to the libraries listed below:
- Autoscale LLM (vLLM, SGLang, LMDeploy) inference on Kubernetes (and others) ☆279 Updated 2 years ago
- xet client tech, used in huggingface_hub ☆372 Updated 2 weeks ago
- LM inference server implementation based on *.cpp. ☆295 Updated last month
- ☆275 Updated this week
- OpenAI compatible API for TensorRT LLM triton backend ☆218 Updated last year
- A simple service that integrates vLLM with Ray Serve for fast and scalable LLM serving. ☆78 Updated last year
- The main repository for building Pascal-compatible versions of ML applications and libraries. ☆160 Updated 4 months ago
- A shim driver that allows in-docker nvidia-smi to show the correct process list without modifying anything ☆100 Updated 6 months ago
- Module, Model, and Tensor Serialization/Deserialization ☆283 Updated 4 months ago
- ☆533 Updated 3 months ago
- Inference server benchmarking tool ☆135 Updated 3 months ago
- GPUd automates monitoring, diagnostics, and issue identification for GPUs ☆468 Updated last week
- Review/Check GGUF files and estimate the memory usage and maximum tokens per second. ☆223 Updated this week
- A benchmarking tool for comparing different LLM API providers' DeepSeek model deployments. ☆30 Updated 9 months ago
- Sentence Transformers API: An OpenAI compatible embedding API server ☆70 Updated last year
- Open Source Text Embedding Models with OpenAI Compatible API ☆165 Updated last year
- ClearML Fractional GPU - Run multiple containers on the same GPU with driver-level memory limitation ✨ and compute time-slicing ☆88 Updated last month
- 🚢 Yet another operator for running large language models on Kubernetes with ease. Powered by Ollama! 🐫 ☆227 Updated last week
- NVIDIA vGPU Device Manager manages NVIDIA vGPU devices on top of Kubernetes ☆152 Updated this week
- Comparison of Language Model Inference Engines ☆238 Updated last year
- Unlock Unlimited Potential! Share Your GPU Power Across Your Local Network! ☆72 Updated 7 months ago
- Getting Started with the CoreWeave Kubernetes GPU Cloud ☆79 Updated 6 months ago
- Practical GPU Sharing Without Memory Size Constraints ☆296 Updated 9 months ago
- GPU environment and cluster management with LLM support ☆657 Updated last year
- ☸️ Easy, advanced inference platform for large language models on Kubernetes. 🌟 Star to support our work! ☆284 Updated 3 weeks ago
- Evaluate and Enhance Your LLM Deployments for Real-World Inference Needs ☆784 Updated this week
- A text-to-speech and speech-to-text server compatible with the OpenAI API, supporting Whisper, FunASR, Bark, and CosyVoice backends. ☆187 Updated 2 weeks ago
- Benchmark suite for LLMs from Fireworks.ai ☆84 Updated last month
- The driver for LMCache core to run in vLLM ☆59 Updated 11 months ago
- A diverse, simple, and secure all-in-one LLMOps platform ☆109 Updated last year