vtuber-plan / olah
Self-hosted Hugging Face mirror service.
☆194 · Updated 2 months ago
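olah sits between a Hugging Face client and the upstream Hub, caching the files it proxies so later downloads are served locally. A minimal sketch of pointing `huggingface_hub` at a locally running mirror; the port is an assumption based on olah's documented default, so adjust it to your deployment:

```python
# Minimal sketch: route huggingface_hub downloads through a self-hosted
# olah mirror. The endpoint assumes olah's default local port (8090).
import os

# huggingface_hub reads HF_ENDPOINT when it is imported, so set it first.
os.environ["HF_ENDPOINT"] = "http://localhost:8090"

from huggingface_hub import snapshot_download

# The files are fetched through the mirror, which caches them for
# subsequent requests from any machine on the same network.
local_dir = snapshot_download(repo_id="bert-base-uncased")
print(local_dir)
```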
Alternatives and similar repositories for olah
Users interested in olah are comparing it to the libraries listed below.
- LM inference server implementation based on *.cpp. ☆273 · Updated last month
- Xet client tech, used in huggingface_hub. ☆219 · Updated this week
- Review/check GGUF files and estimate memory usage and maximum tokens per second. ☆205 · Updated last month
- A benchmarking tool for comparing DeepSeek model deployments across LLM API providers. ☆29 · Updated 5 months ago
- Autoscale LLM inference (vLLM, SGLang, LMDeploy) on Kubernetes and other platforms. ☆274 · Updated last year
- ☆249 · Updated this week
- A shim driver that lets in-container nvidia-smi show the correct process list without modifying anything. ☆93 · Updated 2 months ago
- GPUd automates monitoring, diagnostics, and issue identification for GPUs. ☆427 · Updated last week
- A simple service that integrates vLLM with Ray Serve for fast, scalable LLM serving. ☆72 · Updated last year
- ☆509 · Updated 5 months ago
- A text-to-speech and speech-to-text server compatible with the OpenAI API, supporting Whisper, FunASR, Bark, and CosyVoice backends. ☆161 · Updated 2 months ago
- OpenAI-compatible API for the TensorRT-LLM Triton backend. ☆214 · Updated last year
- Module, model, and tensor serialization/deserialization. ☆265 · Updated last month
- Self-host LLMs with vLLM and BentoML. ☆149 · Updated last week
- Open-source text embedding models with an OpenAI-compatible API. ☆160 · Updated last year
- Share your GPU power across your local network. ☆65 · Updated 4 months ago
- Comparison of language model inference engines. ☆229 · Updated 9 months ago
- vLLM Router. ☆43 · Updated last year
- ClearML Fractional GPU: run multiple containers on the same GPU with driver-level memory limits ✨ and compute time-slicing. ☆80 · Updated last year
- 🚢 Yet another operator for running large language models on Kubernetes with ease. Powered by Ollama! 🐫 ☆216 · Updated last week
- Sentence Transformers API: an OpenAI-compatible embedding API server. ☆67 · Updated last year
- The main repository for building Pascal-compatible versions of ML applications and libraries. ☆128 · Updated last month
- ☆64 · Updated 5 months ago
- Fused Qwen3 MoE layer for faster training, compatible with HF Transformers, LoRA, 4-bit quantization, and Unsloth. ☆176 · Updated this week
- The driver for LMCache core to run in vLLM. ☆50 · Updated 7 months ago
- Parallel fetch. ☆138 · Updated last week
- Benchmark suite for LLMs from Fireworks.ai. ☆83 · Updated 2 weeks ago
- Pretrain, fine-tune, and serve LLMs on Intel platforms with Ray. ☆132 · Updated 2 weeks ago
- OpenAI-compatible API for LLMs and embeddings (LLaMA, Vicuna, ChatGLM, and many others); see the client sketch after this list. ☆276 · Updated last year
- ⚡️ 80x faster fastText language detection out of the box | Split text by language. ☆238 · Updated this week
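Several of the servers above (the TensorRT-LLM Triton proxy, the embedding servers, and the LLaMA/Vicuna/ChatGLM API) expose the same OpenAI-compatible surface, so a single client works against any of them. A minimal sketch using the `openai` Python package; the base URL, port, API key, and model names below are placeholders, as each server documents its own values:

```python
# Minimal sketch of talking to any OpenAI-compatible server from the
# list above. Base URL, key, and model names are placeholders.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",  # placeholder: your server's address
    api_key="not-needed-locally",         # most self-hosted servers ignore this
)

# Chat completion: works against any server implementing /v1/chat/completions.
chat = client.chat.completions.create(
    model="your-deployed-model",          # placeholder model id
    messages=[{"role": "user", "content": "Say hello."}],
)
print(chat.choices[0].message.content)

# Embeddings: works against the embedding servers in the list.
emb = client.embeddings.create(model="your-embedding-model", input="hello")
print(len(emb.data[0].embedding))
```

Because the request shape is identical across these projects, swapping backends usually only means changing `base_url` and the model id.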