vtuber-plan / olah
Self-hosted Hugging Face mirror service.
☆176 · Updated 2 months ago
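olah acts as a caching mirror of the Hugging Face Hub, so existing clients only need to be pointed at it. Below is a minimal sketch of using a running mirror from Python; the URL (http://localhost:8090) is an assumption, so substitute whatever host and port your instance actually listens on:

```python
# Minimal sketch: route huggingface_hub traffic through a self-hosted mirror.
# The URL below is an assumption; use your olah instance's actual host/port.
import os

# HF_ENDPOINT must be set before importing huggingface_hub,
# because the library reads it once at import time.
os.environ["HF_ENDPOINT"] = "http://localhost:8090"

from huggingface_hub import snapshot_download

# The mirror fetches from huggingface.co on the first request and
# can serve subsequent downloads from its local cache.
local_path = snapshot_download(repo_id="gpt2")
print(local_path)
```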
Alternatives and similar repositories for olah
Users interested in olah are comparing it to the repositories listed below.
- LM inference server implementation based on *.cpp. ☆236 · Updated this week
- Review/check GGUF files and estimate their memory usage and maximum tokens per second. ☆185 · Updated last week
- Autoscale LLM inference (vLLM, SGLang, LMDeploy) on Kubernetes and other platforms. ☆270 · Updated last year
- ☆228 · Updated last week
- A text-to-speech and speech-to-text server compatible with the OpenAI API, supporting Whisper, FunASR, Bark, and CosyVoice backends (see the client sketch after this list). ☆134 · Updated last week
- Unlock Unlimited Potential! Share Your GPU Power Across Your Local Network! ☆61 · Updated last month
- xet client tech, used in huggingface_hub. ☆127 · Updated this week
- ☆483 · Updated 3 months ago
- The main repository for building Pascal-compatible versions of ML applications and libraries. ☆100 · Updated last month
- NVIDIA vGPU Device Manager manages NVIDIA vGPU devices on top of Kubernetes. ☆136 · Updated last week
- A benchmarking tool for comparing different LLM API providers' DeepSeek model deployments. ☆28 · Updated 3 months ago
- GPUd automates monitoring, diagnostics, and issue identification for GPUs. ☆387 · Updated this week
- A simple service that integrates vLLM with Ray Serve for fast and scalable LLM serving. ☆68 · Updated last year
- OpenAI-compatible API for the TensorRT-LLM Triton backend. ☆209 · Updated 11 months ago
- A shim driver that lets in-Docker nvidia-smi show the correct process list without modifying anything. ☆88 · Updated 2 weeks ago
- Self-host LLMs with vLLM and BentoML. ☆134 · Updated 2 weeks ago
- Open Source Text Embedding Models with OpenAI Compatible API. ☆155 · Updated last year
- Comparison of Language Model Inference Engines. ☆219 · Updated 7 months ago
- 🚢 Yet another operator for running large language models on Kubernetes with ease. Powered by Ollama! 🐫 ☆196 · Updated this week
- Module, Model, and Tensor Serialization/Deserialization. ☆248 · Updated last week
- Pretrain, finetune and serve LLMs on Intel platforms with Ray. ☆129 · Updated last week
- Getting Started with the CoreWeave Kubernetes GPU Cloud. ☆73 · Updated last month
- FRP fork. ☆171 · Updated 3 months ago
- Parallel fetch. ☆134 · Updated last week
- Evaluate and enhance your LLM deployments for real-world inference needs. ☆405 · Updated this week
- Fused Qwen3 MoE layer for faster training, compatible with HF Transformers, LoRA, 4-bit quant, Unsloth. ☆122 · Updated this week
- Tool to download models from the Huggingface Hub and convert them to GGML/GGUF for llama.cpp. ☆154 · Updated 2 months ago
- Download models from the Ollama library, without Ollama. ☆89 · Updated 8 months ago
- ClearML Fractional GPU: run multiple containers on the same GPU with driver-level memory limitation ✨ and compute time-slicing. ☆78 · Updated 11 months ago
- Get up and running with Llama 3, Mistral, Gemma, and other large language models. ☆27 · Updated this week
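Several entries above expose OpenAI-compatible endpoints (the speech server, the TensorRT-LLM Triton shim, the embedding server), so a single client can talk to any of them. A minimal sketch using the openai Python client; the base URL, API key, and model name are placeholders, not values from any specific project:

```python
# Minimal sketch: one OpenAI-style client against any of the
# OpenAI-compatible servers listed above. Base URL, key, and
# model name are placeholders for your own deployment's values.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",  # your server's endpoint
    api_key="unused",                     # many local servers ignore the key
)

resp = client.chat.completions.create(
    model="my-deployed-model",  # hypothetical deployment name
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
)
print(resp.choices[0].message.content)
```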