vtuber-plan / olah
Self-hosted Hugging Face mirror service.
☆176 · Updated 2 months ago
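olah acts as a caching mirror of the Hugging Face Hub, so existing clients only need to be pointed at it. Below is a minimal sketch of using a running mirror from Python; the URL (http://localhost:8090) is an assumption, so substitute whatever host and port your instance actually listens on:

```python
# Minimal sketch: route huggingface_hub traffic through a self-hosted mirror.
# The URL below is an assumption; use your olah instance's actual host/port.
import os

# HF_ENDPOINT must be set before importing huggingface_hub,
# because the library reads it once at import time.
os.environ["HF_ENDPOINT"] = "http://localhost:8090"

from huggingface_hub import snapshot_download

# The mirror fetches from huggingface.co on the first request and
# can serve subsequent downloads from its local cache.
local_path = snapshot_download(repo_id="gpt2")
print(local_path)
```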
Alternatives and similar repositories for olah
Users interested in olah are comparing it to the repositories listed below.
- LM inference server implementation based on *.cpp. ☆236 · Updated this week
- Review/check GGUF files and estimate their memory usage and maximum tokens per second. ☆185 · Updated last week
- Autoscale LLM inference (vLLM, SGLang, LMDeploy) on Kubernetes and other platforms. ☆270 · Updated last year
- ☆228 · Updated last week
- A text-to-speech and speech-to-text server compatible with the OpenAI API, supporting Whisper, FunASR, Bark, and CosyVoice backends (see the client sketch after this list). ☆134 · Updated last week
- Unlock Unlimited Potential! Share Your GPU Power Across Your Local Network! ☆61 · Updated last month
- xet client tech, used in huggingface_hub. ☆127 · Updated this week
- ☆483 · Updated 3 months ago
- The main repository for building Pascal-compatible versions of ML applications and libraries. ☆100 · Updated last month
- NVIDIA vGPU Device Manager manages NVIDIA vGPU devices on top of Kubernetes. ☆136 · Updated last week
- A benchmarking tool for comparing different LLM API providers' DeepSeek model deployments. ☆28 · Updated 3 months ago
- GPUd automates monitoring, diagnostics, and issue identification for GPUs. ☆387 · Updated this week
- A simple service that integrates vLLM with Ray Serve for fast and scalable LLM serving. ☆68 · Updated last year
- OpenAI-compatible API for the TensorRT-LLM Triton backend. ☆209 · Updated 11 months ago
- A shim driver that lets in-Docker nvidia-smi show the correct process list without modifying anything. ☆88 · Updated 2 weeks ago
- Self-host LLMs with vLLM and BentoML. ☆134 · Updated 2 weeks ago
- Open Source Text Embedding Models with OpenAI Compatible API. ☆155 · Updated last year
- Comparison of Language Model Inference Engines. ☆219 · Updated 7 months ago
- 🚢 Yet another operator for running large language models on Kubernetes with ease. Powered by Ollama! 🐫 ☆196 · Updated this week
- Module, Model, and Tensor Serialization/Deserialization. ☆248 · Updated last week
- Pretrain, finetune and serve LLMs on Intel platforms with Ray. ☆129 · Updated last week
- Getting Started with the CoreWeave Kubernetes GPU Cloud. ☆73 · Updated last month
- FRP fork. ☆171 · Updated 3 months ago
- Parallel fetch. ☆134 · Updated last week
- Evaluate and enhance your LLM deployments for real-world inference needs. ☆405 · Updated this week
- Fused Qwen3 MoE layer for faster training, compatible with HF Transformers, LoRA, 4-bit quant, Unsloth. ☆122 · Updated this week
- Tool to download models from the Huggingface Hub and convert them to GGML/GGUF for llama.cpp. ☆154 · Updated 2 months ago
- Download models from the Ollama library, without Ollama. ☆89 · Updated 8 months ago
- ClearML Fractional GPU: run multiple containers on the same GPU with driver-level memory limitation ✨ and compute time-slicing. ☆78 · Updated 11 months ago
- Get up and running with Llama 3, Mistral, Gemma, and other large language models. ☆27 · Updated this week
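Several entries above expose OpenAI-compatible endpoints (the speech server, the TensorRT-LLM Triton shim, the embedding server), so a single client can talk to any of them. A minimal sketch using the openai Python client; the base URL, API key, and model name are placeholders, not values from any specific project:

```python
# Minimal sketch: one OpenAI-style client against any of the
# OpenAI-compatible servers listed above. Base URL, key, and
# model name are placeholders for your own deployment's values.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",  # your server's endpoint
    api_key="unused",                     # many local servers ignore the key
)

resp = client.chat.completions.create(
    model="my-deployed-model",  # hypothetical deployment name
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
)
print(resp.choices[0].message.content)
```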