vtuber-plan / olah
Self-hosted Hugging Face mirror service.
☆173 · Updated last month
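For context on how a self-hosted mirror like this is consumed: huggingface_hub can be redirected away from huggingface.co through the HF_ENDPOINT environment variable. A minimal sketch, assuming the mirror is reachable at http://localhost:8090 (the host and port are assumptions about your deployment, not values taken from this listing):

```python
import os

# Point huggingface_hub at the local mirror instead of huggingface.co.
# HF_ENDPOINT must be set before huggingface_hub is imported, because the
# library reads it at import time. The URL/port here is an assumption;
# adjust it to wherever your mirror is actually serving.
os.environ["HF_ENDPOINT"] = "http://localhost:8090"

from huggingface_hub import snapshot_download

# Files are fetched through (and cached by) the mirror.
local_path = snapshot_download("bert-base-uncased")
print(local_path)
```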
Alternatives and similar repositories for olah
Users interested in olah are comparing it to the repositories listed below.
- ☆222 · Updated this week
- LM inference server implementation based on *.cpp. ☆226 · Updated this week
- A shim driver that lets in-Docker nvidia-smi show the correct process list without modifying anything. ☆87 · Updated 3 months ago
- OpenAI-compatible API for the TensorRT-LLM Triton backend (see the client sketch after this list). ☆209 · Updated 10 months ago
- ☆472 · Updated 2 months ago
- Run DeepSeek-R1 GGUFs on KTransformers. ☆236 · Updated 3 months ago
- Xet client tech, used in huggingface_hub. ☆118 · Updated last week
- Autoscale LLM inference (vLLM, SGLang, LMDeploy) on Kubernetes (and others). ☆269 · Updated last year
- A benchmarking tool for comparing different LLM API providers' DeepSeek model deployments. ☆28 · Updated 2 months ago
- ☆13 · Updated 3 months ago
- Review/check GGUF files and estimate memory usage and maximum tokens per second. ☆177 · Updated last week
- NVIDIA vGPU Device Manager manages NVIDIA vGPU devices on top of Kubernetes. ☆133 · Updated last week
- Module, Model, and Tensor Serialization/Deserialization. ☆241 · Updated 2 weeks ago
- Download models from the Ollama library, without Ollama. ☆86 · Updated 7 months ago
- A text-to-speech and speech-to-text server compatible with the OpenAI API, supporting Whisper, FunASR, Bark, and CosyVoice backends. ☆129 · Updated 2 weeks ago
- Benchmark suite for LLMs from Fireworks.ai. ☆76 · Updated 3 weeks ago
- Open-source text embedding models with an OpenAI-compatible API. ☆154 · Updated 11 months ago
- A high-throughput and memory-efficient inference and serving engine for LLMs. ☆131 · Updated last year
- ☆17 · Updated 2 years ago
- Comparison of Language Model Inference Engines. ☆217 · Updated 6 months ago
- Documentation repository for SGLang, auto-generated from https://github.com/sgl-project/sglang/tree/main/docs. ☆53 · Updated this week
- ☆268 · Updated 2 weeks ago
- Simulates GPU information, making it easier to test scenarios where a GPU is not available. ☆47 · Updated 3 months ago
- GPU plugin to the node feature discovery for Kubernetes. ☆300 · Updated last year
- ☆42 · Updated 2 months ago
- Self-host LLMs with vLLM and BentoML. ☆123 · Updated this week
- ☆50 · Updated last month
- ☆50 · Updated this week
- Pretrain, fine-tune, and serve LLMs on Intel platforms with Ray. ☆129 · Updated last month
- InferX is an Inference Function-as-a-Service platform. ☆111 · Updated last week
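Several of the servers above expose OpenAI-compatible endpoints, so they can all be driven with the stock openai client by overriding base_url. A minimal sketch; the endpoint URL, model name, and API key below are placeholders for illustration, not values from any repository in this list:

```python
from openai import OpenAI

# Any OpenAI-compatible server can be targeted by overriding base_url.
# The endpoint and model name below are placeholder assumptions.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

response = client.chat.completions.create(
    model="my-deployed-model",  # whatever model name the server registers
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)
```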