vtuber-plan / olahLinks
Self-hosted huggingface mirror service.
☆169 · Updated 3 weeks ago
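Because olah mirrors the Hugging Face Hub, clients usually need no olah-specific configuration: the huggingface_hub library honors the `HF_ENDPOINT` environment variable and builds all download URLs against it. A minimal sketch of that redirection (the `localhost:8090` address is an assumption for illustration, not olah's documented default):

```python
import os

# Point Hugging Face clients at a local mirror. The address below is a
# hypothetical example; use wherever your olah instance actually listens.
os.environ["HF_ENDPOINT"] = "http://localhost:8090"

# Libraries such as huggingface_hub read HF_ENDPOINT and construct
# download URLs of the form {endpoint}/{repo_id}/resolve/{revision}/{file}.
endpoint = os.environ["HF_ENDPOINT"]
url = f"{endpoint}/gpt2/resolve/main/config.json"
print(url)  # http://localhost:8090/gpt2/resolve/main/config.json
```

Setting the variable before the first huggingface_hub import matters, since the library reads it once at import time.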
Alternatives and similar repositories for olah
Users interested in olah are comparing it to the repositories listed below.
- Review/check GGUF files and estimate memory usage and maximum tokens per second. ☆173 · Updated this week
- A text-to-speech and speech-to-text server compatible with the OpenAI API, supporting Whisper, FunASR, Bark, and CosyVoice backends. ☆118 · Updated this week
- LM inference server implementation based on *.cpp. ☆203 · Updated last week
- This project is designed to simulate GPU information, making it easier to test scenarios where a GPU is not available. ☆45 · Updated 3 months ago
- A shim driver that allows in-Docker nvidia-smi to show the correct process list without modifying anything. ☆86 · Updated 2 months ago
- ☆215 · Updated this week
- xet client tech, used in huggingface_hub. ☆111 · Updated this week
- Run Slurm on Kubernetes. A Slinky project. ☆108 · Updated last week
- OpenAI-compatible API for the TensorRT-LLM Triton backend. ☆208 · Updated 10 months ago
- NVIDIA vGPU Device Manager manages NVIDIA vGPU devices on top of Kubernetes. ☆131 · Updated this week
- GPUd automates monitoring, diagnostics, and issue identification for GPUs. ☆367 · Updated this week
- ☆468 · Updated last month
- Autoscale LLM (vLLM, SGLang, LMDeploy) inference on Kubernetes (and other platforms). ☆267 · Updated last year
- ☆17 · Updated 2 years ago
- Pretrain, finetune, and serve LLMs on Intel platforms with Ray. ☆127 · Updated last month
- A benchmarking tool for comparing different LLM API providers' DeepSeek model deployments. ☆29 · Updated 2 months ago
- Module, Model, and Tensor Serialization/Deserialization. ☆234 · Updated last week
- 🚢 Yet another operator for running large language models on Kubernetes with ease. Powered by Ollama! 🐫 ☆192 · Updated this week
- Self-host LLMs with vLLM and BentoML. ☆114 · Updated last week
- A simple service that integrates vLLM with Ray Serve for fast and scalable LLM serving. ☆67 · Updated last year
- Get up and running with Llama 3, Mistral, Gemma, and other large language models. ☆26 · Updated 3 weeks ago
- ☆89 · Updated 2 months ago
- Transformer GPU VRAM estimator. ☆64 · Updated last year
- MaK (Mac + Kubernetes) llama: running and orchestrating large language models (LLMs) on Kubernetes with macOS nodes. ☆40 · Updated last year
- Helm charts for the KubeRay project. ☆43 · Updated 2 months ago
- FRP fork. ☆166 · Updated last month
- Getting started with the CoreWeave Kubernetes GPU Cloud. ☆71 · Updated 2 months ago
- Open-source text embedding models with an OpenAI-compatible API. ☆153 · Updated 10 months ago
- Benchmark suite for LLMs from Fireworks.ai. ☆75 · Updated 3 weeks ago
- OpenAI-compatible API for LLMs and embeddings (LLaMA, Vicuna, ChatGLM, and many others). ☆275 · Updated last year