nvwacloud / tensorlink
Unlock Unlimited Potential! Share Your GPU Power Across Your Local Network!
☆72 · Updated 6 months ago
Alternatives and similar repositories for tensorlink
Users interested in tensorlink are comparing it to the libraries listed below.
- Self-hosted Hugging Face mirror service. ☆208 · Updated 5 months ago
- LM inference server implementation based on *.cpp. ☆294 · Updated 3 weeks ago
- Implementation of a remote CUDA/OpenCL protocol. ☆38 · Updated 6 months ago
- Review/check GGUF files and estimate their memory usage and maximum tokens per second. ☆219 · Updated 4 months ago
- Autoscale LLM inference (vLLM, SGLang, LMDeploy, and others) on Kubernetes. ☆278 · Updated 2 years ago
- Open-source text embedding models with an OpenAI-compatible API. ☆164 · Updated last year
- A text-to-speech and speech-to-text server compatible with the OpenAI API, supporting Whisper, FunASR, Bark, and CosyVoice backends. ☆180 · Updated 2 weeks ago
- A high-throughput and memory-efficient inference and serving engine for LLMs. ☆132 · Updated last year
- llama2.c-zh: a small language model supporting Chinese-language scenarios. ☆150 · Updated last year
- Download models from the Ollama library, without Ollama. ☆115 · Updated last year
- Comparison of language model inference engines. ☆237 · Updated last year
- NVIDIA vGPU Device Manager manages NVIDIA vGPU devices on top of Kubernetes. ☆152 · Updated last week
- xllamacpp: a Python wrapper of llama.cpp. ☆66 · Updated this week
- OpenAI-compatible API for LLMs and embeddings (LLaMA, Vicuna, ChatGLM, and many others). ☆275 · Updated 2 years ago
- ☆17 · Updated 2 years ago
- OpenAI-compatible API for the TensorRT-LLM Triton backend. ☆218 · Updated last year
- LLM inference benchmark. ☆431 · Updated last year
- OpenAIOS vGPU device plugin for Kubernetes, originated from the OpenAIOS project to virtualize GPU device memory in order to allow app… ☆583 · Updated last year
- DashInfer is a native LLM inference engine aiming to deliver industry-leading performance atop various hardware architectures, including … ☆270 · Updated 4 months ago
- ☆113 · Updated last year
- A Hugging Face mirror site. ☆320 · Updated last year
- Run DeepSeek-R1 GGUFs on KTransformers. ☆258 · Updated 9 months ago
- C++ implementation of Qwen-LM. ☆611 · Updated last year
- A simple service that integrates vLLM with Ray Serve for fast and scalable LLM serving. ☆78 · Updated last year
- Practical GPU sharing without memory-size constraints. ☆296 · Updated 8 months ago
- GPUd automates monitoring, diagnostics, and issue identification for GPUs. ☆464 · Updated this week
- Using CRDs to manage GPU resources in Kubernetes. ☆209 · Updated 3 years ago
- Go bindings for the NVIDIA Management Library (NVML). ☆415 · Updated 3 weeks ago
- Inferflow is an efficient and highly configurable inference engine for large language models (LLMs). ☆250 · Updated last year
- Efficient AI inference & serving. ☆478 · Updated last year
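Several of the servers listed above advertise an OpenAI-compatible API, which means any of them can be queried with the same client code. A minimal sketch of building such a request, using only the Python standard library; the base URL and model name are placeholders for whatever deployment you run:

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000/v1"  # hypothetical local deployment
MODEL = "my-local-model"               # placeholder model name

def build_chat_request(prompt: str) -> urllib.request.Request:
    """Build a POST request to /v1/chat/completions in the OpenAI wire format."""
    payload = {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.7,
    }
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# Sending the request (requires one of the servers above running locally):
# with urllib.request.urlopen(build_chat_request("Hello")) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```

Because the wire format is shared, switching between these backends is usually just a matter of changing `BASE_URL` and `MODEL`.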