nvwacloud / tensorlink
Unlock Unlimited Potential! Share Your GPU Power Across Your Local Network!
☆56 · Updated 10 months ago
Alternatives and similar repositories for tensorlink
Users interested in tensorlink are comparing it to the libraries listed below.
- LM inference server implementation based on *.cpp. ☆185 · Updated this week
- Self-hosted huggingface mirror service (a mirror-endpoint sketch appears after this list). ☆165 · Updated last week
- Review/check GGUF files and estimate memory usage and maximum tokens per second. ☆161 · Updated this week
- A text-to-speech and speech-to-text server compatible with the OpenAI API, supporting Whisper, FunASR, Bark, and CosyVoice backends. ☆110 · Updated this week
- A Kubernetes plugin that dynamically adds or removes GPU resources for a running Pod. ☆125 · Updated 3 years ago
- Open-source text embedding models with an OpenAI-compatible API (a minimal client sketch appears after this list). ☆153 · Updated 10 months ago
- Autoscale LLM inference (vLLM, SGLang, LMDeploy) on Kubernetes and other platforms. ☆265 · Updated last year
- ☆108 · Updated last year
- A simple, high-performance, scalable ML/DL model repository based on OCI Artifacts. ☆33 · Updated last year
- An integrated user interface for use with the HAI Platform. ☆49 · Updated 2 years ago
- Set up the environment for vLLM users. ☆16 · Updated last year
- Python actor framework for heterogeneous computing. ☆149 · Updated this week
- Using CRDs to manage GPU resources in Kubernetes. ☆199 · Updated 2 years ago
- Comparison of Language Model Inference Engines. ☆217 · Updated 4 months ago
- OpenAI-compatible API for the TensorRT-LLM Triton backend. ☆205 · Updated 9 months ago
- Prepare for DeepSeek R1 inference: benchmark CPU, DRAM, SSD, iGPU, GPU, ... with efficient code. ☆72 · Updated 3 months ago
- Device plugin for Volcano vGPU that supports hard resource isolation. ☆73 · Updated 2 weeks ago
- A diverse, simple, and secure all-in-one LLMOps platform. ☆103 · Updated 7 months ago
- GPUd automates monitoring, diagnostics, and issue identification for GPUs. ☆352 · Updated this week
- Easier than Kubernetes for scaling a Docker container's GPU count up or down and resizing volume capacity. ☆75 · Updated last year
- A simple service that integrates vLLM with Ray Serve for fast and scalable LLM serving. ☆65 · Updated last year
- Run chatglm3-6b on the BM1684X. ☆38 · Updated last year
- A small language model for Chinese-language scenarios: llama2.c-zh. ☆146 · Updated last year
- ☆49 · Updated 8 months ago
- ☆17 · Updated 2 years ago
- A high-throughput and memory-efficient inference and serving engine for LLMs (a minimal serving sketch appears after this list). ☆131 · Updated 10 months ago
- Xet client technology, used in huggingface_hub. ☆95 · Updated this week
- NVIDIA vGPU Device Manager manages NVIDIA vGPU devices on top of Kubernetes. ☆130 · Updated last week
- Simulates GPU information, making it easier to test scenarios where a GPU is not available. ☆44 · Updated 2 months ago
- Build NCCL-Tests and configure SSHD in a PyTorch container to help you test NCCL faster. ☆11 · Updated last year
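
Several of the servers above (the embedding service, the TensorRT-LLM Triton gateway, the TTS/STT server) advertise OpenAI-compatible endpoints, which means the stock `openai` Python client can talk to them. A minimal sketch follows; the base URL, API key, and model names are placeholders, not values any specific project guarantees.

```python
# Minimal sketch: querying an OpenAI-compatible self-hosted server.
# base_url, api_key, and model names are hypothetical placeholders;
# substitute whatever your endpoint actually exposes.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",  # hypothetical self-hosted endpoint
    api_key="not-needed-for-local",       # many local servers ignore the key
)

# Chat completion against a locally served model.
chat = client.chat.completions.create(
    model="my-local-model",  # placeholder model id
    messages=[{"role": "user", "content": "Hello!"}],
)
print(chat.choices[0].message.content)

# Embeddings, if the server implements /v1/embeddings.
emb = client.embeddings.create(model="my-embedding-model", input="Hello!")
print(len(emb.data[0].embedding))
```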
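For the self-hosted huggingface mirror service, `huggingface_hub` can be redirected to a mirror through its standard `HF_ENDPOINT` environment variable. A minimal sketch, assuming a hypothetical mirror URL:

```python
# Minimal sketch: pointing huggingface_hub at a self-hosted mirror.
# huggingface_hub reads HF_ENDPOINT at import time, so set it before
# the import. The mirror URL below is a placeholder.
import os

os.environ["HF_ENDPOINT"] = "http://my-hf-mirror.internal"  # hypothetical mirror

from huggingface_hub import snapshot_download

# Downloads the repo snapshot through the mirror instead of huggingface.co.
local_dir = snapshot_download(repo_id="gpt2")
print(local_dir)
```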
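Several entries build on vLLM (the Ray Serve integration, the Kubernetes autoscaler, the forked serving engine). For orientation, here is a minimal sketch of vLLM's offline batch-inference API; the model id is just a small placeholder.

```python
# Minimal sketch of vLLM offline batch inference.
from vllm import LLM, SamplingParams

llm = LLM(model="facebook/opt-125m")  # small placeholder model
params = SamplingParams(temperature=0.8, max_tokens=64)

# Generate completions for a batch of prompts.
outputs = llm.generate(["The key idea of GPU sharing is"], params)
for out in outputs:
    print(out.outputs[0].text)
```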