allegroai / clearml-fractional-gpu
ClearML Fractional GPU - Run multiple containers on the same GPU with driver-level memory limitation ✨ and compute time-slicing
☆62 · Updated 3 months ago
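A minimal sketch of how the advertised memory cap could be verified from inside a fractional-GPU container, assuming PyTorch with CUDA support is installed in the container image (this snippet is illustrative and not taken from this repository):

```python
# Minimal sketch (not from this repository): check that a driver-level
# memory cap is visible from inside a fractional-GPU container.
# Assumes PyTorch with CUDA support is installed in the container image.
import torch

assert torch.cuda.is_available(), "No CUDA device visible in this container"

free_bytes, total_bytes = torch.cuda.mem_get_info(0)  # (free, total) in bytes
gib = 1024 ** 3
print(f"GPU 0: {free_bytes / gib:.2f} GiB free of {total_bytes / gib:.2f} GiB total")
# In a fractional-GPU container, `total` should reflect the container's
# memory slice rather than the full physical card.
```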
Related projects
Alternatives and complementary repositories for clearml-fractional-gpu
- A top-like tool for monitoring GPUs in a cluster ☆80 · Updated 8 months ago
- ClearML - Model-Serving Orchestration and Repository Solution ☆138 · Updated 2 months ago
- GPU environment and cluster management with LLM support ☆491 · Updated 5 months ago
- Module, Model, and Tensor Serialization/Deserialization ☆187 · Updated 3 weeks ago
- Triton Model Navigator is an inference toolkit designed for optimizing and deploying Deep Learning models with a focus on NVIDIA GPUs. ☆183 · Updated 2 months ago
- IBM development fork of https://github.com/huggingface/text-generation-inference ☆57 · Updated last month
- Machine Learning Serving focused on GenAI with simplicity as the top priority. ☆55 · Updated 3 months ago
- Quadra: Effortless and reproducible deep learning workflows with configuration files. ☆48 · Updated 2 weeks ago
- Ray - A curated list of resources: https://github.com/ray-project/ray ☆42 · Updated last year
- Adapted version of llama3.np (NumPy) to a CuPy implementation for the Llama 3 model. ☆36 · Updated 5 months ago
- Google TPU optimizations for transformers models ☆74 · Updated this week
- A high-throughput and memory-efficient inference and serving engine for LLMs ☆250 · Updated last month
- Use QLoRA to tune LLM in PyTorch-Lightning w/ Huggingface + MLflow ☆54 · Updated 11 months ago
- Genalog is an open source, cross-platform python package allowing generation of synthetic document images with custom degradations and te… ☆42 · Updated 9 months ago
- The backend behind the LLM-Perf Leaderboard ☆11 · Updated 6 months ago
- Helm charts for the KubeRay project ☆31 · Updated last month
- Experiments with inference on llama ☆105 · Updated 5 months ago
- Curated list of awesome material on optimization techniques to make artificial intelligence faster and more efficient 🚀 ☆112 · Updated last year
- TorchFix - a linter for PyTorch-using code with autofix support ☆100 · Updated this week
- **ARCHIVED** Filesystem interface to 🤗 Hub ☆56 · Updated last year
- JetStream is a throughput and memory optimized engine for LLM inference on XLA devices, starting with TPUs (and GPUs in future -- PRs wel… ☆229 · Updated this week
- Benchmark suite for LLMs from Fireworks.ai ☆58 · Updated this week
- Deep Learning Energy Measurement and Optimization ☆213 · Updated this week
- OpenAI compatible API for TensorRT LLM triton backend ☆175 · Updated 3 months ago
- 🕹️ Performance Comparison of MLOps Engines, Frameworks, and Languages on Mainstream AI Models. ☆133 · Updated 3 months ago
- Easy and lightning fast training of 🤗 Transformers on Habana Gaudi processor (HPU) ☆153 · Updated this week