roscisz / TensorHive
Tool for managing exclusive GPU access for distributed machine learning workloads
☆169 · Updated last year
Alternatives and similar repositories for TensorHive
Users who are interested in TensorHive compare it to the libraries listed below.
- TF 2.x and PyTorch Lightning Callbacks for GPU monitoring ☆92 · Updated 5 years ago
- NVIDIA GPU tools - monitoring on CLI & web app with multiple agents ☆89 · Updated last year
- Pytorch Lightning Distributed Accelerators using Ray ☆215 · Updated 2 years ago
- Python 3 Bindings for the NVIDIA Management Library ☆141 · Updated last year
- Deep Learning Benchmarking Suite ☆130 · Updated 2 years ago
- ☆158 · Updated 2 months ago
- Hangar is version control for tensor data. Commit, branch, merge, revert, and collaborate in the data-defined software era. ☆204 · Updated 5 years ago
- ☆108 · Updated 4 years ago
- An ML framework to accelerate research and its path to production. ☆268 · Updated last year
- Python 3 Bindings for NVML library. Get NVIDIA GPU status inside your program. ☆249 · Updated 3 years ago
- Configuration classes enabling Hydra to configure and manage Pytorch Lightning projects. ☆43 · Updated 4 years ago
- MLCube® is a project that reduces friction for machine learning by ensuring that models are easily portable and reproducible. ☆157 · Updated last year
- PyTorch model training and layer saturation monitor ☆82 · Updated 2 years ago
- Smoothly deprecate and redirect Python functions/classes with smart warnings and auto-routing—keep your codebase clean while maintaining … ☆51 · Updated 2 weeks ago
- DeepOBS: A Deep Learning Optimizer Benchmark Suite ☆109 · Updated last year
- ☆36 · Updated 3 years ago
- A very small PyTorch container in Alpine Linux ☆68 · Updated 7 years ago
- An automatic ML model optimization tool. ☆200 · Updated 2 years ago
- Version control for software 2.0 ☆64 · Updated 4 years ago
- PyTorch interface for the IPU ☆181 · Updated 2 years ago
- A set of useful tools for DL experiments, project templates, etc. ☆35 · Updated 4 years ago
- HetSeq: Distributed GPU Training on Heterogeneous Infrastructure ☆106 · Updated 2 years ago
- Large Model Support in Tensorflow ☆202 · Updated 5 years ago
- Asynchronous Distributed Hyperparameter Optimization. ☆299 · Updated last week
- Example python package with pybind11 cpp extension ☆57 · Updated 4 years ago
- Train ImageNet in 18 minutes on AWS ☆134 · Updated last year
- Placeholder for the opensource Grid AI components ☆45 · Updated 3 years ago
- A lightweight wrapper for PyTorch that provides a simple declarative API for context switching between devices, distributed modes, mixed-… ☆67 · Updated 2 years ago
- A queue service for quickly developing scripts that use all your GPUs efficiently ☆88 · Updated 3 years ago
- PyProf2: PyTorch Profiling tool ☆82 · Updated 5 years ago