nod-ai / ossci-fleet
The goal of the OSSCI Fleet is to provide a central mechanism to enable test automation, batch job scheduling, and developer access to a federated set of limited GPU resources.
☆12 Updated 3 weeks ago
Alternatives and similar repositories for ossci-fleet
Users who are interested in ossci-fleet are comparing it to the libraries listed below.
- ROCm Documentation Python package for ReadTheDocs build standardization ☆16 Updated this week
- Repository to host ROCm Developer Hub Notebook Tutorials ☆11 Updated 2 weeks ago
- ☆24 Updated 3 weeks ago
- RCCL Performance Benchmark Tests ☆67 Updated 2 weeks ago
- Ongoing research training transformer models at scale ☆22 Updated last week
- ☆25 Updated this week
- AMD SMI ☆68 Updated this week
- OpenAI Triton backend for Intel® GPUs ☆187 Updated this week
- ☆36 Updated this week
- AI Tensor Engine for ROCm ☆201 Updated this week
- ☆46 Updated this week
- rocSHMEM intra-kernel networking runtime for AMD dGPUs on the ROCm platform. ☆86 Updated last week
- RDC ☆29 Updated this week
- ROC profiler library. Profiling with perf-counters and derived metrics. ☆147 Updated last week
- Bandwidth test for ROCm ☆56 Updated 2 weeks ago
- ☆20 Updated 2 months ago
- A tool for generating information about the matrix multiplication instructions in AMD Radeon™ and AMD Instinct™ accelerators ☆95 Updated 2 weeks ago
- ☆146 Updated this week
- ☆20 Updated last month
- ☆96 Updated last year
- Multi-GPU communication profiler and visualizer ☆29 Updated 11 months ago
- ROCm Tracer Callback/Activity Library for Performance tracing AMD GPUs ☆83 Updated last week
- rocWMMA ☆114 Updated this week
- hipBLASLt is a library that provides general matrix-matrix operations with a flexible API and extends functionalities beyond a traditiona… ☆97 Updated this week
- TransferBench is a utility capable of benchmarking simultaneous copies between user-specified devices (CPUs/GPUs) ☆39 Updated this week
- A high-throughput and memory-efficient inference and serving engine for LLMs ☆79 Updated this week
- ☆109 Updated 3 weeks ago
- ROCm Communication Collectives Library (RCCL) ☆338 Updated this week
- ☆18 Updated last week
- NVIDIA Resiliency Extension is a python package for framework developers and users to implement fault-tolerant features. It improves the … ☆169 Updated last week