ROCm / pytorch-micro-benchmarking
☆14 · Updated 5 months ago
Related projects:
- ROCm Communication Collectives Library (RCCL) ☆251 · Updated this week
- Intel® Tensor Processing Primitives extension for Pytorch* ☆10 · Updated last week
- A tool for bandwidth measurements on NVIDIA GPUs. ☆285 · Updated 3 months ago
- Composable Kernel: Performance Portable Programming Model for Machine Learning Tensor Operators ☆293 · Updated this week
- Intel® Extension for DeepSpeed* is an extension to DeepSpeed that brings feature support with SYCL kernels on Intel GPU (XPU) device. Note… ☆56 · Updated 3 weeks ago
- This is a plugin which lets EC2 developers use libfabric as network provider while running NCCL applications. ☆138 · Updated this week
- RCCL Performance Benchmark Tests ☆41 · Updated last week
- oneAPI Level Zero Conformance & Performance test content ☆45 · Updated last week
- MSCCL++: A GPU-driven communication stack for scalable AI applications ☆233 · Updated this week
- oneAPI Collective Communications Library (oneCCL) ☆189 · Updated 3 weeks ago
- NCCL Profiling Kit ☆104 · Updated 2 months ago
- OpenAI Triton backend for Intel® GPUs ☆126 · Updated this week
- Profiling Tools Interfaces for GPU (PTI for GPU) is a set of Getting Started Documentation and Tools Library to start performance analysi… ☆196 · Updated last week
- A high-throughput and memory-efficient inference and serving engine for LLMs ☆35 · Updated this week
- oneCCL Bindings for Pytorch* ☆83 · Updated last week
- Reference models for Intel(R) Gaudi(R) AI Accelerator ☆152 · Updated this week
- PArametrized Recommendation and Ai Model benchmark is a repository for development of numerous uBenchmarks as well as end to end nets for… ☆118 · Updated 2 weeks ago
- AMD's graph optimization engine. ☆183 · Updated this week
- RDMA and SHARP plugins for nccl library ☆154 · Updated this week
- DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective. ☆11 · Updated 3 weeks ago
- A collection of examples for the ROCm software stack ☆149 · Updated this week
- A tool for generating information about the matrix multiplication instructions in AMD Radeon™ and AMD Instinct™ accelerators ☆60 · Updated 8 months ago
- Development repository for the Triton language and compiler ☆86 · Updated this week
- Microsoft Collective Communication Library ☆304 · Updated last year
- ☆306 · Updated 4 months ago
- Synthesizer for optimal collective communication algorithms ☆94 · Updated 5 months ago
- A validation and profiling tool for AI infrastructure ☆252 · Updated this week
- This repository contains the results and code for the MLPerf™ Training v3.0 benchmark. ☆12 · Updated last year
- ROC profiler library. Profiling with perf-counters and derived metrics. ☆124 · Updated last week
- Bandwidth test for ROCm ☆45 · Updated this week