intel / torch-xpu-ops
☆40Updated this week
Alternatives and similar repositories for torch-xpu-ops:
Users who are interested in torch-xpu-ops are comparing it to the libraries listed below
- Intel® Extension for DeepSpeed* is an extension to DeepSpeed that brings feature support with SYCL kernels on Intel GPU(XPU) device. Note…☆62Updated 2 months ago
- OpenAI Triton backend for Intel® GPUs☆183Updated this week
- ☆47Updated 3 weeks ago
- A CUTLASS implementation using SYCL☆20Updated this week
- ☆60Updated 4 months ago
- An experimental CPU backend for Triton (https://github.com/openai/triton)☆40Updated last month
- oneCCL Bindings for PyTorch*☆95Updated last week
- Intel® Extension for MLIR. A staging ground for MLIR dialects and tools for Intel devices using the MLIR toolchain.☆134Updated this week
- MLIR-based partitioning system☆82Updated this week
- Intel® Tensor Processing Primitives extension for PyTorch*☆15Updated last week
- RCCL Performance Benchmark Tests☆64Updated this week
- Assembler for NVIDIA Volta and Turing GPUs☆218Updated 3 years ago
- oneAPI Collective Communications Library (oneCCL)☆232Updated last week
- Unified compiler/runtime for interfacing with PyTorch Dynamo.☆100Updated 2 months ago
- Bandwidth test for ROCm☆54Updated 3 weeks ago
- A tool for generating information about the matrix multiplication instructions in AMD Radeon™ and AMD Instinct™ accelerators☆89Updated 3 weeks ago
- ☆30Updated this week
- Ahead of Time (AOT) Triton Math Library☆62Updated last week
- ☆20Updated last month
- Benchmarks to capture important workloads.☆31Updated 3 months ago
- Development repository for the Triton language and compiler☆118Updated this week
- rocWMMA☆110Updated this week
- Advanced Profiling and Analytics for AMD Hardware☆152Updated this week
- An extension library of WMMA API (Tensor Core API)☆96Updated 9 months ago
- Multi-GPU communication profiler and visualizer☆28Updated 10 months ago
- AI Tensor Engine for ROCm☆187Updated this week
- ROCm Tracer Callback/Activity Library for Performance tracing AMD GPUs☆83Updated this week
- Experimental projects related to TensorRT☆99Updated this week
- A Fusion Code Generator for NVIDIA GPUs (commonly known as "nvFuser")☆323Updated this week
- ROC profiler library. Profiling with perf-counters and derived metrics.☆144Updated this week