chsasank / device-benchmarks
Benchmarks of different devices I have come across
☆39 · Updated 4 months ago
Alternatives and similar repositories for device-benchmarks
Users interested in device-benchmarks are comparing it to the libraries listed below.
- High-Performance SGEMM on CUDA devices ☆115 · Updated 11 months ago
- An experimental CPU backend for Triton (https://github.com/openai/triton) ☆48 · Updated 4 months ago
- MLIR-based partitioning system ☆160 · Updated this week
- Extensible collectives library in Triton ☆92 · Updated 9 months ago
- LLM training in simple, raw C/CUDA ☆110 · Updated last year
- Intel® Extension for DeepSpeed* is an extension to DeepSpeed that brings feature support with SYCL kernels on Intel GPU (XPU) device. Note… ☆63 · Updated 6 months ago
- OpenAI Triton backend for Intel® GPUs ☆223 · Updated this week
- oneCCL Bindings for Pytorch* (deprecated) ☆104 · Updated 2 weeks ago
- Ahead of Time (AOT) Triton Math Library ☆87 · Updated this week
- A stand-alone implementation of several NumPy dtype extensions used in machine learning. ☆325 · Updated last week
- This repository contains companion software for the Colfax Research paper "Categorical Foundations for CuTe Layouts". ☆83 · Updated 3 months ago
- Parallel framework for training and fine-tuning deep neural networks ☆70 · Updated 2 months ago
- Test suite for probing the numerical behavior of NVIDIA tensor cores ☆41 · Updated last year
- ☆100 · Updated last year
- ☆342 · Updated last week
- A Fusion Code Generator for NVIDIA GPUs (commonly known as "nvFuser") ☆371 · Updated this week
- ☆72 · Updated this week
- Fast low-bit matmul kernels in Triton ☆423 · Updated 3 weeks ago
- SYCL* Templates for Linear Algebra (SYCL*TLA) - a SYCL-based CUTLASS implementation for Intel GPUs ☆62 · Updated this week
- AMD RAD's multi-GPU Triton-based framework for seamless multi-GPU programming ☆148 · Updated last week
- Official Problem Sets / Reference Kernels for the GPU MODE Leaderboard! ☆188 · Updated 3 weeks ago
- ☆50 · Updated last year
- ☆271 · Updated this week
- ☆28 · Updated last year
- ☆187 · Updated last year
- Write a fast kernel and run it on Discord. See how you compare against the best! ☆66 · Updated last week
- Automatic differentiation for Triton Kernels ☆29 · Updated 5 months ago
- Collection of kernels written in Triton language ☆174 · Updated 9 months ago
- This repository contains the experimental PyTorch native float8 training UX ☆227 · Updated last year
- An experimental CPU backend for Triton ☆170 · Updated 2 months ago