Repository to host ROCm Developer Hub Notebook Tutorials
☆58 · Mar 20, 2026 · Updated this week
Alternatives and similar repositories for gpuaidev
Users who are interested in gpuaidev are comparing it to the repositories listed below.
- ☆11 · Sep 8, 2025 · Updated 6 months ago
- Automating analysis from trace files ☆63 · Mar 13, 2026 · Updated last week
- ☆23 · Jul 11, 2025 · Updated 8 months ago
- Automated bottleneck detection and solution orchestration ☆19 · Feb 24, 2026 · Updated 3 weeks ago
- [DEPRECATED] Moved to ROCm/rocm-systems repo ☆31 · Updated this week
- ☆19 · Mar 3, 2025 · Updated last year
- Flax (Jax) implementation of DeepSeek-R1-Distill-Qwen-1.5B with weights ported from Hugging Face. ☆26 · Feb 20, 2025 · Updated last year
- Utilities for ROCm Tech Support Log Collections ☆13 · Mar 14, 2026 · Updated last week
- Implementation of Decision Stacks: Flexible RL via Modular Generative Models [NeurIPS 2023] ☆12 · Jun 27, 2023 · Updated 2 years ago
- ☆15 · Sep 7, 2022 · Updated 3 years ago
- ☆13 · Mar 16, 2018 · Updated 8 years ago
- Development repository for the Triton language and compiler ☆143 · Mar 13, 2026 · Updated last week
- Graphics Experiment - YCoCg Frame Buffers ☆18 · Feb 5, 2020 · Updated 6 years ago
- Step by step implementation of a fast softmax kernel in CUDA ☆62 · Jan 6, 2025 · Updated last year
- Row-wise block scaling for fp8 quantization matrix multiplication. Solution to GPU mode AMD challenge. ☆18 · Feb 9, 2026 · Updated last month
- Persistent dense gemm for Hopper in `CuTeDSL` ☆15 · Aug 9, 2025 · Updated 7 months ago
- ☆13 · May 30, 2023 · Updated 2 years ago
- HPC Performance Anomaly Suite ☆21 · Jun 11, 2020 · Updated 5 years ago
- Accelerated computing with HIP ☆28 · Mar 14, 2025 · Updated last year
- Collection of best practices, optimization guides, architecture overview, performance counters ☆52 · Feb 5, 2026 · Updated last month
- ☆16 · Updated this week
- [DEPRECATED] Moved to ROCm/rocm-libraries repo ☆12 · Mar 2, 2026 · Updated 2 weeks ago
- Lime sample projects ☆14 · Jan 2, 2025 · Updated last year
- ☆19 · Mar 12, 2025 · Updated last year
- ☆28 · Mar 10, 2026 · Updated last week
- ☆93 · Nov 11, 2025 · Updated 4 months ago
- Super fast FP32 matrix multiplication on RDNA3 ☆87 · Mar 30, 2025 · Updated 11 months ago
- A collection of GPU experiments and benchmarks for my personal understanding and research. ☆26 · Updated this week
- ☆30 · Mar 2, 2026 · Updated 2 weeks ago
- ROCm Documentation Python package for ReadTheDocs build standardization ☆16 · Mar 10, 2026 · Updated last week
- The goal of the OSSCI Fleet is to provide a central mechanism to enable test automation, batch job scheduling, and developer access to a … ☆13 · Feb 27, 2026 · Updated 3 weeks ago
- ☆16 · Feb 24, 2026 · Updated 3 weeks ago
- A Phaser 3 Project Template that uses a custom build of Phaser ☆11 · Jan 1, 2023 · Updated 3 years ago
- ☆72 · Updated this week
- High Performance FP8 GEMM Kernels for SM89 and later GPUs. ☆20 · Jan 24, 2025 · Updated last year
- Personal solutions to the Triton Puzzles ☆20 · Jul 18, 2024 · Updated last year
- Examples illustrating usage of the rocBLAS library ☆17 · Aug 12, 2024 · Updated last year
- LLMem: GPU Memory Estimation for Fine-Tuning Pre-Trained LLMs ☆29 · May 31, 2025 · Updated 9 months ago
- An interactive web-based tool for exploring intermediate representations of PyTorch and Triton models ☆49 · Jan 23, 2026 · Updated last month