ROCm / rocm-libraries
Monorepo for ROCm libraries
☆11Updated this week
Alternatives and similar repositories for rocm-libraries
Users who are interested in rocm-libraries are comparing it to the libraries listed below.
- Random number library that generates pseudo-random and quasi-random numbers.☆26Updated this week
- ROCm Systems Profiler☆17Updated this week
- Unit benchmarks of CUDA event APIs.☆17Updated last year
- A simple profiler to count Nvidia PTX assembly instructions of OpenCL/SYCL/CUDA kernels for roofline model analysis.☆52Updated last month
- AMD’s C++ library for accelerating tensor primitives☆40Updated this week
- A fast implementation of log() and exp()☆53Updated 2 years ago
- Runs a single CUDA/OpenCL kernel, taking its source from a file and arguments from the command-line☆23Updated 3 weeks ago
- Reusable software components for ROCm developers☆83Updated this week
- ☆15Updated last week
- hipFFT is an FFT marshalling library.☆63Updated last week
- Bistra is a domain-specific language designed to generate high-performance kernels (such as GEMMs, convolutions, etc). The program is des…☆6Updated last year
- ☆18Updated last month
- Zig regex experiment☆13Updated 9 months ago
- C++ template library for floating point operations☆27Updated 3 weeks ago
- Bandwidth test for ROCm☆55Updated last week
- A memory profiler for NVIDIA GPUs to explore memory inefficiencies in GPU-accelerated applications.☆25Updated 7 months ago
- Numbast is a tool to build an automated pipeline that converts CUDA APIs into Numba bindings.☆45Updated this week
- LLVM-Canon aims to transform LLVM modules into a canonical form by reordering and renaming instructions while preserving the same semanti…☆15Updated last year
- Utilities for accessing AMD's Machine-Readable GPU ISA Specifications.☆33Updated 2 months ago
- A tracing JIT compiler for PyTorch☆13Updated 3 years ago
- ☆15Updated 2 years ago
- C implementation of the L-Mul f32/f16 multiplications from the paper: https://arxiv.org/html/2410.00907☆27Updated 7 months ago
- ☆18Updated 10 months ago
- ☆19Updated last week
- Generate simple index ranges in C++ and CUDA C++☆39Updated last year
- ROCm Thrust - run Thrust dependent software on AMD GPUs☆108Updated last week
- Can I make an *optimizing* compiler under 1k lines of code?☆56Updated 2 months ago
- SYCL Reference Manual☆27Updated last year
- NVIDIA HPCG is based on the HPCG benchmark and optimized for performance on NVIDIA accelerated HPC systems.☆55Updated 3 weeks ago
- SYCL Conformance Tests☆70Updated this week