fangjunzhou / blas-playground
Playground project for BLAS demo.
☆29 Updated last year
Alternatives and similar repositories for blas-playground
Users who are interested in blas-playground are comparing it to the libraries listed below.
- ☆198 Updated 2 years ago
- ☆40 Updated this week
- GPUOcelot: A dynamic compilation framework for PTX ☆194 Updated 3 months ago
- AMD’s C++ library for accelerating tensor primitives ☆42 Updated this week
- AVX-512 documentation beyond what Intel provides ☆52 Updated last year
- Code for "SPSC Lock-free, Wait-free Fifo from the Ground Up" presentation at CPPCON 2023 ☆103 Updated 9 months ago
- An introduction to language design through building a compiler frontend and completing a self-paced exercise on top of LLVM. ☆120 Updated 2 months ago
- ROCm Systems Profiler ☆19 Updated this week
- Serial and parallel implementations of matrix multiplication ☆41 Updated 4 years ago
- [DEPRECATED] Moved to ROCm/rocm-libraries repo ☆173 Updated this week
- Source code for 'Modern Parallel Programming with C++ and Assembly' by Dan Kusswurm ☆64 Updated 3 years ago
- Open source cross-platform compiler for compute-intensive loops used in AI algorithms, from Microsoft Research ☆109 Updated last year
- An implementation of HIP that works on CPUs, across OSes. ☆120 Updated last year
- Simple C++ borrow checker ☆68 Updated 2 years ago
- ☆151 Updated this week
- ☆144 Updated last week
- Nvidia Instruction Set Specification Generator ☆271 Updated 11 months ago
- TPP experimentation on MLIR for linear algebra ☆131 Updated last week
- ROCm BLAS marshalling library ☆142 Updated this week
- Flexible memory allocation tool for multi-tiered memory systems ☆13 Updated 4 months ago
- A header only library implementing common mathematical functions using SIMD intrinsics ☆107 Updated 3 months ago
- Stepwise optimizations of DGEMM on CPU, reaching performance faster than Intel MKL eventually, even under multithreading. ☆148 Updated 3 years ago
- MatMul Performance Benchmarks for a Single CPU Core comparing both hand engineered and codegen kernels. ☆131 Updated last year
- Distributed ranges is a generalization of C++ ranges for distributed data structures. ☆51 Updated 3 weeks ago
- Tenstorrent MLIR compiler ☆132 Updated this week
- rocWMMA ☆114 Updated last week
- This is the AMD-maintained fork of the LLVM git repository. This repository accepts pull requests and issues related to AMD fork-specific… ☆152 Updated this week
- Powerful automatic differentiation in C++ and Python ☆373 Updated 3 weeks ago
- Docker runner for build-bench ☆302 Updated last year
- C++ implementation of the lox toy language used from the crafting interpreters book (http://www.craftinginterpreters.com/) ☆37 Updated 4 years ago