JuliaGPU / GemmKernels.jl
Flexible and performant GEMM kernels in Julia
☆83 · Updated last week
Alternatives and similar repositories for GemmKernels.jl
Users who are interested in GemmKernels.jl are comparing it to the libraries listed below.
- Programming GEMM kernels on NVIDIA GPUs with Tensor Cores in Julia · ☆43 · Updated last month
- ☆64 · Updated last year
- Julia library to manipulate MLIR dialects. · ☆65 · Updated last year
- Julia implementation of the BFloat16 number type · ☆58 · Updated 3 weeks ago
- ☆20 · Updated 2 years ago
- Distributed data-parallel training of deep neural networks · ☆57 · Updated last year
- Calculate with error-free, faithful, and compensated transforms and extended significands. · ☆68 · Updated last month
- Data parallelism on CUDA using Transducers.jl and for loops (FLoops.jl) · ☆58 · Updated 2 years ago
- Julia wrapper for the performance monitoring and benchmarking suite LIKWID. · ☆66 · Updated last year
- Julia bindings for NVTX, for instrumenting with the NVIDIA Nsight Systems profiler · ☆39 · Updated 2 weeks ago
- Proof of concept: a C-callable, GPU-enabled parallel 2-D heat diffusion solver written in Julia using CUDA, MPI and graphics · ☆24 · Updated 5 years ago
- Estimate the absolute performance of a piece of Julia code · ☆101 · Updated 2 years ago
- ☆63 · Updated 5 years ago
- Record MPI operations on tape · ☆25 · Updated 2 years ago
- ☆54 · Updated 4 months ago
- GPU integrations for Dagger.jl · ☆54 · Updated 7 months ago
- Julia package for hierarchical matrices · ☆28 · Updated last year
- IPU programming in Julia · ☆30 · Updated 2 months ago
- Checkpointing for automatic differentiation · ☆60 · Updated this week
- Julia parallel constructs over MPI · ☆47 · Updated last month
- A version of the STREAM benchmark, which measures sustainable memory bandwidth. · ☆28 · Updated last month
- eXpression differentiation in Julia · ☆29 · Updated 6 years ago
- Julia package to read the MatrixMarket file format · ☆32 · Updated last year
- Reusable compiler infrastructure for Julia GPU backends. · ☆170 · Updated this week
- Remez algorithm for computing minimax polynomial approximations · ☆44 · Updated 5 years ago
- "Full speed or nothing." - James Hetfield · ☆121 · Updated 2 months ago
- Sparse matrices in CSR format for Julia computations · ☆42 · Updated 5 months ago
- Automatic GPU, TPU, FPGA, Xeon Phi, multithreaded, distributed, etc. offloading for scientific machine learning (SciML) and differential … · ☆34 · Updated 4 years ago
- ☆81 · Updated last month
- This repo plans to provide a low-level Julia wrapper for the BLIS typed interface. · ☆26 · Updated 3 months ago