JuliaGPU / GemmKernels.jl
Flexible and performant GEMM kernels in Julia
☆78 Updated 3 weeks ago
Related projects:
- ☆55 Updated 3 weeks ago
- Programming Gemm Kernels on NVIDIA GPUs with Tensor Cores in Julia ☆35 Updated 3 weeks ago
- ☆57 Updated 7 months ago
- Julia wrapper for the performance monitoring and benchmarking suite LIKWID. ☆58 Updated 2 weeks ago
- Distributed Data Parallel Training of Deep Neural Networks ☆56 Updated 6 months ago
- Automatic GPU, TPU, FPGA, Xeon Phi, Multithreaded, Distributed, etc. offloading for scientific machine learning (SciML) and differential … ☆35 Updated 2 years ago
- A version of the STREAM benchmark which measures the sustainable memory bandwidth. ☆26 Updated last month
- Proof of Concept: a C-callable GPU-enabled parallel 2-D heat diffusion solver written in Julia using CUDA, MPI and graphics ☆24 Updated 3 years ago
- IPU programming in Julia ☆30 Updated last week
- "Full speed or nothing." - James Hetfield ☆106 Updated 7 months ago
- Calculate with error-free, faithful, and compensated transforms and extended significands. ☆66 Updated last year
- Julia implementation for the BFloat16 number type ☆48 Updated this week
- ☆45 Updated this week
- ☆92 Updated 4 years ago
- Up or down? Maybe both? ☆35 Updated 8 months ago
- ☆63 Updated 4 years ago
- Julia parallel constructs over MPI ☆43 Updated last year
- ☆19 Updated last year
- Cross-platform vectorization of Julia code using Accelerate, VML, and Yeppp! ☆19 Updated 5 years ago
- Estimate the absolute performance of a piece of Julia code ☆98 Updated 8 months ago
- Data-parallelism on CUDA using Transducers.jl and for loops (FLoops.jl) ☆56 Updated last year
- Julia package for hierarchical matrices ☆26 Updated last year
- Unrolling loops at compile-time ☆51 Updated last year
- Make available to Julia the sparse functionality in MKL ☆52 Updated this week
- "Maybe we have our own magic." ☆47 Updated 4 years ago
- Sparse matrices in CSR format for Julia computations ☆33 Updated 3 weeks ago
- Mike's Little Intermediate Representation ☆111 Updated 2 months ago
- Julia bindings for NVTX, for instrumenting with the Nvidia Nsight Systems profiler ☆28 Updated 3 months ago
- Julia package to read MatrixMarket file format ☆26 Updated 2 months ago
- Julia package to facilitate writing multithreaded, multidimensional, cache-efficient code ☆81 Updated 4 months ago