☆20Nov 7, 2019Updated 6 years ago
Alternatives and similar repositories for NVIDIA-tensor-core-examples
Users who are interested in NVIDIA-tensor-core-examples are comparing it to the libraries listed below
- ☆11Jul 13, 2022Updated 3 years ago
- ☆11Apr 10, 2019Updated 6 years ago
- Object-oriented extension to the CMake language.☆13Jun 18, 2025Updated 8 months ago
- AnacondaCON 2019 GPU Deep Learning Tutorial☆16Aug 14, 2024Updated last year
- Generates API documentation for CMake functions and macros☆15Jan 23, 2026Updated last month
- ☆23Updated this week
- MathLib is a versatile C++ library that provides a wide range of mathematical algorithms and functions, including but not limited to tran…☆11Jun 6, 2023Updated 2 years ago
- This repository mirrors the principal Gitlab repository of the Chebyshev Accelerated Subspace iteration Eigensolver. If you want to contr…☆19Feb 16, 2026Updated last week
- Automatic code generation of Fast Multipole and Barnes-Hut operators☆17Oct 25, 2022Updated 3 years ago
- ☆20Sep 28, 2024Updated last year
- NVIDIA Performance Libraries: Sample code☆22Nov 20, 2025Updated 3 months ago
- ☆31Apr 2, 2025Updated 10 months ago
- Global Memory and Threading runtime system☆25Dec 10, 2025Updated 2 months ago
- An OpenMP runtime implemented using HPX☆24Aug 4, 2022Updated 3 years ago
- Resources for the introduction to GPU programming course of the HPC-AI specialized master's program☆23Jan 11, 2024Updated 2 years ago
- A wavelet-based multifractal image analysis tool implementing the WTMM (Wavelet Transform Modulus Maxima) method.☆11Feb 1, 2020Updated 6 years ago
- Benchmarks to capture important workloads.☆32Feb 5, 2026Updated 3 weeks ago
- CUDA 8-bit Tensor Core Matrix Multiplication based on m16n16k16 WMMA API☆35Sep 15, 2023Updated 2 years ago
- Subset of BLAS routines optimized for NVIDIA GPUs☆77Mar 27, 2023Updated 2 years ago
- ☆30Jan 20, 2026Updated last month
- ☆10Dec 25, 2022Updated 3 years ago
- ☆10Jun 29, 2021Updated 4 years ago
- MiniFE Finite Element Mini-Application☆40Apr 24, 2024Updated last year
- A FORTRAN implementation of a Moving Finite Volume MHD code in three dimensions including self-gravity.☆12Jan 24, 2017Updated 9 years ago
- Matrix Multiply-Accumulate with CUDA and WMMA (Tensor Core)☆144Aug 18, 2020Updated 5 years ago
- C++ template containers with optimized memory consumption☆12Updated this week
- Third and final part of the numerical methods with Python series☆11May 30, 2022Updated 3 years ago
- Revisiting Whittaker-Henderson Smoothing☆11Jun 19, 2025Updated 8 months ago
- ☆18Feb 12, 2026Updated 2 weeks ago
- C++17 Wrapper for ScaLAPACK☆11Oct 5, 2023Updated 2 years ago
- Deployed version of Tableaunoir. Do not modify this repository.☆11Feb 18, 2026Updated last week
- ☆10Mar 2, 2021Updated 4 years ago
- University of Vermont Mechanical Engineering Heat Transfer Course☆16Apr 6, 2023Updated 2 years ago
- Distributed Communication-Optimal Shuffle and Transpose Algorithm☆14Feb 20, 2026Updated last week
- This repository contains material for HPX tutorials given by members of the STE||AR-Group☆35May 29, 2023Updated 2 years ago
- Official implementation of Acc-SpMM: Accelerating General-purpose Sparse Matrix-Matrix Multiplication with GPU Tensor Cores.☆14Nov 13, 2025Updated 3 months ago
- GPU-accelerated RIME implementations. An offshoot of the BIRO projects, and one of the foothills of Mt Exaflop.☆10Dec 10, 2025Updated 2 months ago
- EPOCH Input System Version 2☆10Jun 5, 2020Updated 5 years ago
- Performance portable routines for opacity, emissivity, and scattering☆13Jan 22, 2026Updated last month