Uses Tensor Cores to compute back-to-back HGEMM (half-precision general matrix multiplication) with the MMA PTX instruction.
☆13 · Nov 3, 2023 · Updated 2 years ago
Alternatives and similar repositories for cuda_back2back_hgemm
Users interested in cuda_back2back_hgemm are comparing it to the libraries listed below.
- Multiple GEMM operators built with CUTLASS to support LLM inference. ☆20 · Aug 3, 2025 · Updated 9 months ago
- Source code of the IPDPS '21 paper: "TileSpMV: A Tiled Algorithm for Sparse Matrix-Vector Multiplication on GPUs" by Yuyao Niu, Zhengyang… ☆13 · Aug 12, 2022 · Updated 3 years ago
- Lemon is an LALR(1) parser generator for C or C++. ☆17 · Jun 10, 2014 · Updated 11 years ago
- Specialized Parallel Linear Algebra, providing distributed GEMM functionality for specific matrix distributions with optional GPU acceler… ☆31 · Jun 26, 2024 · Updated last year
- An intelligent matrix format designer for SpMV ☆10 · Oct 10, 2023 · Updated 2 years ago
- 🍓 A toy object-oriented programming language written in Rust ☆17 · Apr 10, 2024 · Updated 2 years ago
- Mirror of http://gitlab.hpcrl.cse.ohio-state.edu/chong/ppopp19_ae, refactored for readability ☆17 · Oct 20, 2021 · Updated 4 years ago
- Python client for the etcd API v3, supporting Python >= 3.7, under active maintenance ☆13 · Aug 4, 2025 · Updated 9 months ago
- 6502 emulator written in C++ ☆13 · Feb 18, 2025 · Updated last year
- A Vectorized N:M Format for Unleashing the Power of Sparse Tensor Cores ☆60 · Nov 24, 2023 · Updated 2 years ago
- Blockchain open-source projects ☆11 · Aug 18, 2017 · Updated 8 years ago
- Kubernetes debugging and inspection tool ☆13 · Nov 8, 2018 · Updated 7 years ago
- ☆97 · Mar 21, 2026 · Updated last month
- Huawei collective communication performance tests ☆16 · May 27, 2024 · Updated last year
- Parallel cuckoo hashing on GPUs with CUDA ☆12 · Sep 27, 2019 · Updated 6 years ago
- Parallel SpMV using the CSR representation, built in CUDA ☆14 · Jun 27, 2020 · Updated 5 years ago
- ☆13 · Nov 25, 2019 · Updated 6 years ago
- Several optimization methods of half-precision general matrix multiplication (HGEMM) using tensor core with WMMA API and MMA PTX instruct… ☆544 · Sep 8, 2024 · Updated last year
- ☆10 · Apr 24, 2023 · Updated 3 years ago
- ☆22 · Sep 10, 2025 · Updated 8 months ago
- Yet another Polyhedra Compiler for DeepLearning ☆19 · Apr 14, 2023 · Updated 3 years ago
- Experiments evaluating preemption on the NVIDIA Pascal architecture ☆16 · Nov 10, 2016 · Updated 9 years ago
- ☆17 · Aug 9, 2022 · Updated 3 years ago
- Offline renderer using CUDA ☆13 · Jun 8, 2020 · Updated 5 years ago
- ☆33 · Apr 2, 2025 · Updated last year
- ECM Factorization on CUDA-GPUs ☆15 · Sep 29, 2020 · Updated 5 years ago
- Source code of the SC '23 paper: "DASP: Specific Dense Matrix Multiply-Accumulate Units Accelerated General Sparse Matrix-Vector Multipli… ☆29 · Jun 18, 2024 · Updated last year
- Several common methods of matrix multiplication are implemented on CPU and Nvidia GPU using C++11 and CUDA. ☆14 · Feb 8, 2023 · Updated 3 years ago
- Convert CUDA programs from float data type to half or half2 with SIMDization ☆19 · May 28, 2019 · Updated 6 years ago
- A SoundFont MIDI synthesizer written in pure Odinlang ☆11 · Aug 13, 2023 · Updated 2 years ago
- ☆18 · Mar 12, 2025 · Updated last year
- A CUDA implementation of Arithmetic Coding ☆18 · Jan 21, 2025 · Updated last year
- The code repository of DGCNN on FPGA: Acceleration of The Point Cloud Classifier Using FPGAs ☆17 · Mar 6, 2023 · Updated 3 years ago
- BWA-MEM program accelerated with the GPUSeed and GASAL2 libraries ☆19 · Dec 16, 2022 · Updated 3 years ago
- Music GAN - GANSynth preprocessing, ProGAN and DCGAN architecture ☆11 · Jan 26, 2023 · Updated 3 years ago
- Console Snake Game in Assembly ☆24 · Oct 24, 2022 · Updated 3 years ago
- An Open Source Kepler GPU Assembler ☆20 · Jan 23, 2017 · Updated 9 years ago
- A Data Oriented C Compiler in C ☆25 · Mar 28, 2024 · Updated 2 years ago
- A "minimal" example of a Vulkan rainbow triangle in Odin with GLFW. ☆11 · Jun 2, 2024 · Updated last year