Uses Tensor Cores to compute back-to-back HGEMM (half-precision general matrix multiplication) with MMA PTX instructions.
☆13Nov 3, 2023Updated 2 years ago
Alternatives and similar repositories for cuda_back2back_hgemm
Users who are interested in cuda_back2back_hgemm are comparing it to the libraries listed below.
- Multiple GEMM operators are constructed with cutlass to support LLM inference.☆20Aug 3, 2025Updated 8 months ago
- Source code of the IPDPS '21 paper: "TileSpMV: A Tiled Algorithm for Sparse Matrix-Vector Multiplication on GPUs" by Yuyao Niu, Zhengyang…☆13Aug 12, 2022Updated 3 years ago
- Lemon is an LALR(1) parser generator for C or C++.☆17Jun 10, 2014Updated 11 years ago
- Specialized Parallel Linear Algebra, providing distributed GEMM functionality for specific matrix distributions with optional GPU acceler…☆31Jun 26, 2024Updated last year
- An intelligent matrix format designer for SpMV☆10Oct 10, 2023Updated 2 years ago
- 🍓 A toy object-oriented programming language written in Rust☆17Apr 10, 2024Updated 2 years ago
- Mirror of http://gitlab.hpcrl.cse.ohio-state.edu/chong/ppopp19_ae, refactoring for understanding☆16Oct 20, 2021Updated 4 years ago
- 6502 Emulator written in C++☆13Feb 18, 2025Updated last year
- Python client for the etcd v3 API; supports Python >= 3.7 and is under active maintenance☆12Aug 4, 2025Updated 8 months ago
- A Vectorized N:M Format for Unleashing the Power of Sparse Tensor Cores☆59Nov 24, 2023Updated 2 years ago
- ☆88Mar 21, 2026Updated 3 weeks ago
- Open-source blockchain projects☆11Aug 18, 2017Updated 8 years ago
- Kubernetes debugging and inspection tool☆13Nov 8, 2018Updated 7 years ago
- Huawei collective communication performance tests☆16May 27, 2024Updated last year
- Parallel cuckoo hashing on GPUs with CUDA☆12Sep 27, 2019Updated 6 years ago
- ☆13Nov 25, 2019Updated 6 years ago
- Parallel SpMV using CSR representation, built in CUDA☆14Jun 27, 2020Updated 5 years ago
- Several optimization methods of half-precision general matrix multiplication (HGEMM) using tensor core with WMMA API and MMA PTX instruct…☆541Sep 8, 2024Updated last year
- ☆10Apr 24, 2023Updated 2 years ago
- ☆22Sep 10, 2025Updated 7 months ago
- Yet another Polyhedra Compiler for DeepLearning☆19Apr 14, 2023Updated 3 years ago
- Experiments evaluating preemption on the NVIDIA Pascal architecture☆16Nov 10, 2016Updated 9 years ago
- ☆17Aug 9, 2022Updated 3 years ago
- Offline renderer using CUDA☆13Jun 8, 2020Updated 5 years ago
- ☆66Updated this week
- ☆32Apr 2, 2025Updated last year
- Source code of the SC '23 paper: "DASP: Specific Dense Matrix Multiply-Accumulate Units Accelerated General Sparse Matrix-Vector Multipli…☆29Jun 18, 2024Updated last year
- ECM Factorization on CUDA-GPUs☆14Sep 29, 2020Updated 5 years ago
- Several common methods of matrix multiplication are implemented on CPU and Nvidia GPU using C++11 and CUDA.☆14Feb 8, 2023Updated 3 years ago
- Convert CUDA programs from float data type to half or half2 with SIMDization☆19May 28, 2019Updated 6 years ago
- A SoundFont MIDI synthesizer written in pure Odinlang☆11Aug 13, 2023Updated 2 years ago
- ☆18Mar 12, 2025Updated last year
- The code repository of DGCNN on FPGA: Acceleration of The Point Cloud Classifier Using FPGAs☆17Mar 6, 2023Updated 3 years ago
- A CUDA implementation of Arithmetic Coding☆18Jan 21, 2025Updated last year
- BWA-MEM program accelerated with the GPUSeed and GASAL2 libraries☆19Dec 16, 2022Updated 3 years ago
- Music GAN - GANSynth preprocessing, ProGAN and DCGAN architecture☆11Jan 26, 2023Updated 3 years ago
- Console Sake Game in Assembly☆24Oct 24, 2022Updated 3 years ago
- An Open Source Kepler GPU Assembler☆20Jan 23, 2017Updated 9 years ago
- A Data Oriented C Compiler in C☆25Mar 28, 2024Updated 2 years ago