Bruce-Lee-LY / cuda_back2back_hgemm
Uses Tensor Cores to compute back-to-back HGEMM (half-precision general matrix multiplication) with the MMA PTX instruction.
☆13 Nov 3, 2023 Updated 2 years ago
Alternatives and similar repositories for cuda_back2back_hgemm
Users who are interested in cuda_back2back_hgemm are comparing it to the libraries listed below.
- Multiple GEMM operators are constructed with cutlass to support LLM inference. ☆20 Aug 3, 2025 Updated 6 months ago
- Specialized Parallel Linear Algebra, providing distributed GEMM functionality for specific matrix distributions with optional GPU acceler… ☆31 Jun 26, 2024 Updated last year
- ☆23 Updated this week
- Music GAN - GANSynth preprocessing, ProGAN and DCGAN architecture ☆11 Jan 26, 2023 Updated 3 years ago
- Several optimization methods of half-precision general matrix multiplication (HGEMM) using tensor core with WMMA API and MMA PTX instruct… ☆522 Sep 8, 2024 Updated last year
- Learning through minimalistic server implementations. ☆10 Oct 20, 2024 Updated last year
- A deep learning intermediate representation for multi-platform compilation optimization ☆10 Oct 28, 2024 Updated last year
- Lecture notes on LinkIt Smart 7688 and Embedded Linux ☆12 Aug 30, 2017 Updated 8 years ago
- Hindcast Initial Condition Creation Utility/Processor ☆11 Updated this week
- A curated list for Efficient Large Language Models ☆11 Mar 25, 2024 Updated last year
- Rust library for music composition with MIDI export ☆12 Mar 23, 2025 Updated 10 months ago
- 🍓 A toy object-oriented programming language written in Rust ☆17 Apr 10, 2024 Updated last year
- A SoundFont MIDI synthesizer written in pure Odinlang ☆12 Aug 13, 2023 Updated 2 years ago
- Microbenchmark that unveils the mechanisms behind power readings reported by nvidia-smi on your NVIDIA GPU. ☆14 Dec 12, 2024 Updated last year
- ASKAP Benchmark Packages ☆13 Nov 3, 2023 Updated 2 years ago
- An MLIR-based AI compiler designed for Python frontend to RISC-V DSA ☆13 Oct 10, 2024 Updated last year
- Official PyTorch implementation of the paper: "Deep Audio Waveform Prior" (Interspeech 2022) https://arxiv.org/abs/2207.10441 ☆11 Oct 25, 2022 Updated 3 years ago
- A traceable jewelry e-commerce platform based on blockchain technology ☆11 Dec 2, 2020 Updated 5 years ago
- ☆10 Jun 17, 2025 Updated 8 months ago
- A unified programming framework for high and portable performance across FPGAs and GPUs ☆11 Mar 23, 2025 Updated 10 months ago
- ☆18 Sep 10, 2025 Updated 5 months ago
- Huawei collective communication performance tests ☆15 May 27, 2024 Updated last year
- Python client for the etcd API v3, supports Python >= 3.7, under active maintenance ☆12 Aug 4, 2025 Updated 6 months ago
- Linux kernel for gdk8 ☆10 Jan 30, 2022 Updated 4 years ago
- ☆12 Jan 4, 2024 Updated 2 years ago
- 6502 Emulator written in C++ ☆13 Feb 18, 2025 Updated 11 months ago
- A "minimal" example of a Vulkan rainbow triangle in Odin with GLFW. ☆11 Jun 2, 2024 Updated last year
- ☆14 Mar 20, 2022 Updated 3 years ago
- Code for reproducing key results in the paper "Neural Shuffle-Exchange Networks - Sequence Processing in O(n log n) Time" by Kārlis Freiv… ☆10 Apr 10, 2020 Updated 5 years ago
- ☆10 Jan 30, 2017 Updated 9 years ago
- Doodling with particle systems ☆11 Feb 8, 2021 Updated 5 years ago
- Finding fun in Linux, hehe ^_^ ☆10 May 29, 2021 Updated 4 years ago
- Parallel cuckoo hashing on GPUs with CUDA ☆12 Sep 27, 2019 Updated 6 years ago
- ☆10 Apr 24, 2023 Updated 2 years ago
- Source code of the IPDPS '21 paper: "TileSpMV: A Tiled Algorithm for Sparse Matrix-Vector Multiplication on GPUs" by Yuyao Niu, Zhengyang… ☆12 Aug 12, 2022 Updated 3 years ago
- llama INT4 cuda inference with AWQ ☆54 Jan 20, 2025 Updated last year
- ECM Factorization on CUDA-GPUs ☆14 Sep 29, 2020 Updated 5 years ago
- Userspace eBPF Runtime Benchmarking Test Suite and Results ☆16 Apr 21, 2024 Updated last year
- Inline PTX Assembly in CUDA example ☆13 May 7, 2022 Updated 3 years ago