Accelerated General (FP32) Matrix Multiplication from scratch in CUDA
☆183Jan 9, 2025Updated last year
Alternatives and similar repositories for xGeMM
Users that are interested in xGeMM are comparing it to the libraries listed below
- General Matrix Multiplication using NVIDIA Tensor Cores☆28Jan 25, 2025Updated last year
- GPU-accelerated cryptography libraries for ZKsync☆22Feb 24, 2026Updated last week
- 🎹 Instruct.KR 2025 Summer Meetup: Open-source LLMs, all the way to production with vLLM 🎹☆23Aug 2, 2025Updated 7 months ago
- Persistent dense gemm for Hopper in `CuTeDSL`☆15Aug 9, 2025Updated 6 months ago
- ☆12Nov 23, 2020Updated 5 years ago
- A nim module to handle polynomials☆13Jun 7, 2022Updated 3 years ago
- ☆23Jul 11, 2025Updated 7 months ago
- IO engine for Nim.☆10Jul 8, 2024Updated last year
- Rust SDK for zkWasm☆11Feb 11, 2026Updated 3 weeks ago
- Generic implementation of the Number Theoretic Transform in the context of cryptography applications☆14Aug 13, 2025Updated 6 months ago
- ☆12Oct 4, 2023Updated 2 years ago
- Web Assembly low level implementation of pairing friendly curves.☆15Feb 10, 2026Updated 3 weeks ago
- ☆15Feb 24, 2026Updated last week
- Implementation of the Kademlia protocol created to gain understanding of distributed hash tables.☆10Aug 16, 2023Updated 2 years ago
- upcoming concurrent library for Nim☆11Apr 25, 2021Updated 4 years ago
- ☆18Nov 11, 2025Updated 3 months ago
- 🧻 Unroll for-loops at compile-time.☆12Jul 27, 2021Updated 4 years ago
- Foundry project for the RLN☆17Nov 10, 2023Updated 2 years ago
- Educational Version of Lookup Argument☆12Apr 10, 2025Updated 10 months ago
- Starky implementation of Bls12-381☆13May 16, 2024Updated last year
- customizable halo2 circuits batcher☆31Oct 25, 2025Updated 4 months ago
- Cryptography libraries for ZKsync☆42Feb 26, 2026Updated last week
- 4k intro sample code written with Nim programming language.☆14Jun 19, 2023Updated 2 years ago
- Benchmark tests supporting the TiledCUDA library.☆18Nov 19, 2024Updated last year
- Generates zero-knowledge proofs of Ethereum smart contract execution.☆42Feb 23, 2026Updated last week
- Yan (炎) is a high-performance CUDA operator library designed for learning purposes while emphasizing clean code and maximum performance.☆18Jul 21, 2025Updated 7 months ago
- Plonkish Nova implementation along with advanced features☆16Dec 16, 2023Updated 2 years ago
- ZK proofs for Brainfuck execution using powdr☆17Aug 28, 2024Updated last year
- ☆15Aug 28, 2023Updated 2 years ago
- An API compatible port of the Stone prover.☆19Nov 4, 2024Updated last year
- Single page documentation sites☆16May 18, 2021Updated 4 years ago
- ☆39Oct 25, 2025Updated 4 months ago
- Super fast FP32 matrix multiplication on RDNA3☆86Mar 30, 2025Updated 11 months ago
- ☆20Nov 3, 2025Updated 4 months ago
- 👩‍💻 Circom compiler, snippets, hover and language support for Visual Studio Code☆16Apr 20, 2023Updated 2 years ago
- Batched random number generation☆18Sep 24, 2025Updated 5 months ago
- A set of tools for verifying halo2 circuits in Move environments☆16Updated this week
- ☆17Dec 16, 2021Updated 4 years ago
- Pure Triton kernels for Qwen3.5-27B inference on NVIDIA B200☆66Updated this week