GEMMul8 (GEMMulate): GEMM emulation using INT8/FP8 matrix engines based on the Ozaki Scheme II
☆50 · Feb 20, 2026 · Updated last week
Alternatives and similar repositories for GEMMul8
Users who are interested in GEMMul8 are comparing it to the libraries listed below.
- FP64 equivalent GEMM via Int8 Tensor Cores using the Ozaki scheme ☆113 · Dec 2, 2025 · Updated 2 months ago
- Fast SGEMM emulation on Tensor Cores ☆17 · Feb 16, 2025 · Updated last year
- ☆16 · Feb 9, 2026 · Updated 2 weeks ago
- learn llvm from scratch ☆14 · Apr 29, 2023 · Updated 2 years ago
- A C++ library for principal component analysis ☆12 · Feb 23, 2020 · Updated 6 years ago
- OpenMP offload playground ☆10 · Nov 16, 2024 · Updated last year
- Hack ELF files to ignore GLIBC_2.14 version checks ☆12 · Dec 16, 2015 · Updated 10 years ago
- ☆10 · Updated this week
- This is an example of a boolean expression editor made in Dear ImGui ☆15 · Dec 3, 2022 · Updated 3 years ago
- An extension library of WMMA API (Tensor Core API) ☆109 · Jul 12, 2024 · Updated last year
- A copy of the DirectX Headers from MinGW-64. ☆13 · Sep 7, 2023 · Updated 2 years ago
- GEMV implementation with CUTLASS ☆19 · Aug 21, 2025 · Updated 6 months ago
- Slender-body hydrodynamics ☆15 · Feb 3, 2026 · Updated 3 weeks ago
- Simple Qt OpenGL SVG rendering benchmark ☆15 · Sep 18, 2011 · Updated 14 years ago
- Provides a vendored libjxl. ☆16 · Oct 13, 2022 · Updated 3 years ago
- Fast Lossless Color Image Compression Library ☆10 · Jun 21, 2022 · Updated 3 years ago
- Matlab MEX gateway generator ☆16 · Updated this week
- PlayStation1 MDEC compression tools ☆11 · Dec 31, 2020 · Updated 5 years ago
- ☆14 · Dec 5, 2024 · Updated last year
- Nearly singular quadrature for line integrals in 2D and 3D ☆11 · Dec 5, 2025 · Updated 2 months ago
- CPC2018: finals of the 2nd Domestic CPU Parallel Application Challenge ☆11 · Oct 26, 2018 · Updated 7 years ago
- High order quadratures for triangles, squares, cubes, and tetrahedra ☆12 · Jan 15, 2026 · Updated last month
- Minimally complete examples of Julia calling, more importantly being called by, Fortran, C, and Python. ☆16 · Jul 27, 2022 · Updated 3 years ago
- ☆33 · May 23, 2025 · Updated 9 months ago
- ☆18 · Dec 9, 2025 · Updated 2 months ago
- Plane-Wave density-functional theory (DFT) development for NWChemEx electronic structure software ☆14 · Updated this week
- A flexible, templated GPU library of neighbor search algorithms. ☆12 · Jul 22, 2021 · Updated 4 years ago
- sb3 parses SB3. ☆15 · Jun 13, 2020 · Updated 5 years ago
- Experimental implementation of OpenCL over Metal ☆12 · Jul 20, 2022 · Updated 3 years ago
- FFTE: A Fast Fourier Transform Package (Official tarballs are unpacked into master as commits) ☆12 · Feb 17, 2024 · Updated 2 years ago
- Rust implementation of k-d tree to efficiently perform color quantization to predefined sets ☆13 · Feb 14, 2018 · Updated 8 years ago
- This repo contains the code of the paper "RayJoin: Fast and Precise Spatial Join", ICS'24 ☆11 · Updated this week
- cool shell scripts and tricks for managing venvs, docker containers, dotfiles, etc. ☆14 · May 20, 2020 · Updated 5 years ago
- A little library for using SIMD instructions for x86 and ARM, wrapping Agner Fog's vectorclass for x86 and filling some of its functional… ☆17 · Dec 10, 2021 · Updated 4 years ago
- PKU Mirror Frontend ☆11 · Apr 5, 2025 · Updated 10 months ago
- Source for our HPG Paper "CPU-Style SIMD Ray Traversal on GPUs" ☆15 · Aug 31, 2018 · Updated 7 years ago
- Using GNU GCC with MATLAB MEX ☆14 · Nov 20, 2025 · Updated 3 months ago
- Scan and visualize C/C++ source file dependencies. ☆13 · Mar 29, 2020 · Updated 5 years ago
- General, Hybrid and Optimized Sparse Toolkit (Bitbucket mirror) ☆12 · Apr 8, 2021 · Updated 4 years ago