GEMMul8 (GEMMulate): GEMM emulation using INT8/FP8 matrix engines based on the Ozaki Scheme II
☆53 · Mar 13, 2026 · Updated last week
Alternatives and similar repositories for GEMMul8
Users who are interested in GEMMul8 compare it to the libraries listed below.
- Acceleration codes for the Ozaki-scheme on integer matrix multiplication units. ☆22 · Dec 10, 2025 · Updated 3 months ago
- FP64 equivalent GEMM via Int8 Tensor Cores using the Ozaki scheme ☆114 · Dec 2, 2025 · Updated 3 months ago
- Fast SGEMM emulation on Tensor Cores ☆17 · Feb 16, 2025 · Updated last year
- Generating contraction orders and perform numerical contractions for arbitrary tensor networks ☆18 · May 20, 2024 · Updated last year
- FFVC - Frontflow/violet Cartesian ☆14 · Apr 5, 2020 · Updated 5 years ago
- ☆16 · Mar 3, 2026 · Updated 2 weeks ago
- Training v0.7 results ☆12 · Nov 18, 2025 · Updated 4 months ago
- ☆18 · Jan 2, 2026 · Updated 2 months ago
- This is an old archived repository that we keep for our records. Please use recent GENESIS repository and do not use this one. ☆11 · Sep 15, 2022 · Updated 3 years ago
- An extension library of WMMA API (Tensor Core API) ☆111 · Jul 12, 2024 · Updated last year
- High Availability Shared Pipeline Engine ☆17 · Sep 15, 2023 · Updated 2 years ago
- CUDA Finite Difference Library ☆16 · Aug 21, 2020 · Updated 5 years ago
- ☆16 · May 17, 2018 · Updated 7 years ago
- Digital paint mixing program based on the Kubelka-Munk equations. Implementation of : T. Lindemeier, J. M. Gülzow, and O. Deussen. 2018… ☆15 · Sep 10, 2020 · Updated 5 years ago
- Semi-Lagrangian Library ☆16 · Oct 23, 2023 · Updated 2 years ago
- MaRe leverages the power of Docker and Spark to run and scale your serial tools in MapReduce fashion. ☆14 · Apr 12, 2022 · Updated 3 years ago
- Itoyori: A distributed multi-threading runtime system for global-view fork-join task parallelism ☆22 · Feb 9, 2024 · Updated 2 years ago
- Stable, numerical Navier-Stokes solver for use in real-time simulation ☆16 · Apr 6, 2021 · Updated 4 years ago
- variPEPS -- Versatile tensor network library for variational ground state simulations in two spatial dimensions ☆18 · Updated this week
- A high-performance implementation of Empirical Dynamic Modeling (EDM) ☆19 · Feb 25, 2026 · Updated 3 weeks ago
- General, Hybrid and Optimized Sparse Toolkit (Bitbucket mirror) ☆12 · Apr 8, 2021 · Updated 4 years ago
- ☆38 · May 23, 2025 · Updated 9 months ago
- ☆22 · May 7, 2025 · Updated 10 months ago
- A flexible, templated GPU library of neighbor search algorithms. ☆12 · Jul 22, 2021 · Updated 4 years ago
- This repo contains the code of the paper "RayJoin: Fast and Precise Spatial Join", ICS'24 ☆11 · Updated this week
- A little library for using SIMD instructions for x86 and ARM, wrapping Agner Fog's vectorclass for x86 and filling some of its functional… ☆17 · Dec 10, 2021 · Updated 4 years ago
- A library for code transformations with guaranteed legality ☆18 · Updated this week
- EigenKernel - a package of hybrid parallel solvers for eigenvalue problems ☆15 · Jul 11, 2021 · Updated 4 years ago
- Distributions is a Nim library for distributions and their functions. ☆18 · Jul 16, 2022 · Updated 3 years ago
- Official repo for BWLer: Barycentric Weight Layer ☆29 · Sep 26, 2025 · Updated 5 months ago
- Sort 1..25 values with conditional swaps ☆17 · Aug 6, 2024 · Updated last year
- Fortran language support for Atom-IDE ☆22 · Mar 25, 2019 · Updated 6 years ago
- This is an example of a boolean expression editor made in Dear ImGui ☆15 · Dec 3, 2022 · Updated 3 years ago
- Medusa: Accelerating Serverless LLM Inference with Materialization [ASPLOS'25] ☆12 · Nov 8, 2024 · Updated last year
- A copy of the DirectX Headers from MinGW-64. ☆14 · Sep 7, 2023 · Updated 2 years ago
- A top-level, user-focused, conglomerate repo for the NWChemEx project. ☆18 · Dec 11, 2025 · Updated 3 months ago
- Implementation of vDNN++; an improvement over vDNN ☆18 · Dec 7, 2018 · Updated 7 years ago
- ☆25 · Jul 25, 2025 · Updated 7 months ago
- Ensign is a framework to facilitate dynamical low-rank simulation ☆16 · Feb 21, 2026 · Updated last month