Fast SGEMM emulation on Tensor Cores
☆17 · Feb 16, 2025 · Updated last year
Alternatives and similar repositories for cuMpSGEMM
Users that are interested in cuMpSGEMM are comparing it to the libraries listed below.
- GEMMul8 (GEMMulate): GEMM emulation using INT8/FP8 matrix engines based on the Ozaki Scheme II ☆60 · Apr 6, 2026 · Updated last week
- Acceleration codes for the Ozaki-scheme on integer matrix multiplication units. ☆25 · Dec 10, 2025 · Updated 4 months ago
- FP64 equivalent GEMM via Int8 Tensor Cores using the Ozaki scheme ☆118 · Dec 2, 2025 · Updated 4 months ago
- Developing multi platform gesture detector application by applying concepts learnt in Embedded Systems course on peripheral devices. ☆21 · Dec 8, 2023 · Updated 2 years ago
- Tencent Distribution of TVM ☆16 · Apr 7, 2023 · Updated 3 years ago
- ☆12 · Dec 22, 2024 · Updated last year
- A vector field rendering library ☆17 · Jul 31, 2019 · Updated 6 years ago
- A C++based implementation of the TeaLeaf heat conduction mini-app. This implementation of TeaLeaf replicates the functionality of the ref… ☆25 · Aug 11, 2024 · Updated last year
- Flux tutorial slides and materials ☆26 · Mar 21, 2026 · Updated 3 weeks ago
- CUDA GPU Benchmark ☆38 · Jan 31, 2025 · Updated last year
- High-performance GEMM implementation optimized for NVIDIA H100 GPUs, leveraging Hopper architecture's TMA, WGMMA, and Thread Block Cluste… ☆10 · Dec 4, 2024 · Updated last year
- CUDA Finite Difference Library ☆16 · Aug 21, 2020 · Updated 5 years ago
- Digital paint mixing program based on the Kubelka-Munk equations. Implementation of: T. Lindemeier, J. M. Gülzow, and O. Deussen. 2018… ☆14 · Sep 10, 2020 · Updated 5 years ago
- A scalable implementation of the multifrontal method for symmetric and Hermitian systems (with intrafrontal pivoting) ☆19 · Jun 27, 2016 · Updated 9 years ago
- Stable, numerical Navier-Stokes solver for use in real-time simulation ☆16 · Apr 6, 2021 · Updated 5 years ago
- Distributed Communication-Optimal Shuffle and Transpose Algorithm ☆14 · Updated this week
- General, Hybrid and Optimized Sparse Toolkit (Bitbucket mirror) ☆12 · Apr 8, 2021 · Updated 5 years ago
- Performance of the C++ interface of flash attention and flash attention v2 in large language model (LLM) inference scenarios. ☆16 · Aug 31, 2023 · Updated 2 years ago
- ☆38 · May 23, 2025 · Updated 10 months ago
- An autonomous car able to ride safely. ☆13 · Jan 31, 2021 · Updated 5 years ago
- ☆17 · Nov 3, 2025 · Updated 5 months ago
- Inference code for LLaMA models ☆21 · Apr 3, 2025 · Updated last year
- This repo contains the code of the paper "RayJoin: Fast and Precise Spatial Join", ICS'24 ☆11 · Updated this week
- ☆11 · Apr 10, 2019 · Updated 7 years ago
- Triton implementation of GPT/LLAMA ☆21 · Aug 28, 2024 · Updated last year
- 2014: Variational Monte Carlo for the harmonic oscillator, helium, hydrogen and H2 - IPython notebook and FORTRAN90 ☆13 · Jun 23, 2016 · Updated 9 years ago
- Distributed-memory, double-precision, polar decomposition (QDWH/ZOLO-PD) of a dense matrix, svd (QDWH/ZOLOPD-SVD) of a dense matrix ☆15 · Jun 3, 2020 · Updated 5 years ago
- ☆20 · Mar 3, 2026 · Updated last month
- JUBE benchmarking environment configuration files ☆10 · Oct 1, 2015 · Updated 10 years ago
- Experimental Linear Algebra Performance Studies ☆12 · Feb 24, 2017 · Updated 9 years ago
- Implementation of vDNN++; an improvement over vDNN ☆18 · Dec 7, 2018 · Updated 7 years ago
- Instruction latency & throughput profiler for AArch64 ☆43 · Aug 20, 2025 · Updated 7 months ago
- Multidimensional arrays for C++. (Not an official Boost library) \\ This is a mirror of gitlab.com/correaa/boost-multi ☆19 · Updated this week
- Implementation of Jos Stam's Stable Fluids ☆27 · Oct 21, 2017 · Updated 8 years ago
- C++ library for non photorealistic rendering. Includes a paint renderer based on Kubelka Munk equations, stroke based rendering algorithm… ☆29 · Nov 22, 2020 · Updated 5 years ago
- Stochastic Series Expansion (SSE) for an isotropic S=1/2 antiferromagnetic quantum Heisenberg model in 1D, 2D or 3D lattice. Every lattic… ☆15 · Jan 23, 2021 · Updated 5 years ago
- PyTorch code for ROLL, a knowledge-based video story question answering model. ☆21 · Sep 29, 2020 · Updated 5 years ago
- CS61 learning schedules and assessments ☆16 · Dec 6, 2011 · Updated 14 years ago
- ☆19 · Dec 27, 2023 · Updated 2 years ago