Fast SGEMM emulation on Tensor Cores
☆17Feb 16, 2025Updated last year
Alternatives and similar repositories for cuMpSGEMM
Users that are interested in cuMpSGEMM are comparing it to the libraries listed below.
- GEMMul8 (GEMMulate): GEMM emulation using INT8/FP8 matrix engines based on the Ozaki Scheme II☆54Updated this week
- Acceleration codes for the Ozaki-scheme on integer matrix multiplication units.☆24Dec 10, 2025Updated 3 months ago
- FP64 equivalent GEMM via Int8 Tensor Cores using the Ozaki scheme☆115Dec 2, 2025Updated 3 months ago
- A multi-platform gesture detector application developed by applying concepts learnt in an Embedded Systems course on peripheral devices.☆21Dec 8, 2023Updated 2 years ago
- Tencent Distribution of TVM☆16Apr 7, 2023Updated 2 years ago
- ☆12Dec 22, 2024Updated last year
- A vector field rendering library☆17Jul 31, 2019Updated 6 years ago
- A C++-based implementation of the TeaLeaf heat conduction mini-app. This implementation of TeaLeaf replicates the functionality of the ref…☆25Aug 11, 2024Updated last year
- Flux tutorial slides and materials☆26Updated this week
- CUDA GPU Benchmark☆37Jan 31, 2025Updated last year
- CUDA Finite Difference Library☆16Aug 21, 2020Updated 5 years ago
- Digital paint mixing program based on the Kubelka-Munk equations. Implementation of : T. Lindemeier, J. M. Gülzow, and O. Deussen. 2018…☆15Sep 10, 2020Updated 5 years ago
- A scalable implementation of the multifrontal method for symmetric and Hermitian systems (with intrafrontal pivoting)☆19Jun 27, 2016Updated 9 years ago
- Distributed Communication-Optimal Shuffle and Transpose Algorithm☆14Feb 20, 2026Updated last month
- Stable, numerical Navier-Stokes solver for use in real-time simulation☆16Apr 6, 2021Updated 4 years ago
- General, Hybrid and Optimized Sparse Toolkit (Bitbucket mirror)☆12Apr 8, 2021Updated 4 years ago
- ☆38May 23, 2025Updated 10 months ago
- Performance of the C++ interface of flash attention and flash attention v2 in large language model (LLM) inference scenarios.☆16Aug 31, 2023Updated 2 years ago
- An autonomous car able to ride safely.☆13Jan 31, 2021Updated 5 years ago
- ☆17Nov 3, 2025Updated 4 months ago
- Inference code for LLaMA models☆21Apr 3, 2025Updated 11 months ago
- ☆20Mar 3, 2026Updated 3 weeks ago
- This repo contains the code of the paper "RayJoin: Fast and Precise Spatial Join", ICS'24☆11Mar 19, 2026Updated last week
- ☆11Apr 10, 2019Updated 6 years ago
- 2014: Variational Monte Carlo for the harmonic oscillator, helium, hydrogen and H2 - IPython notebook and FORTRAN90☆13Jun 23, 2016Updated 9 years ago
- Triton implementation of GPT/LLAMA☆21Aug 28, 2024Updated last year
- Distributed-memory, double-precision, polar decomposition (QDWH/ZOLO-PD) of a dense matrix, svd (QDWH/ZOLOPD-SVD) of a dense matrix☆15Jun 3, 2020Updated 5 years ago
- JUBE benchmarking environment configuration files☆10Oct 1, 2015Updated 10 years ago
- Implementation of vDNN++; an improvement over vDNN☆18Dec 7, 2018Updated 7 years ago
- Experimental Linear Algebra Performance Studies☆12Feb 24, 2017Updated 9 years ago
- Instruction latency & throughput profiler for AArch64☆42Aug 20, 2025Updated 7 months ago
- Multidimensional arrays for C++. (Not an official Boost library.) This is a mirror of gitlab.com/correaa/boost-multi☆19Updated this week
- Implementation of Jos Stam's Stable Fluids☆27Oct 21, 2017Updated 8 years ago
- C++ library for non photorealistic rendering. Includes a paint renderer based on Kubelka Munk equations, stroke based rendering algorithm…☆30Nov 22, 2020Updated 5 years ago
- Stochastic Series Expansion (SSE) for an isotropic S=1/2 antiferromagnetic quantum Heisenberg model on a 1D, 2D or 3D lattice. Every lattic…☆15Jan 23, 2021Updated 5 years ago
- PyTorch code for ROLL, a knowledge-based video story question answering model.☆21Sep 29, 2020Updated 5 years ago
- CS61 learning schedules and assessments☆16Dec 6, 2011Updated 14 years ago
- ☆19Dec 27, 2023Updated 2 years ago
- Effective transpose on Hopper GPU☆28Sep 6, 2025Updated 6 months ago