north-numerical-computing/tensor-cores-numerical-behavior

north-numerical-computing / tensor-cores-numerical-behavior

Test suite for probing the numerical behavior of NVIDIA tensor cores

☆42

Alternatives and similar repositories for tensor-cores-numerical-behavior

Users who are interested in tensor-cores-numerical-behavior are comparing it to the repositories listed below.

north-numerical-computing / cpfloat
Custom-Precision Floating-point numbers.
☆45 · Jun 24, 2026 · Updated last month
temporal-hpc / reduction-tensor-cores
Fast GPU based tensor core reductions
☆12 · Jan 13, 2023 · Updated 3 years ago
caps-tum / mt4g
Memory Topology for GPUs
☆19 · Updated this week
YusukeNagasaka / Batched-SpMM
New batched algorithm for sparse matrix-matrix multiplication (SpMM)
☆16 · May 7, 2019 · Updated 7 years ago
wmmae / wmma_extension
An extension library of WMMA API (Tensor Core API)
☆115 · Jul 12, 2024 · Updated 2 years ago
apuaaChen / vectorSparse
☆32 · Aug 24, 2022 · Updated 3 years ago
NVlabs / sassifi
An Architecture-level Fault Injection Tool for GPU Application Resilience Evaluations
☆21 · Apr 14, 2020 · Updated 6 years ago
ParCIS / Magicube
Magicube is a high-performance library for quantized sparse matrix operations (SpMM and SDDMM) of deep learning on Tensor Cores.
☆92 · Nov 23, 2022 · Updated 3 years ago
QianyanTech / NBAssembler
Assembler and Decompiler for NVIDIA (Maxwell Pascal Volta Turing Ampere) GPUs.
☆96 · Feb 23, 2023 · Updated 3 years ago
sjml / Metal-Life
Conway's Game of Life implemented in Metal and Swift
☆15 · Nov 28, 2017 · Updated 8 years ago
NVIDIA / HMM_sample_code
CUDA 12.2 HMM demos
☆21 · Jul 26, 2024 · Updated last year
ezyang / SMT-LIB-benchmarks-pytorch-shapes
SMT-LIB benchmarks for shape computations from deep learning models in PyTorch
☆18 · Dec 21, 2022 · Updated 3 years ago
shixun404 / Fault-Tolerant-SGEMM-on-NVIDIA-GPUs
Anatomy of High-Performance GEMM with Online Fault Tolerance on GPUs
☆14 · Apr 3, 2025 · Updated last year
inEXASCALE / pychop
A Python package for simulating low precision arithmetic in scientific computing and machine learning
☆21 · Updated this week
olcf / NVIDIA-tensor-core-examples
☆20 · Nov 7, 2019 · Updated 6 years ago
peichenxie / FPRev
☆26 · May 9, 2025 · Updated last year
ECP-copa / CoMD
Classical molecular dynamics proxy application.
☆31 · Jun 29, 2020 · Updated 6 years ago
krocki / MLP-C
Multi-layer perceptron in C
☆16 · Jan 22, 2021 · Updated 5 years ago
c3sr / tcu_scope
☆50 · Jun 27, 2019 · Updated 7 years ago
yixiaoer / mistral-v0.2-jax
JAX implementation of the Mistral 7b v0.2 model
☆35 · Jul 3, 2024 · Updated 2 years ago
lightsighter / CudaDMA
Emulating DMA Engines on GPUs for Performance and Portability
☆43 · May 17, 2015 · Updated 11 years ago
thecharlieblake / lovely-llama
An implementation of the Llama architecture, to instruct and delight
☆21 · May 31, 2025 · Updated last year
HabanaAI / Habana_Custom_Kernel
Provides the examples to write and build Habana custom kernels using the HabanaTools
☆26 · Apr 15, 2025 · Updated last year
EnigmaHuang / Saad_Book_ForTran
Some "Formula Translations" for Yousef Saad's book "Iterative Methods for Sparse Linear Systems (2nd Edition)"
☆13 · Jan 14, 2018 · Updated 8 years ago
thuml / learn_torch.compile
torch.compile artifacts for common deep learning models, can be used as a learning resource for torch.compile
☆19 · Dec 22, 2023 · Updated 2 years ago
microsoft / ConvStencil
☆37 · Apr 10, 2024 · Updated 2 years ago
owensgroup / merge-spmm
Code for paper "Design Principles for Sparse Matrix Multiplication on the GPU" accepted to Euro-Par 2018
☆74 · Oct 5, 2020 · Updated 5 years ago
netx-repo / training-bottleneck
Analyze network performance in distributed training
☆20 · Oct 20, 2020 · Updated 5 years ago
ademeure / QuickRunCUDA
☆20 · May 30, 2026 · Updated last month
llvm-gpu-news / llvm-gpu-news.github.io
☆15 · Jan 21, 2023 · Updated 3 years ago
UoB-HPC / minifmm
☆11 · Aug 8, 2021 · Updated 4 years ago
bigcode-project / bigcode-inference-benchmark
☆19 · Aug 10, 2024 · Updated last year
nicolaswilde / cuda-sgemm
☆73 · Jan 6, 2025 · Updated last year
hgyhungry / ge-spmm
☆115 · Jul 3, 2021 · Updated 5 years ago
sfilippone / mld2p4-2
☆14 · Jul 16, 2020 · Updated 6 years ago
Luca-Dalmasso / matrixTransposeCUDA
CUDA C simple application for Nvidia's GPU
☆11 · Jun 7, 2022 · Updated 4 years ago
itoyori / itoyori
Itoyori: A distributed multi-threading runtime system for global-view fork-join task parallelism
☆23 · Feb 9, 2024 · Updated 2 years ago
qbunia / rodinia
rodinia benchmark modified to run with ENZO and pathcu instead of nvcc CUDA compiler
☆12 · Jan 23, 2024 · Updated 2 years ago
sunlex0717 / DissectingTensorCores
☆114 · Apr 19, 2024 · Updated 2 years ago