olcf/NVIDIA-tensor-core-examples

olcf / NVIDIA-tensor-core-examples

☆20

Alternatives and similar repositories for NVIDIA-tensor-core-examples

Users who are interested in NVIDIA-tensor-core-examples are comparing it to the repositories listed below.

LeiWang1999 / Stream-k.tvm
☆20 · Sep 28, 2024 · Updated last year
LighthouseHPC / lighthouse
☆11 · Apr 10, 2019 · Updated 7 years ago
jundaf2 / CUDA-INT8-GEMM
CUDA 8-bit Tensor Core Matrix Multiplication based on m16n16k16 WMMA API
☆37 · Sep 15, 2023 · Updated 2 years ago
wzsh / wmma_tensorcore_sample
Matrix Multiply-Accumulate with CUDA and WMMA (Tensor Core)
☆147 · Aug 18, 2020 · Updated 5 years ago
feifeibear / PSTensor
PSTensor provides a way to hack the memory management of tensors in TensorFlow and PyTorch by defining your own C++ Tensor Class.
☆10 · Feb 10, 2022 · Updated 4 years ago
ChASE-library / ChASE
This repository mirrors the principal Gitlab repository of the Chebyshev Accelerated Subspace iteration Eigensolver. If you want to contr…
☆20 · Jul 8, 2026 · Updated last week
Alpine-DAV / vtk-h
☆11 · Jul 13, 2022 · Updated 4 years ago
ContinuumIO / ac2019-dl-gpu
AnacondaCON 2019 GPU Deep Learning Tutorial
☆16 · Jun 25, 2026 · Updated 3 weeks ago
HPMLL / SpInfer_EuroSys25
☆35 · Apr 2, 2025 · Updated last year
sandialabs / LAPIS
An MLIR-based compiler targeting Kokkos and other programming models
☆17 · Updated this week
debowin / cuda-tiled-matrix-multiplication
Optimized Parallel Tiled Approach to perform Matrix Multiplication by taking advantage of the lower latency, higher bandwidth shared memo…
☆17 · Sep 24, 2017 · Updated 8 years ago
ecrc / kblas-gpu
Subset of BLAS routines optimized for NVIDIA GPUs
☆80 · Mar 27, 2023 · Updated 3 years ago
facebookresearch / FAMBench
Benchmarks to capture important workloads.
☆32 · Apr 1, 2026 · Updated 3 months ago
NVIDIA / NVPLSamples
NVIDIA Performance Libraries: Sample code
☆23 · May 28, 2026 · Updated last month
wangruinju / Double-Fusion
A Bayesian approach to examining default mode network functional connectivity and cognitive performance in major depressive disorder
☆13 · Aug 23, 2019 · Updated 6 years ago
Roxbili / TorchQuanter
Quantizes PyTorch models; supports post-training quantization and quantization-aware training methods
☆15 · Jun 15, 2023 · Updated 3 years ago
eth-cscs / COSTA
Distributed Communication-Optimal Shuffle and Transpose Algorithm
☆14 · Apr 18, 2026 · Updated 3 months ago
TiledTensor / TiledBench
Benchmark tests supporting the TiledCUDA library.
☆19 · Nov 19, 2024 · Updated last year
gchaw / wattless
GPU-accelerated AES encryption project
☆11 · Feb 13, 2015 · Updated 11 years ago
NVlabs / sassifi
An Architecture-level Fault Injection Tool for GPU Application Resilience Evaluations
☆21 · Apr 14, 2020 · Updated 6 years ago
santoshgsk / awesome-ai-up-to-date
A list of the best resources covering broad topics including Python, Data Engineering, Data Analysis, Machine Learning, Deep Learning, RL
☆13 · Feb 26, 2020 · Updated 6 years ago
Blaok / soda
Stencil with Optimized Dataflow Architecture
☆12 · Feb 27, 2024 · Updated 2 years ago
north-numerical-computing / tensor-cores-numerical-behavior
Test suite for probing the numerical behavior of NVIDIA tensor cores
☆42 · Jul 24, 2024 · Updated last year
lenLRX / AmpereSparseMatmul
Study of Ampere's Sparse Matmul
☆18 · Jan 10, 2021 · Updated 5 years ago
YukeWang96 / MGG_OSDI23
Artifact for OSDI'23: MGG: Accelerating Graph Neural Networks with Fine-grained intra-kernel Communication-Computation Pipelining on Mult…
☆40 · Mar 17, 2024 · Updated 2 years ago
pkestene / MS-HPC-AI-GPU
Resources for the introductory GPU programming course of the HPC-AI specialized master's program
☆23 · Jan 11, 2024 · Updated 2 years ago
ecrc / polar
Distributed-memory, double-precision polar decomposition (QDWH/ZOLO-PD) of a dense matrix, SVD (QDWH/ZOLOPD-SVD) of a dense matrix
☆14 · Jun 3, 2020 · Updated 6 years ago
realYurk / SuperComputing-HPC-Data-Summary
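A collection of the SC group's notes and materials from studying high-performance computing, distributed architecture, data mining, and artificial intelligence
☆15 · Oct 29, 2021 · Updated 4 years ago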
HPAC / ELAPS
Experimental Linear Algebra Performance Studies
☆12 · Feb 24, 2017 · Updated 9 years ago
FZJ-JSC / jube-configs
JUBE benchmarking environment configuration files
☆10 · Oct 1, 2015 · Updated 10 years ago
escalab / TCUDB
☆14 · Jun 6, 2022 · Updated 4 years ago
Xilinx / mlir-xten
☆17 · Updated this week
Mantevo / miniFE
MiniFE Finite Element Mini-Application
☆42 · May 13, 2026 · Updated 2 months ago
cornell-brg / torng-uecgra-scripts-hpca2021
☆12 · Aug 4, 2022 · Updated 3 years ago
HKFoggyU / RedBird3D
3D model for HKUST Redbird (Sundial)
☆18 · Oct 12, 2022 · Updated 3 years ago
feifeibear / ChituAttention
Quantized Attention on GPU
☆45 · Nov 22, 2024 · Updated last year
AlibabaResearch / flash-llm
Flash-LLM: Enabling Cost-Effective and Highly-Efficient Large Generative Model Inference with Unstructured Sparsity
☆246 · Sep 24, 2023 · Updated 2 years ago
weifengliu-ssslab / Benchmark_SpTRSM_using_CSC
Fast Synchronization-Free Algorithms for Parallel Sparse Triangular Solves with Multiple Right-Hand Sides (SpTRSM)
☆17 · Feb 14, 2020 · Updated 6 years ago
DARClab-UTD / S2CBench
☆18 · Mar 7, 2019 · Updated 7 years ago